BruteSuppression: a size reduction method for Apriori rule sets

Abstract

Association rule mining can provide genuine insight into the data being analysed; however, rule sets can be extremely large, and therefore difficult and time-consuming for the user to interpret. We propose reducing the size of Apriori rule sets by removing overlapping rules, and compare this approach with two standard methods for reducing rule set size: increasing the minimum confidence parameter, and increasing the minimum antecedent support parameter. We evaluate the rule sets in terms of confidence and coverage, as well as two rule interestingness measures that favour rules with antecedent conditions that are poor individual predictors of the target class, as we assume that these represent potentially interesting rules. We also examine the distribution of the rules graphically, to assess whether particular classes of rules are eliminated. We show that removing overlapping rules substantially reduces rule set size in most cases, and alters the character of a rule set less than if the standard parameters are used to constrain the rule set to the same size. Based on our results, we aim to extend the Apriori algorithm to incorporate the suppression of overlapping rules.
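The abstract does not define when two rules overlap, so the following is only a minimal sketch under an assumed definition: a rule is treated as overlapping if the records matched by its antecedent are a subset of those matched by an already retained rule whose confidence is at least as high. The `Rule` class, the `suppress_overlapping` function, and the toy record IDs are hypothetical and are not taken from the paper; confidence and coverage are computed in the usual way for a fixed target class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # conditions such as "a=1" (hypothetical encoding)
    matched: frozenset     # ids of records that satisfy the antecedent


def confidence(rule, target):
    """Fraction of matched records that belong to the target class."""
    if not rule.matched:
        return 0.0
    return len(rule.matched & target) / len(rule.matched)


def coverage(rule, target):
    """Fraction of target-class records that the rule matches."""
    if not target:
        return 0.0
    return len(rule.matched & target) / len(target)


def suppress_overlapping(rules, target):
    """Drop rules whose matched records are contained in a retained rule.

    ASSUMPTION: this overlap test is illustrative only; it is not the
    definition used by BruteSuppression.
    """
    # Visit rules from highest to lowest confidence, so every retained rule
    # has confidence at least as high as any rule examined after it.
    ordered = sorted(
        rules, key=lambda r: (-confidence(r, target), -len(r.matched))
    )
    kept = []
    for rule in ordered:
        if not any(rule.matched <= other.matched for other in kept):
            kept.append(rule)
    return kept


# Toy data: records 1-4 belong to the target class.
target = frozenset({1, 2, 3, 4})
r1 = Rule(frozenset({"a=1"}), frozenset({1, 2, 3, 5}))
r2 = Rule(frozenset({"a=1", "b=2"}), frozenset({1, 2, 5}))
r3 = Rule(frozenset({"c=3"}), frozenset({4, 6}))
r4 = Rule(frozenset({"a=1", "d=0"}), frozenset({1, 2}))

kept = suppress_overlapping([r1, r2, r3, r4], target)
```

In this toy run, `r2` is suppressed because every record it matches is also matched by `r1`, which has higher confidence. `r4` survives because it is the most confident rule, and `r3` survives because it matches records that no retained rule covers.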

Notes

The implementation of Apriori used for our experiments is restricted to categorical attributes.
Where ATT is categorical, ‘=’ is the only applicable OP.
We use the notation A ~ B to indicate the set {x: x ∈ A ∧ x ∉ B}.
Records with the value ‘?’ in the HouseVotes dataset are not removed, as they do not represent missing data.
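
The A ~ B notation defined in the notes above is ordinary set difference. As a small illustration, assuming that A and B are sets of record IDs (the values and the function name `records_only_in` are made up for this example), it could be written as:

```python
def records_only_in(a, b):
    """Return A ~ B: the elements of a that are not in b."""
    return {x for x in a if x not in b}


a = {1, 2, 3, 5}  # hypothetical record ids in A
b = {1, 2, 5}     # hypothetical record ids in B
only_a = records_only_in(a, b)
```

Python's built-in set difference, `a - b`, gives the same result.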

Author information

Authors and Affiliations

School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
Jon Hills, Anthony Bagnall, Beatriz de la Iglesia & Graeme Richards

Corresponding author

Correspondence to Jon Hills.

Additional information

This work was supported by the UEA Annual Alumni Fund.

About this article

Cite this article

Hills, J., Bagnall, A., de la Iglesia, B. et al. BruteSuppression: a size reduction method for Apriori rule sets. J Intell Inf Syst 40, 431–454 (2013). https://doi.org/10.1007/s10844-012-0232-5

Received: 19 January 2012
Revised: 05 December 2012
Accepted: 06 December 2012
Published: 10 January 2013
Issue Date: June 2013
DOI: https://doi.org/10.1007/s10844-012-0232-5
