×

Itemset mining: a constraint programming perspective. (English) Zbl 1353.68233

Summary: The field of data mining has become accustomed to specifying constraints on patterns of interest. A large number of systems and techniques has been developed for solving such constraint-based mining problems, especially for mining itemsets. The approach taken in the field of data mining contrasts with the constraint programming principles developed within the artificial intelligence community. While most data mining research focuses on algorithmic issues and aims at developing highly optimized and scalable implementations that are tailored towards specific tasks, constraint programming employs a more declarative approach. The emphasis lies on developing high-level modeling languages and general solvers that specify what the problem is, rather than outlining how a solution should be computed, yet are powerful enough to be used across a wide variety of applications and application domains.
This paper contributes a declarative constraint programming approach to data mining. More specifically, we show that it is possible to employ off-the-shelf constraint programming techniques for modeling and solving a wide variety of constraint-based itemset mining tasks, such as frequent, closed, discriminative, and cost-based itemset mining. In particular, we develop a basic constraint programming model for specifying frequent itemsets and show that this model can easily be extended to realize the other settings. This contrasts with typical procedural data mining systems where the underlying procedures need to be modified in order to accommodate new types of constraint, or novel combinations thereof. Even though the performance of state-of-the-art data mining systems outperforms that of the constraint programming approach on some standard tasks, we also show that there exist problems where the constraint programming approach leads to significant performance improvements over state-of-the-art methods in data mining and as well as to new insights into the underlying data mining problems. Many such insights can be obtained by relating the underlying search algorithms of data mining and constraint programming systems to one another. We discuss a number of interesting new research questions and challenges raised by the declarative constraint programming approach to data mining.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68T20 Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.)

Software:

OPL; cc(FD); CLOSET; MINION; LCM

References:

[1] Agrawal, Rakesh; Imielinski, Tomasz; Swami, Arun N., Mining association rules between sets of items in large databases, (Proceedings of the ACM SIGMOD International Conference on Management of Data (1993), ACM Press), 207-216
[2] Agrawal, Rakesh; Mannila, Hiekki; Srikant, Ramakrishnan; Toivonen, Hannu; Inkeri Verkamo, A., Fast discovery of association rules, (Advances in Knowledge Discovery and Data Mining (1996), AAAI Press), 307-328
[3] Apt, Krzysztof R.; Wallace, Mark, Constraint Logic Programming Using Eclipse (2007), Cambridge University Press: Cambridge University Press New York, NY, USA · Zbl 1119.68044
[4] Bay, Stephen D.; Pazzani, Michael J., Detecting change in categorical data: mining contrast sets, (Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining (1999), ACM Press), 302-306
[5] Bayardo, Roberto J.; Agrawal, Rakesh; Gunopulos, Dimitrios, Constraint-based rule mining in large, dense databases, Data Mining and Knowledge Discovery, 4, 2/3, 217-240 (2000)
[6] Beldiceanu, Nicolas; Carlsson, Mats; Demassey, Sophie; Petit, Thierry, Global constraint catalogue: past, present and future, Constraints, 12, 21-62 (March 2007) · Zbl 1128.68092
[7] Bessiere, Christian; Hebrard, Emmanuel; OʼSullivan, Barry, Minimising decision tree size as combinatorial optimisation, (Principles and Practice of Constraint Programming. Principles and Practice of Constraint Programming, Lecture Notes in Computer Science, vol. 5732 (2009), Springer), 173-187
[8] Bonchi, Francesco; Goethals, Bart, FP-bonsai: the art of growing and pruning small fp-trees, (Advances in Knowledge Discovery and Data Mining. Advances in Knowledge Discovery and Data Mining, Lecture Notes in Computer Science, vol. 3056 (2004), Springer), 155-160
[9] Bonchi, Francesco; Lucchese, Claudio, Extending the state-of-the-art of constraint-based pattern discovery, Data and Knowledge Engineering, 60, 2, 377-399 (2007)
[10] Brailsford, Sally C.; Potts, Chris N.; Smith, Barbara M., Constraint satisfaction problems: algorithms and applications, European Journal of Operational Research, 119, 3, 557-581 (1999) · Zbl 0938.90055
[11] Bucila, Cristian; Gehrke, Johannes; Kifer, Daniel; White, Walker M., DualMiner: a dual-pruning algorithm for itemsets with constraints, Data Mining and Knowledge Discovery, 7, 3, 241-272 (2003)
[12] Chang, Ming-Wei; Ratinov, Lev-Arie; Rizzolo, Nicholas; Roth, Dan, Learning and inference with constraints, (Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (2008), AAAI Press), 1513-1518
[13] Cheng, Hong; Xifeng, Yan; Han, Jiawei; Hsu, Chih-Wei, Discriminative frequent pattern analysis for effective classification, (Proceedings of the 23rd International Conference on Data Engineering (2007), IEEE), 716-725
[14] Cheng, Hong; Xifeng, Yan; Han, Jiawei; Yu, P. S., Direct discriminative pattern mining for effective classification, (Proceedings of the 24th International Conference on Data Engineering (2008), IEEE), 169-178
[15] Cussens, James, Bayesian network learning by compiling to weighted max-sat, (Proceedings of the 24th Conference in Uncertainty in Artificial Intelligence (2008), AUAI Press), 105-112
[16] De Raedt, Luc; Guns, Tias; Nijssen, Siegfried, Constraint programming for itemset mining, (Proceeding of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2008), ACM), 204-212 · Zbl 1353.68233
[17] De Raedt, Luc; Guns, Tias; Nijssen, Siegfried, Constraint programming for data mining and machine learning, (Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (2010), AAAI Press), 1513-1518 · Zbl 1353.68233
[18] De Raedt, Luc; Kramer, Stefan, The levelwise version space algorithm and its application to molecular fragment finding, (Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (2001), Morgan Kaufmann), 853-862
[19] De Raedt, Luc; Zimmermann, Albrecht, Constraint-based pattern set mining, (Proceedings of the Seventh SIAM International Conference on Data Mining (2007), SIAM), 1-12
[20] Dong, Guozhu; Li, Jinyan, Efficient mining of emerging patterns: discovering trends and differences, (Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining (1999), ACM Press), 43-52
[21] Fan, Wei; Zhang, Kun; Cheng, Hong; Gao, Jing; Xifeng, Yan; Han, Jiawei; Yu, Philip S.; Verscheure, Olivier, Direct mining of discriminative and essential frequent patterns via model-based search tree, (Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2008), ACM), 230-238
[22] Frisch, Alan M.; Grum, Matthew; Jefferson, Christopher; Martínez Hernández, Bernadette; Miguel, Ian, The design of essence: a constraint language for specifying combinatorial problems, (Proceedings of the 20th International Joint Conference on Artificial Intelligence (2007), Morgan Kaufmann), 80-87
[23] Fürnkranz, Johannes; Flach, Peter A., ROC ‘n’ rule learning - towards a better understanding of covering algorithms, Machine Learning, 58, 1, 39-77 (2005) · Zbl 1075.68071
[24] (Ganter, Bernhard; Stumme, Gerd; Wille, Rudolf, Formal Concept Analysis, Foundations and Applications. Formal Concept Analysis, Foundations and Applications, Lecture Notes in Computer Science, vol. 3626 (2005), Springer) · Zbl 1087.68003
[26] Gent, Ian P.; Jefferson, Christopher; Miguel, Ian, MINION: a fast scalable constraint solver, (Proceeding of the 17th European Conference on Artificial Intelligence (2006), IOS Press), 98-102
[27] Bart Goethals, Mohammed J. Zaki, Advances in frequent itemset mining implementations: report on FIMIʼ03, in: SIGKDD Explorations Newsletter, vol. 6, 2004, pp. 109-117.; Bart Goethals, Mohammed J. Zaki, Advances in frequent itemset mining implementations: report on FIMIʼ03, in: SIGKDD Explorations Newsletter, vol. 6, 2004, pp. 109-117.
[28] Grosskreutz, Henrik; Rüping, Stefan; Wrobel, Stefan, Tight optimistic estimates for fast subgroup discovery, (Machine Learning and Knowledge Discovery in Databases. Machine Learning and Knowledge Discovery in Databases, Lecture Notes in Computer Science, vol. 5211 (2008), Springer), 440-456
[29] Tias Guns, Siegfried Nijssen, Luc De Raedt, \(k\); Tias Guns, Siegfried Nijssen, Luc De Raedt, \(k\) · Zbl 1211.68142
[30] Han, J.; Pei, J.; Yin, Y., Mining frequent patterns without candidate generation, (Proceedings of the ACM SIGMOD International Conference on Management of Data (2000), ACM Press), 1-12
[31] Han, Jiawei; Cheng, Hong; Xin, Dong; Yan, Xifeng, Frequent pattern mining: current status and future directions, Data Mining and Knowledge Discovery, 15, 1, 55-86 (2007)
[32] Kavsek, Branko; Lavrac, Nada; Jovanoski, Viktor, APRIORI-SD: adapting association rule learning to subgroup discovery, (Advances in Intelligent Data Analysis. Advances in Intelligent Data Analysis, Lecture Notes in Computer Science, vol. 2810 (2003), Springer), 230-241
[33] Mannila, Heikki; Toivonen, Hannu, Levelwise search and borders of theories in knowledge discovery, Data Mining and Knowledge Discovery, 1, 3, 241-258 (1997)
[34] Morimoto, Yasuhiko; Fukuda, Takeshi; Matsuzawa, Hirofumi; Tokuyama, Takeshi; Yoda, Kunikazu, Algorithms for mining association rules for binary segmentations of huge categorical databases, (Proceedings of 24rd International Conference on Very Large Data Bases (1998), Morgan Kaufmann), 380-391
[35] Morishita, Shinichi; Sese, Jun, Traversing itemset lattice with statistical metric pruning, (Proceedings of the Nineteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (2000), ACM), 226-236
[36] Nijssen, Siegfried; Fromont, Élisa, Optimal constraint-based decision tree induction from itemset lattices, Data Mining and Knowledge Discovery, 21, 1, 9-51 (2010)
[37] Nijssen, Siegfried; Guns, Tias, Integrating constraint programming and itemset mining, (Machine Learning and Knowledge Discovery in Databases. Machine Learning and Knowledge Discovery in Databases, Lecture Notes in Computer Science, vol. 6322 (2010), Springer), 467-482 · Zbl 1353.68233
[38] Nijssen, Siegfried; Guns, Tias; De Raedt, Luc, Correlated itemset mining in ROC space: a constraint programming approach, (Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2009), ACM), 647-656 · Zbl 1353.68233
[39] Nijssen, Siegfried; Kok, Joost N., Multi-class correlated pattern mining, (Knowledge Discovery in Inductive Databases. Knowledge Discovery in Inductive Databases, Lecture Notes in Computer Science, vol. 3933 (2005), Springer), 165-187 · Zbl 1178.68206
[40] Pasquier, Nicolas; Bastide, Yves; Taouil, Rafik; Lakhal, Lotfi, Discovering frequent closed itemsets for association rules, (Database Theory. Database Theory, Lecture Notes in Computer Science, vol. 1540 (1999), Springer), 398-416 · Zbl 0983.68511
[41] Pei, Jian; Han, Jiawei, Can we push more constraints into frequent pattern mining?, (Proceedings of the sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2000), ACM), 350-354
[42] Pei, Jian; Han, Jiawei; Lakshmanan, Laks V. S., Mining frequent item sets with convertible constraints, (Proceedings of the IEEE International Conference on Data Engineering (2001), IEEE), 433-442
[43] Pei, Jian; Han, Jiawei; Mao, Runying, Closet: an efficient algorithm for mining frequent closed itemsets, (ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (2000), ACM), 21-30
[44] Perron, Laurent, Search procedures and parallelism in constraint programming, (Principles and Practice of Constraint Programming. Principles and Practice of Constraint Programming, Lecture Notes in Computer Science, vol. 1713 (1999), Springer), 346-360
[45] Rossi, Francesca; van Beek, Peter; Walsh, Toby, Handbook of Constraint Programming (Foundations of Artificial Intelligence) (2006), Elsevier Science Inc. · Zbl 1175.90011
[46] Schulte, Christian, Programming Constraint Services: High-Level Programming of Standard and New Constraint Services, Lecture Notes in Computer Science, vol. 2302 (2002), Springer · Zbl 0994.68038
[47] Schulte, Christian; Stuckey, Peter J., Efficient constraint propagation engines, Transactions on Programming Languages and Systems, 31, 1, 1-43 (2008)
[48] Sese, Jun; Morishita, Shinichi, Answering the most correlated \(n\) association rules efficiently, (Principles of Data Mining and Knowledge Discovery. Principles of Data Mining and Knowledge Discovery, Lecture Notes in Computer Science, vol. 2431 (2002), Springer), 410-422 · Zbl 1020.68886
[49] Shenoy, Pradeep; Haritsa, Jayant R.; Sudarshan, S.; Bhalotia, Gaurav; Bawa, Mayank; Devavrat, Shah, Turbo-charging vertical mining of large databases, (Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (2000), ACM), 22-33
[50] Soulet, Arnaud; Crømilleux, Bruno, An efficient framework for mining flexible constraints, (Advances in Knowledge Discovery and Data Mining. Advances in Knowledge Discovery and Data Mining, Lecture Notes in Computer Science, vol. 3518 (2005), Springer), 43-64
[51] Uno, Takeaki; Kiyomi, Masashi; Arimura, Hiroki, LCM ver.3: collaboration of array, bitmap and prefix tree for frequent itemset mining, (Proceedings of the 1st International Workshop on Open Source Data Mining (2005), ACM), 77-86
[52] Van Hentenryck, Pascal; Deville, Yves, (The Cardinality Operator: A New Logical Connective for Constraint Logic Programming (1993), MIT Press: MIT Press Cambridge, MA, USA), 383-403
[53] Van Hentenryck, Pascal; Perron, Laurent; Puget, Jean-Francois, Search and strategies in OPL, ACM Transations Computational Logic, 1, 2, 285-320 (2000) · Zbl 1365.90281
[54] Van Hentenryck, Pascal; Saraswat, Vijay A.; Deville, Yves, Design, implementation, and evaluation of the constraint language cc(FD), Journal of Logic Programming, 37, 1-3, 139-164 (1998) · Zbl 0920.68026
[55] Wrobel, Stefan, An algorithm for multi-relational discovery of subgroups, (Principles of Data Mining and Knowledge Discovery. Principles of Data Mining and Knowledge Discovery, Lecture Notes in Computer Science, vol. 1263 (1997), Springer), 78-87
[56] Javeed Zaki, Mohammed; Gouda, Karam, Fast vertical mining using diffsets, (Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2003), ACM), 326-335
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.