×

Hybrid ASP-based approach to pattern mining. (English) Zbl 1472.68152

Summary: Detecting small sets of relevant patterns from a given data set is a central challenge in data mining. The relevance of a pattern is based on user-provided criteria; typically, all patterns that satisfy certain criteria are considered relevant. Rule-based languages like answer set programming (ASP) seem well suited for specifying such criteria in a form of constraints. Although progress has been made, on the one hand, on solving individual mining problems and, on the other hand, developing generic mining systems, the existing methods focus either on scalability or on generality. In this paper, we make steps toward combining local (frequency, size, and cost) and global (various condensed representations like maximal, closed, and skyline) constraints in a generic and efficient way. We present a hybrid approach for itemset, sequence, and graph mining which exploits dedicated highly optimized mining systems to detect frequent patterns and then filters the results using declarative ASP. To further demonstrate the generic nature of our hybrid framework, we apply it to a problem of approximately tiling a database. Experiments on real-world data sets show the effectiveness of the proposed method and computational gains for itemset, sequence, and graph mining, as well as approximate tiling.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68N17 Logic programming
68T10 Pattern recognition, speech recognition

References:

[1] Aggarwal, C. C.2015. Data Mining: The Textbook. Springer, Cham. · Zbl 1311.68001
[2] Agrawal, R., Imielinski, T. and Swami, A.1993. Mining association rules between sets of items in large databases. In SIGMOD 1993, ACM, New York, NY, USA, 207-216.
[3] Agrawal, R., Mannila, H., Srikant, R., Toivonen, H. and Verkamo, A. I.1996. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, Menlo Park, 307-328.
[4] Alviano, M., Dodaro, C., Faber, W., Leone, N. and Ricca, F.2013. WASP: A native asp solver based on constraint learning. In International Conference on Logic Programming and Nonmonotonic Reasoning. Springer, Berlin, 54-66.
[5] Aoga, J. O. R., Guns, T. and Schaus, P.2016. An efficient algorithm for mining frequent sequence with constraint programming. In ECML PKDD, Springer, Cham, 315-330.
[6] Babai, L.2015. Graph isomorphism in quasipolynomial time. CoRR abs/1512.03547. · Zbl 1376.68058
[7] Bonchi, F. and Lucchese, C.2006. On condensed representations of constrained frequent patterns. Knowledge and Information Systems9, 2, 180-201.
[8] Cook, S. A.1971. The complexity of theorem-proving procedures. In Proc. of the 3rd Annual ACM Symposium on Theory of Computing, Shaker Heights, OH, USA, 3-5 May 1971, Harrison, M. A., Banerji, R. B., and Ullman, J. D., Eds. ACM, New York, 151-158. · Zbl 0253.68020
[9] Eiter, T., Brewka, G., Dao-Tran, M., Fink, M., Ianni, G. and Krennwallner, T.2009a. Combining nonmonotonic knowledge bases with external sources. In Frontiers of Combining Systems, 7th International Symposium, Ghilardi, S. and Sebastiani, R., Eds. Frontiers of Combining Systems. FroCoS 2009. Lecture Notes in Computer Science, vol. 5749. Springer, Berlin, Heidelberg, 16-18 Sep. 2009, 18-42. · Zbl 1193.68242
[10] Eiter, T., Ianni, G. and Krennwallner, T.2009b. Answer set programming: A primer. In 5th International Reasoning Web Summer School (RW 2009), Brixen/Bressanone, Italy, 30 Aug.-4 Sep. 2009. LNCS, vol. 5689. Springer, Berlin. · Zbl 1254.68248
[11] Faber, W., Pfeifer, G., Leone, N., Dell’Armi, T. and Ielpa, G.2008. Design and implementation of aggregate functions in the DLV system. CoRR abs/0802.3137. · Zbl 1156.68010
[12] Gebser, M., Guyet, T., Quiniou, R., Romero, J. and Schaub, T.2016. Knowledge-based sequence mining with ASP. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, AAAI, New York, NY, USA, 9-15 July 2016.
[13] Gebser, M., Kaminski, R., Kaufmann, B. and Schaub, T.2012. Answer Set Solving in Practice. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan and Claypool Publishers, San Rafael. · Zbl 1251.68060
[14] Gebser, M., Kaufmann, B., Neumann, A. and Schaub, T.2007. clasp: A conflict-driven answer set solver. In LPNMR, Springer, 260-265. · Zbl 1149.68332
[15] Geerts, F., Goethals, B. and Mielikäinen, T.2004. Tiling databases. In Proc. of the 7th International Conference on Discovery Science, DS 2004, Springer, Padova, Italy, 2-5 Oct. 2004, 278-289. · Zbl 1110.68373
[16] Gelfond, M. and Lifschitz, V.1988. The stable model semantics for logic programming. In Proceedings of ICLP/SLP, MIT Press, 1070-1080.
[17] Guns, T., Dries, A., Nijssen, S., Tack, G. and De Raedt, L.2017. MiningZinc: A declarative framework for constraint-based mining. Artificial Intelligence244, 6-29. · Zbl 1404.68106
[18] Guns, T., Nijssen, S., and De Raedt, L.2013. k-pattern set mining under constraints. IEEE Transactions on Knowledge and Data Engineering25, 2, 402-418.
[19] Guns, T., Paramonov, S. and Négrevergne, B.2016. On declarative modeling of structured pattern mining. In Declarative Learning Based Programming, Papers from the 2016 AAAI Workshop, AAAI Press, Phoenix, AZ, USA, 13 Feb. 2016.
[20] Guyet, T., Moinard, Y. and Quiniou, R.2014. Using answer set programming for pattern mining. CoRR abs/1409.7777.
[21] Jabbour, S., Sais, L. and Salhi, Y.2013. Boolean satisfiability for sequence mining. In 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013, San Francisco, CA, USA, 27 Oct.-1 Nov. 2013, ACM, 649-658.
[22] Jabbour, S., Sais, L. and Salhi, Y.2015. Decomposition based SAT encodings for itemset mining problems. In PAKDD, Springer, 662-674.
[23] Järvisalo, M.2011. Itemset mining as a challenge application for answer set enumeration. In Proc. of the 11th International Conference on Logic Programming and Nonmonotonic Reasoning, LPNMR 2011, Vancouver, Canada, 16-19 May 2011, Springer, 304-310.
[24] Kimelfeld, B. and Kolaitis, P. G.2014. The complexity of mining maximal frequent subgraphs. ACM Transactions on Database Systems39, 4, 32-33. · Zbl 1474.68106
[25] Lifschitz, V.2008. What is answer set programming? AAAI, Chicago, IL.
[26] Mannila, H. and Toivonen, H.1997. Levelwise search and borders of theories in knowledge discovery. Data Mining and Knowledge Discovery1, 3, 241-258.
[27] Métivier, J., Loudni, S. and Charnois, T.2013. A constraint programming approach for mining sequential patterns in a sequence database. CoRR abs/1311.6907.
[28] Miettinen, P.2008. On the positive-negative partial set cover problem. Information Processing Letters108, 4, 219-221. · Zbl 1191.68251
[29] Miettinen, P.2009. Matrix Decomposition Methods for Data Mining: Computational Complexity and Algorithms. Ph.D. thesis, Department of Computer Science, University of Helsinki.
[30] Miettinen, P.2015. Generalized matrix factorizations as a unifying framework for pattern set mining: Complexity beyond blocks. In ECMLPKDD 2015, Springer, 36-52.
[31] Miettinen, P., Mielikäinen, T., Gionis, A., Das, G. and Mannila, H.2008. The discrete basis problem. IEEE Transactions on Knowledge and Data Engineering20, 10, 1348-1362.
[32] Négrevergne, B., Dries, A., Guns, T. and Nijssen, S.2013. Dominance programming for itemset mining. In 2013 IEEE 13th International Conference on Data Mining, Dallas, TX, USA, 7-10 Dec. 2013, IEEE, 557-566.
[33] Négrevergne, B. and Guns, T.2015. Constraint-based sequence mining using constraint programming. In CPAIOR, Springer, 288-305. · Zbl 1427.68068
[34] Neumann, S. and Miettinen, P.2017. Reductions for frequency-based data mining problems. In Proc. of the 2017 IEEE International Conference on Data Mining, IEEE, New Orleans, LA, USA, 997-1002.
[35] Paramonov, S., Van Leeuwen, M., Denecker, M. and De Raedt, L.2015. An exercise in declarative modeling for relational query mining. In ILP, Springer, 166-182.
[36] Pei, J. and Han, J.2000. Can we push more constraints into frequent pattern mining? In ACM SIGKDD, ACM, Boston, MA, USA, 350-354.
[37] Pei, J., Han, J. and Mao, R.2000. CLOSET: An efficient algorithm for mining frequent closed itemsets. In ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, Dallas, TX, USA, 21-30.
[38] Pensa, R. G. and Boulicaut, J.-F.2005. Towards fault-tolerant formal concept analysis. In AI*IA 2005, Springer, 212-223. · Zbl 1155.68524
[39] Rojas, W. U., Boizumault, P., Loudni, S., Crémilleux, B. and Lepailleur, A.2014. Mining (soft-) skypatterns using dynamic CSP. In CPAIOR, Springer, 71-87. · Zbl 1407.68416
[40] Simons, P., Niemelä, I. and Soininen, T.2002. Extending and implementing the stable model semantics. Artificial Intelligence138, 1-2, 181-234. · Zbl 0995.68021
[41] Tyukin, A., Kramer, S. and Wicker, J.2014. BMaD - A boolean matrix decomposition framework. In ECML PKDD 2014, Calders, T., Esposito, F., Hüllermeier, E. and Meo, R., Eds., Springer, 481-484.
[42] Van Der Hallen, M., Paramonov, S., Leuschel, M. and Janssens, G.2016. Knowledge representation analysis of graph mining. CoRR abs/1608.08956. · Zbl 1485.68248
[43] Yan, X. and Han, J.2002. gSpan: Graph-based substructure pattern mining. In ICDM 2002, Maebashi City, Japan, Japan, IEEE.
[44] Yan, X., Han, J. and Afshar, R.2003. CloSpan: Mining closed sequential patterns in large datasets. In SDM. 166-177.
[45] Zaki, M. J., Parthasarathy, S., Ogihara, M. and Li, W.1997. New algorithms for fast discovery of association rules. Technical report, University of Rochester, Rochester, NY, USA.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.