×

Itemset frequency satisfiability: complexity and axiomatization. (English) Zbl 1136.68021

Summary: Computing frequent itemsets is one of the most prominent problems in data mining. We study the following related problem, called FREQSAT, in depth: given some itemset-interval pairs, does there exist a database such that for every pair the frequency of the itemset falls into the interval? This problem is shown to be \(\mathbf {NP}\)-complete. The problem is then further extended to include arbitrary Boolean expressions over items and conditional frequency expressions in the form of association rules. We also show that, unless \(\mathbf {P}\) equals \(\mathbf {NP}\), the related function problem – find the best interval for an itemset under some frequency constraints – cannot be approximated efficiently. Furthermore, it is shown that FREQSAT is recursively axiomatizable, but that there cannot exist an axiomatization of finite arity.

MSC:

68P15 Database theory
68Q17 Computational difficulty of problems (lower bounds, completeness, difficulty of approximation, etc.)
68Q25 Analysis of algorithms and problem complexity

References:

[1] Abiteboul, S.; Hull, R.; Vianu, V., Foundations of Databases (1995), Addison-Wesley · Zbl 0848.68031
[2] F. Afrati, A. Gionis, H. Mannila, Approximating a collection of frequent sets, in: Proc. KDD Int. Conf. Knowledge Discovery in Databases, 2004, pp. 12-19; F. Afrati, A. Gionis, H. Mannila, Approximating a collection of frequent sets, in: Proc. KDD Int. Conf. Knowledge Discovery in Databases, 2004, pp. 12-19
[3] R. Agrawal, T. Imilienski, A. Swami, Mining association rules between sets of items in large databases, in: Proc. ACM SIGMOD Int. Conf. Management of Data, 1993, pp. 207-216; R. Agrawal, T. Imilienski, A. Swami, Mining association rules between sets of items in large databases, in: Proc. ACM SIGMOD Int. Conf. Management of Data, 1993, pp. 207-216
[4] R. Agrawal, R. Srikant, Privacy-preserving data mining, in: Proc. ACM SIGMOD Int. Conf. Management of Data, 2000, pp. 439-450; R. Agrawal, R. Srikant, Privacy-preserving data mining, in: Proc. ACM SIGMOD Int. Conf. Management of Data, 2000, pp. 439-450
[5] M. Atzori, F. Bonchi, F. Giannotti, D. Pedreschi, Blocking anonymity threats raised by frequent itemset mining, in: Proc. IEEE Int. Conf. on Data Mining, 2005, pp. 561-564; M. Atzori, F. Bonchi, F. Giannotti, D. Pedreschi, Blocking anonymity threats raised by frequent itemset mining, in: Proc. IEEE Int. Conf. on Data Mining, 2005, pp. 561-564
[6] Bastide, Y.; Taouil, R.; Pasquier, N.; Stumme, G.; Lakhal, L., Mining frequent patterns with counting inference, SIGKDD Explorations, 2, 2, 66-75 (2000)
[7] R.J. Bayardo, Efficiently mining long patterns from databases, in: Proc. ACM SIGMOD Int. Conf. Management of Data, 1998, pp. 85-93; R.J. Bayardo, Efficiently mining long patterns from databases, in: Proc. ACM SIGMOD Int. Conf. Management of Data, 1998, pp. 85-93
[8] R.J. Bayardo, B. Goethals, M.J. Zaki (Eds.) Proc. of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations, FIMI 2004. CEUR-WS.org, 2004; R.J. Bayardo, B. Goethals, M.J. Zaki (Eds.) Proc. of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations, FIMI 2004. CEUR-WS.org, 2004
[9] J.-F. Boulicaut, A. Bykowski, C. Rigotti, Approximation of frequency queries by means of free-sets, in: Proc. PKDD Int. Conf. Principles of Data Mining and Knowledge Discovery, 2000, pp. 75-85; J.-F. Boulicaut, A. Bykowski, C. Rigotti, Approximation of frequency queries by means of free-sets, in: Proc. PKDD Int. Conf. Principles of Data Mining and Knowledge Discovery, 2000, pp. 75-85
[10] Boulicaut, J.-F.; Bykowski, A.; Rigotti, C., Free-sets: A condensed representation of boolean data for the approximation of frequency queries, Data Mining and Knowledge Discovery, 7, 1, 5-22 (2003)
[11] T. Calders, Axiomatization and Deduction Rules for the Frequency of Itemsets, Ph.D. Thesis, University of Antwerp, Belgium, 2003; T. Calders, Axiomatization and Deduction Rules for the Frequency of Itemsets, Ph.D. Thesis, University of Antwerp, Belgium, 2003 · Zbl 1019.68034
[12] T. Calders, Computational complexity of itemset frequency satisfiability, in: Proc. PODS Int. Conf. Principles of Database Systems, 2004, pp. 143-154; T. Calders, Computational complexity of itemset frequency satisfiability, in: Proc. PODS Int. Conf. Principles of Database Systems, 2004, pp. 143-154
[13] Calders, T., Deducing bounds on the support of itemsets, (Database Technologies for Data Mining. Database Technologies for Data Mining, Springer LNAI, vol. 2682 (2004)), 214-233 · Zbl 1099.68609
[14] T. Calders, B. Goethals, Mining all non-derivable frequent itemsets, in: Proc. PKDD Int. Conf. Principles of Data Mining and Knowledge Discovery, Springer 2002, pp. 74-85; T. Calders, B. Goethals, Mining all non-derivable frequent itemsets, in: Proc. PKDD Int. Conf. Principles of Data Mining and Knowledge Discovery, Springer 2002, pp. 74-85 · Zbl 1020.68566
[15] Calders, T.; Goethals, B., Non-derivable itemset mining, Data Mining and Knowledge Discovery, 14, 1, 171-206 (2007)
[16] Calders, T.; Paredaens, J., Axiomatization of frequent itemsets, Theoretical Computer Science, 290, 1, 669-693 (2003) · Zbl 1019.68034
[17] Calders, T.; Rigotti, C.; Boulicaut, J.-F., A survey on condensed representations for frequent sets, (Boulicaut, J.-F.; de Raedt, L.; Mannila, H., Constraint-based mining and inductive databases. Constraint-based mining and inductive databases, LNCS, vol. 3848 (2005), Springer) · Zbl 1172.68446
[18] X. Chen, M.E. Orlowska, A further study on inverse frequent set mining, in: Proc. ADMA Int. Conf. Advanced Data Mining and Applications, 2005, pp. 753-760; X. Chen, M.E. Orlowska, A further study on inverse frequent set mining, in: Proc. ADMA Int. Conf. Advanced Data Mining and Applications, 2005, pp. 753-760
[19] Dantzig, G. B., Linear Programming and Extensions (1963), Princeton University Press · Zbl 0108.33103
[20] C. Dwork, Ask a better question, get a better answer a new approach to private data analysis, in: Proc. ICDT Int. Conf. Database Theory, 2007, pp. 18-27; C. Dwork, Ask a better question, get a better answer a new approach to private data analysis, in: Proc. ICDT Int. Conf. Database Theory, 2007, pp. 18-27
[21] Fagin, R.; Halpern, J.; Megiddo, N., A logic for reasoning about probabilities, Information and Computation, 87, 1, 2, 78-128 (1990) · Zbl 0811.03014
[22] Frisch, A. M.; Haddawy, P., Anytime deduction for probabilistic logic, Artificial Intelligence, 69, 1, 2, 93-112 (1994) · Zbl 0809.03016
[23] Georgakopoulos, G.; Kavvadias, D.; Papadimitriou, C. H., Probabilistic satisfiability, Journal of Complexity, 4, 1-11 (1988) · Zbl 0647.68049
[24] Goethals, B., Frequent set mining, (The Data Mining and Knowledge Discovery Handbook (Springer, 2005)), 377-397, (Chapter 17)
[25] B. Goethals, J. Muhonen, H. Toivonen, Mining non-derivable association rules, in: Proc. SIAM Int. Conf. on Data Mining, 2005; B. Goethals, J. Muhonen, H. Toivonen, Mining non-derivable association rules, in: Proc. SIAM Int. Conf. on Data Mining, 2005
[26] Hailperin, T., Sentential Probability Logic (1996), Lehigh University Press · Zbl 0922.03026
[27] P. Hansen, B. Jaumard, Probabilistic satisfiability. Les Cahiers du GERAD G-96-31, GERAD, 1996; P. Hansen, B. Jaumard, Probabilistic satisfiability. Les Cahiers du GERAD G-96-31, GERAD, 1996
[28] P. Hansen, B. Jaumard, G.-B.D. Nguets, M.P. de Aragäo, Models and algorithms for probabilistic and bayesian logic, in: Proc. IJCAI Int. Joint Conf. Artificial Intelligence, Montreal, Canada, 1995, pp. 1862-1868; P. Hansen, B. Jaumard, G.-B.D. Nguets, M.P. de Aragäo, Models and algorithms for probabilistic and bayesian logic, in: Proc. IJCAI Int. Joint Conf. Artificial Intelligence, Montreal, Canada, 1995, pp. 1862-1868
[29] Hipp, J.; Güntzer, U.; Nakhaeizadeh, G., Algorithms for association rule mining—a general survey and comparison, SIGKDD Explorations, 2, 1, 58-64 (2000)
[30] Jaeger, M., Automatic derivation of probabilistic inference rules, Journal of Approximate Reasoning, 28, 1, 1-22 (2001) · Zbl 0984.68157
[31] S. Jaroszewicz, D.A. Simivici, Pruning redundant association rules using maximum entropy principle, in: Proc. PaKDD Pacific-Asia Conf. on Knowledge Discovery and Data Mining, 2002, pp. 135-147; S. Jaroszewicz, D.A. Simivici, Pruning redundant association rules using maximum entropy principle, in: Proc. PaKDD Pacific-Asia Conf. on Knowledge Discovery and Data Mining, 2002, pp. 135-147 · Zbl 1048.68788
[32] Lindell, Y.; Pinkas, B., Privacy Preserving Data Mining, Journal of Cryptology, 15, 3, 177-206 (2002) · Zbl 1010.94008
[33] Lukasiewicz, T., Local probabilistic deduction from taxonomic and probabilistic knowledge-bases over conjunctive events, Journal of Approximate Reasoning, 21, 23-61 (1999) · Zbl 0961.68135
[34] Lukasiewicz, T., Probabilistic logic programming with conditional constraints, ACM Transactions on Computational Logic, 2, 3, 289-339 (2001) · Zbl 1171.68762
[35] H. Mannila, H. Toivonen, Multiple uses of frequent sets and condensed representations, in: Proc. KDD Int. Conf. Knowledge Discovery in Databases, 1996; H. Mannila, H. Toivonen, Multiple uses of frequent sets and condensed representations, in: Proc. KDD Int. Conf. Knowledge Discovery in Databases, 1996
[36] Mielikäinen, T., On inverse frequent set mining, (2nd IEEE ICDM Workshop on Privacy Preserving Data Mining, PPDM (2003), IEEE), 18-23
[37] Nilsson, N., Probabilistic logic, Artificial Intelligence, 28, 71-87 (1986) · Zbl 0589.03007
[38] Paris, J. B., (The Uncertain Reasoner’s Companion. The Uncertain Reasoner’s Companion, Tracts in Theoretical Computer Science, 39 (1994), Cambridge University Press) · Zbl 0838.68104
[39] N. Pasquier, Y. Bastide, R. Taouil, L. Lakhal, Discovering frequent closed itemsets for association rules, in: Proc. ICDT Int. Conf. Database Theory, 1999, pp. 398-416; N. Pasquier, Y. Bastide, R. Taouil, L. Lakhal, Discovering frequent closed itemsets for association rules, in: Proc. ICDT Int. Conf. Database Theory, 1999, pp. 398-416 · Zbl 0983.68511
[40] Pavlov, D.; Mannila, H.; Smyth, P., Beyond independence: Probabilistic models for query approximation on binary transaction data, IEEE Transactions on Knowledge and Data Engineering, 15, 6, 1409-1421 (2003)
[41] Samarati, P., Protecting respondents’ identities in microdata release, IEEE Transactions on Knowledge and Data Engineering, 13, 6, 1010-1027 (2001)
[42] B. Sayrafi, D. Van Gucht, Differential constraints, in: Proc. PODS Int. Conf. Principles of Database Systems, 2005; B. Sayrafi, D. Van Gucht, Differential constraints, in: Proc. PODS Int. Conf. Principles of Database Systems, 2005
[43] Tan, P.-N.; Steinbach, M.; Kumar, V., Introduction to Data Mining (2005), Addison-Wesley
[44] Tatti, N., Computational complexity of queries based on itemsets, Information Processing Letters, 98, 5, 183-187 (2006) · Zbl 1187.68255
[45] Tatti, N., Safe projections of binary data sets, Acta Informatica, 42, 8-9, 617-638 (2006) · Zbl 1089.68037
[46] Y. Wang, X. Wu, Approximate inverse frequent itemset mining: Privacy, complexity, and approximation, in: Proc. IEEE Int. Conf. on Data Mining, 2005; Y. Wang, X. Wu, Approximate inverse frequent itemset mining: Privacy, complexity, and approximation, in: Proc. IEEE Int. Conf. on Data Mining, 2005
[47] X. Wu, Y. Wu, Y. Wang, Y. Li, Privacy aware market basket data set generation: A feasible approach for inverse frequent set mining, in: Proc. SIAM Int. Conf. on Data Mining, 2005; X. Wu, Y. Wu, Y. Wang, Y. Li, Privacy aware market basket data set generation: A feasible approach for inverse frequent set mining, in: Proc. SIAM Int. Conf. on Data Mining, 2005
[48] X. Yan, H. Cheng, J. Han, D. Xin, Summarizing itemset patterns: a profile-based approach, in: Proc. KDD Int. Conf. Knowledge Discovery in Databases, 2005, pp. 314-323; X. Yan, H. Cheng, J. Han, D. Xin, Summarizing itemset patterns: a profile-based approach, in: Proc. KDD Int. Conf. Knowledge Discovery in Databases, 2005, pp. 314-323
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.