×

Axiomatization of frequent itemsets. (English) Zbl 1019.68034

Summary: Mining association rules is very popular in the data mining community. Most algorithms designed for finding association rules start with searching for frequent itemsets. Typically, in these algorithms, counting phases and pruning phases are interleaved. In the counting phase, partial information about the frequencies of selected itemsets is gathered. In the pruning phase as much as possible of the search space is pruned, based on the counting information. We introduce frequent set expressions to represent (possible partial) information acquired in the counting phase. A frequent set expression is a pair containing an itemset and a fraction that is a lower bound on the actual frequency of the itemset. A system of frequent sets is a collection of such pairs. We give an axiomatization for those systems that are complete in the sense that they explicitly contain all information they logically imply. Every system of frequent sets has a unique completion that actually represents all knowledge that can be derived. We also study sparse systems, in which not for every frequent set an expression is given. Furthermore, we explore the links with probabilistic logics.

MSC:

68P15 Database theory
Full Text: DOI

References:

[1] R. Agrawal, T. Imilienski, A. Swami, Mining association rules between sets of items in large databases, in: P. Buneman, S. Jajodia (Eds.), Proc. ACM SIGMOD, Washington, DC, 1993, pp. 207-216.; R. Agrawal, T. Imilienski, A. Swami, Mining association rules between sets of items in large databases, in: P. Buneman, S. Jajodia (Eds.), Proc. ACM SIGMOD, Washington, DC, 1993, pp. 207-216.
[2] R. Agrawal, R. Srikant, Fast algorithms for mining association rules in large databases, in: J.B. Bocca, M. Jarke, C. Zaniolo (Eds.), Proc. VLDB, Santiago de Chile, Chile, 1994, pp. 487-499.; R. Agrawal, R. Srikant, Fast algorithms for mining association rules in large databases, in: J.B. Bocca, M. Jarke, C. Zaniolo (Eds.), Proc. VLDB, Santiago de Chile, Chile, 1994, pp. 487-499.
[3] R.J. Bayardo, Efficiently mining long patterns from databases, in: L.M. Haas, A. Tiwary (Eds.), Proc. ACM SIGMOD, Seattle, Washington, 1998, pp. 85-93.; R.J. Bayardo, Efficiently mining long patterns from databases, in: L.M. Haas, A. Tiwary (Eds.), Proc. ACM SIGMOD, Seattle, Washington, 1998, pp. 85-93.
[4] T. Calders, J. Paredaens, A theoretical framework for reasoning about frequent itemsets, Technical Report 00-6, Department of Mathematics and Computer Science, Universiteit Antwerpen, Belgium, 2000, http://win-www.uia.ac.be/u/calders/download/axiom.ps; T. Calders, J. Paredaens, A theoretical framework for reasoning about frequent itemsets, Technical Report 00-6, Department of Mathematics and Computer Science, Universiteit Antwerpen, Belgium, 2000, http://win-www.uia.ac.be/u/calders/download/axiom.ps · Zbl 1019.68034
[5] T. Calders, J. Paredaens, Axiomatization of frequent sets, in: J. Van den Bussche, V. Vianu (Eds.), Proc. ICDT, London, UK, 2001, pp. 204-218.; T. Calders, J. Paredaens, Axiomatization of frequent sets, in: J. Van den Bussche, V. Vianu (Eds.), Proc. ICDT, London, UK, 2001, pp. 204-218. · Zbl 1047.68567
[6] Fagin, R.; Vardi, M. Y., Armstrong databases for functional and inclusion dependencies, Inform. Process. Lett., 16, 1, 13-19 (1983) · Zbl 0501.68056
[7] Fagin, R.; Halpern, J.; Megiddo, N., A logic for reasoning about probabilities, Inform. and Comput., 87, 1-2, 78-128 (1990) · Zbl 0811.03014
[8] Frisch, A. M.; Haddawy, P., Anytime deduction for probabilistic logic, Artif. Intell., 69, 1-2, 93-112 (1994) · Zbl 0809.03016
[9] Georgakopoulos, G.; Kavvadias, D.; Papadimitriou, C. H., Probabilistic satisfiability, J. Complexity, 4, 1-11 (1988) · Zbl 0647.68049
[10] Hadley, G., Linear Programming (1962), Addison-Wesley: Addison-Wesley Reading, MA · Zbl 0102.36304
[11] J. Han, J. Pei, Y. Yin, Mining frequent patterns without candidate generation, in: W. Chen, J.F. Naughton, P.A. Bernstein (Eds.), Proc. ACM SIGMOD, Dallas, TX, 2000, pp. 1-12.; J. Han, J. Pei, Y. Yin, Mining frequent patterns without candidate generation, in: W. Chen, J.F. Naughton, P.A. Bernstein (Eds.), Proc. ACM SIGMOD, Dallas, TX, 2000, pp. 1-12.
[12] P. Hansen, B. Jaumard, Probabilistic satisfiability, Les Cahiers du GERAD, G-96-31, 1996.; P. Hansen, B. Jaumard, Probabilistic satisfiability, Les Cahiers du GERAD, G-96-31, 1996.
[13] P. Hansen, B. Jaumard, G.-B.D. Nguets, M.P. de Aragäo, Models and algorithms for probabilistic and Bayesian logic, in: Proc. IJCAI’95, Montreal, Canada, 1995, pp. 1862-1868.; P. Hansen, B. Jaumard, G.-B.D. Nguets, M.P. de Aragäo, Models and algorithms for probabilistic and Bayesian logic, in: Proc. IJCAI’95, Montreal, Canada, 1995, pp. 1862-1868.
[14] L.V.S. Laksmanan, R.T. Ng, J. Han, A. Pang, Optimization of constrained frequent set queries with 2-variable Constraints, in: A. Delis, C. Faloutsos, S. Ghandeharizadeh (Eds.), Proc. ACM SIGMOD, Philadelphia, PA, 1999, pp. 157-168.; L.V.S. Laksmanan, R.T. Ng, J. Han, A. Pang, Optimization of constrained frequent set queries with 2-variable Constraints, in: A. Delis, C. Faloutsos, S. Ghandeharizadeh (Eds.), Proc. ACM SIGMOD, Philadelphia, PA, 1999, pp. 157-168.
[15] Lucasiewicz, T., Local probabilistic deduction from taxonomic and probabilistic knowledge-bases over conjunctive events, J. Approx. Reasoning, 21, 23-61 (1999) · Zbl 0961.68135
[16] Nilsson, N., Probabilistic logic, Artif. Intell., 28, 71-87 (1986) · Zbl 0589.03007
[17] J. Paredaens, Axiomatization of frequent sets, Technical Report 99-11, Universiteit Antwerpen, Department of Mathematics and Computer Science, Belgium, 1999.; J. Paredaens, Axiomatization of frequent sets, Technical Report 99-11, Universiteit Antwerpen, Department of Mathematics and Computer Science, Belgium, 1999. · Zbl 1047.68567
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.