
Measures on Boolean polynomials and their applications in data mining. (English) Zbl 1050.68033

Summary: We characterize measures on free Boolean algebras and examine the relationships between measures and binary tables in relational databases. It is shown that these measures are completely determined by their values on positive conjunctions, and a formula that yields this value is obtained using the method of indicators. An extension of the notion of support that is well suited to tables with missing values is presented. Finally, we obtain Bonferroni-type inequalities that allow approximate evaluation of these measures for several types of queries. An approximation algorithm and an analysis of the results it produces are also included.
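The main idea can be illustrated with a minimal Python sketch, assuming the measure is the row count of a small binary table (the table `TABLE` and all function names are hypothetical and chosen for illustration; this is not the paper's algorithm, and the missing-value extension of support is not modeled). Writing the indicator of a negated attribute as 1 - x and expanding the product expresses the measure of any conjunction of literals through supports of positive conjunctions only; summing over satisfying minterms extends this to any Boolean polynomial, and truncating the alternating sum gives Bonferroni-type bounds.

```python
from itertools import combinations, product

# Hypothetical binary table (illustrative data, not from the paper):
# each row is a record, each key an attribute taking the value 0 or 1.
TABLE = [
    {"a": 1, "b": 1, "c": 0},
    {"a": 1, "b": 0, "c": 1},
    {"a": 0, "b": 1, "c": 1},
    {"a": 1, "b": 1, "c": 1},
    {"a": 0, "b": 0, "c": 0},
]


def support(table, items):
    """Count rows where every attribute in `items` is 1.

    This is the (unnormalized) measure of the positive conjunction of `items`;
    the empty conjunction is the constant 1, so its support is the row count.
    """
    return sum(all(row[i] == 1 for i in items) for row in table)


def bonferroni_bound(pos, neg, supp, k):
    """Partial inclusion-exclusion sum for AND(pos) AND NOT(neg).

    Expanding prod x_p * prod (1 - x_n) with indicators gives
    sum over S subset of neg of (-1)^|S| * supp(pos | S).
    Keeping only |S| <= k gives an upper bound for even k and a lower bound
    for odd k (Bonferroni inequalities for a nonnegative measure).
    """
    total = 0
    for j in range(min(k, len(neg)) + 1):
        for s in combinations(sorted(neg), j):
            total += (-1) ** j * supp(frozenset(pos) | frozenset(s))
    return total


def literal_conjunction_measure(pos, neg, supp):
    """Exact measure: the full sum, which uses only positive conjunctions."""
    return bonferroni_bound(pos, neg, supp, len(neg))


def polynomial_measure(variables, predicate, supp):
    """Measure of any Boolean polynomial over `variables`, given as a predicate.

    The polynomial is the disjoint union of the minterms that satisfy it, so
    its measure is the sum of their measures (exponential in len(variables)).
    """
    total = 0
    for bits in product((0, 1), repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        if predicate(assignment):
            pos = {v for v in variables if assignment[v] == 1}
            neg = {v for v in variables if assignment[v] == 0}
            total += literal_conjunction_measure(pos, neg, supp)
    return total


def direct_count(table, pos, neg):
    """Brute-force count of rows matching the literal conjunction (for checking)."""
    return sum(
        all(row[p] == 1 for p in pos) and all(row[n] == 0 for n in neg)
        for row in table
    )


if __name__ == "__main__":
    def supp(items):
        return support(TABLE, items)

    # Rows with a = b = c = 0, computed from positive-conjunction supports only.
    print(literal_conjunction_measure(set(), {"a", "b", "c"}, supp))  # 1
    # Truncation depths k = 0..3 give 5, -4, 2, 1: upper, lower, upper, exact.
    print([bonferroni_bound(set(), {"a", "b", "c"}, supp, k) for k in range(4)])
    # a XOR b expressed as a Boolean polynomial over {a, b}.
    print(polynomial_measure(("a", "b"), lambda v: v["a"] != v["b"], supp))  # 2
```

The low-order truncations in this toy example are loose (the k = 1 lower bound is even negative); in this sketch they only become useful when higher-order supports are unavailable or expensive to compute, which is the situation the approximate evaluation addresses.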

MSC:

68P15 Database theory
68T05 Learning and adaptive systems in artificial intelligence
28A12 Contents, measures, outer measures, capacities

Software:

UCI-ml
