Expressive generalized itemsets. (English) Zbl 1354.68220

Summary: Generalized itemset mining is a powerful tool to discover multiple-level correlations among the analyzed data. A taxonomy is used to aggregate data items into higher-level concepts and to discover frequent recurrences among data items at different granularity levels. However, since traditional high-level itemsets may also represent the knowledge covered by their lower-level frequent descendant itemsets, the expressiveness of high-level itemsets can be rather limited. To overcome this issue, this article proposes two novel itemset types, called Expressive Generalized Itemset (EGI) and Maximal Expressive Generalized Itemset (Max-EGI), in which the frequency of occurrence of a high-level itemset is evaluated only on the portion of data not yet covered by any of its frequent descendants. Specifically, EGIs represent, at a high level of abstraction, the knowledge associated with sets of infrequent itemsets, while Max-EGIs compactly represent all the infrequent descendants of a generalized itemset. Furthermore, we also propose an algorithm to discover Max-EGIs on top of the traditionally mined itemsets. Experiments, performed on both real and synthetic datasets, demonstrate the effectiveness, efficiency, and scalability of the proposed approach.
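The idea of evaluating a high-level itemset only on data not yet covered by its frequent descendants can be illustrated with a small Python sketch. It is not the algorithm from the article: the taxonomy, the transactions, the function names (`support`, `residual_support`, `descendant_itemsets`), and the choice to treat every same-length itemset obtained by specializing at least one item as a descendant are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy taxonomy (child -> parent); not taken from the article.
PARENT = {
    "jacket": "outerwear",
    "coat": "outerwear",
    "boots": "footwear",
    "sandals": "footwear",
}
ALL_ITEMS = set(PARENT) | set(PARENT.values())

# Hypothetical transactions over leaf items.
TRANSACTIONS = [
    {"jacket", "boots"},
    {"jacket", "boots"},
    {"jacket", "boots"},
    {"coat", "boots"},
    {"coat", "sandals"},
    {"jacket", "sandals"},
]


def ancestors_or_self(item):
    """Return the item together with all of its taxonomy ancestors."""
    out = {item}
    while item in PARENT:
        item = PARENT[item]
        out.add(item)
    return out


def descendants_or_self(item):
    """Return the item together with every item that generalizes to it."""
    return {i for i in ALL_ITEMS if item in ancestors_or_self(i)}


def covers(transaction, itemset):
    """A transaction covers a generalized itemset if each item of the
    itemset is present in the transaction directly or via a descendant."""
    extended = set()
    for item in transaction:
        extended |= ancestors_or_self(item)
    return set(itemset) <= extended


def support(itemset):
    """Absolute support: number of transactions covering the itemset."""
    return sum(covers(t, itemset) for t in TRANSACTIONS)


def descendant_itemsets(itemset):
    """Assumed notion of descendant: same-length itemsets in which at
    least one item is replaced by one of its taxonomy descendants."""
    items = sorted(itemset)
    options = [sorted(descendants_or_self(i)) for i in items]
    result = set()
    for combo in product(*options):
        candidate = frozenset(combo)
        if len(candidate) == len(items) and candidate != frozenset(items):
            result.add(candidate)
    return result


def residual_support(itemset, min_sup):
    """Count transactions that cover the itemset but are not covered
    by any descendant whose support reaches min_sup."""
    frequent_desc = [
        d for d in descendant_itemsets(itemset) if support(d) >= min_sup
    ]
    return sum(
        covers(t, itemset) and not any(covers(t, d) for d in frequent_desc)
        for t in TRANSACTIONS
    )


high_level = {"outerwear", "footwear"}
print(support(high_level))              # 6
print(residual_support(high_level, 3))  # 1
```

In this toy data the high-level itemset is covered by all six transactions, but five of them are already explained by frequent descendants such as {jacket, boots}; only the {coat, sandals} transaction remains, which is the kind of infrequent lower-level knowledge the summary says EGIs are meant to represent.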

MSC:

68T05 Learning and adaptive systems in artificial intelligence

Software:

SLIQ; UCI-ml; LCM
