×

Summarization and visualization of multi-level and multi-dimensional itemsets. (English) Zbl 1457.68230

Summary: Frequent itemset (FI) mining aims at discovering relevant patterns from sets of transactions. In this work we focus on multi-level and multi-dimensional data, which provide a rich description of subjects through multiple features each at different levels of detail. Summarization of FIs has been only marginally addressed so far with specific reference to multi-level and multi-dimensional FIs. In this paper we fill this gap by proposing SUSHI, a framework for summarizing and visually exploring multi-level and multi-dimensional FIs. Specifically, SUSHI is based on (i) a similarity function for FIs which takes into account both their extensional (support-based) and intensional (feature-based) natures; (ii) theoretical results concerning antimonotonicity of support and similarity in multi-level settings, which allow us to propose an efficient clustering algorithm to generate hierarchical summaries; and (iii) two integrated approaches to summary visualization and exploration: a graph-based one, which highlights inter-cluster relationships, and a tree-based one, which emphasizes the relationships between the representative of each cluster and the other FIs in that cluster. SUSHI is evaluated using both a real and a synthetic dataset in terms of effectiveness, efficiency, and understandability of the summary, with reference to three different strategies for choosing cluster representatives. Overall, SUSHI shows to outperform previous approaches and to be a valuable tool to expedite the analysis of FIs. Besides, one of the three strategies for choosing cluster representatives shows to be the most effective one.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T10 Pattern recognition, speech recognition

Software:

GMiner; Graphviz

References:

[1] Afrati, F. N.; Gionis, A.; Mannila, H., Approximating a collection of frequent sets, Proc. SIGKDD, 12-19 (2004)
[2] Agrawal, R.; Srikant, R., Fast algorithms for mining association rules in large databases, Proc. VLDB, 487-499 (1994)
[3] Ahmed, M., Data summarization: a survey, Knowl. Inf. Syst., 58, 2, 249-273 (2019)
[4] Bagaria, V. K.; Kamath, G. M.; Ntranos, V.; Zhang, M. J.; Tse, D., Medoids in almost-linear time via multi-armed bandits, AISTATS. AISTATS, Proceedings of Machine Learning Research, 84, 500-509 (2018), PMLR
[5] Baralis, E.; Cagliero, L.; Cerquitelli, T.; D’Elia, V.; Garza, P., Support driven opportunistic aggregation for generalized itemset extraction, Proc. IS, 102-107 (2010)
[6] Bederson, B. B.; Shneiderman, B.; Wattenberg, M., Ordered and quantum treemaps: making effective use of 2d space to display hierarchies, ACM Trans. Graph., 21, 4, 833-854 (2002)
[7] Bothorel, G.; Serrurier, M.; Hurter, C., Visualization of frequent itemsets with nested circular layout and bundling algorithm, Proc. ISVC, 396-405 (2013)
[8] Carbonell, J. G.; Goldstein, J., The use of MMR, diversity-based reranking for reordering documents and producing summaries, Proc. SIGIR, 335-336 (1998)
[9] Chandola, V.; Kumar, V., Summarization—compressing data into an informative representation, Knowl. Inf. Syst., 12, 3, 355-378 (2007)
[10] Chon, K.; Hwang, S.; Kim, M., Gminer: a fast GPU-based frequent itemset mining method for large-scale data, Inf. Sci., 439-440, 19-38 (2018)
[11] Davidson, I.; Ravi, S. S., Using instance-level constraints in agglomerative hierarchical clustering: theoretical and empirical results, Data Min. Knowl. Discov., 18, 2, 257-282 (2009)
[12] Day, W. H.; Edelsbrunner, H., Efficient algorithms for agglomerative hierarchical clustering methods, J. Classif., 1, 1, 7-24 (1984) · Zbl 0563.62034
[13] Djenouri, Y.; Drias, H.; Bendjoudi, A., Pruning irrelevant association rules using knowledge mining, IJBIDM, 9, 2, 112-144 (2014)
[14] Djenouri, Y.; Lin, J. C.; Nørvåg, K.; Ramampiaro, H., Highly efficient pattern mining based on transaction decomposition, ICDE, 1646-1649 (2019), IEEE
[15] Ellson, J.; Gansner, E. R.; Koutsofios, E.; North, S. C.; Woodhull, G., Graphviz - open source graph drawing tools, Graph Drawing, 483-484 (2001) · Zbl 1054.68583
[16] Ertek, G.; Demiriz, A., A framework for visualizing association mining results, Proc. ISCIS, 593-602 (2006)
[17] Francia, M.; Golfarelli, M.; Rizzi, S., A similarity function for multi-level and multi-dimensional itemsets, Proc. SEBD (2018)
[18] Gansner, E. R.; Koutsofios, E.; North, S. C.; Vo, K., A technique for drawing directed graphs, IEEE Trans. Softw. Eng., 19, 3, 214-230 (1993)
[19] Golfarelli, M.; Rizzi, S., Data Warehouse design: Modern Principles and Methodologies (2009), McGraw-Hill
[20] Gunopulos, D.; Khardon, R.; Mannila, H.; Toivonen, H., Data mining, hypergraph transversals, and machine learning, Proc. SIGACT-SIGMOD-SIGART, 209-216 (1997)
[21] Han, J.; Cheng, H.; Xin, D.; Yan, X., Frequent pattern mining: current status and future directions, Data Min. Knowl. Discov., 15, 1, 55-86 (2007)
[22] Han, J.; Fu, Y., Mining multiple-level association rules in large databases, IEEE Trans. Knowl. Data Eng., 11, 5, 798-804 (1999)
[23] Han, J.; Pei, J.; Yin, Y.; Mao, R., Mining frequent patterns without candidate generation: A frequent-pattern tree approach, Data Min. Knowl. Discov., 8, 1, 53-87 (2004)
[24] Hoplaros, D.; Tari, Z.; Khalil, I., Data summarization for network traffic monitoring, J. Netw. Comput. Appl., 37, 194-205 (2014)
[25] Jin, R.; Abu-Ata, M.; Xiang, Y.; Ruan, N., Effective and efficient itemset pattern summarization: regression-based approaches, Proc. SIGKDD, 399-407 (2008)
[26] Keim, D. A., Information visualization and visual data mining, IEEE Trans. Vis. Comput. Graph., 8, 1, 1-8 (2002)
[27] Leung, C. K.; Carmichael, C. L., FpVAT: a visual analytic tool for supporting frequent pattern mining, SIGKDD Explor., 11, 2, 39-48 (2009)
[28] Lim, S. J., On a visual frequent itemset mining, Proc. ICDIM, 46-51 (2009)
[29] Liu, G.; Zhang, H.; Wong, L., A flexible approach to finding representative pattern sets, IEEE Trans. Knowl. Data Eng., 26, 7, 1562-1574 (2014)
[30] Luna, J. M.; Fournier-Viger, P.; Ventura, S., Frequent itemset mining: a 25 years review, Wiley Interdiscip. Rev. Data Min. Knowl. Discov., 9, 6 (2019)
[31] Mampaey, M.; Vreeken, J., Summarizing categorical data by clustering attributes, Data Min. Knowl. Discov., 26, 1, 130-173 (2013) · Zbl 1260.68339
[32] Newling, J.; Fleuret, F., A sub-quadratic exact medoid algorithm, AISTATS. AISTATS, Proceedings of Machine Learning Research, 54, 185-193 (2017), PMLR
[33] Nguyen, L. T.T.; Vu, V. V.; Lam, M. T.H.; Duong, T. T.M.; Manh, L. T.; Nguyen, T. T.T.; Vo, B.; Fujita, H., An efficient method for mining high utility closed itemsets, Inf. Sci., 495, 78-99 (2019)
[34] Pasquier, N.; Bastide, Y.; Taouil, R.; Lakhal, L., Discovering frequent closed itemsets for association rules, Proc. ICDT, 398-416 (1999)
[35] Poernomo, A. K.; Gopalkrishnan, V., CP-summary: a concise representation for browsing frequent itemsets, Proc. SIGKDD, 687-696 (2009)
[36] Shneiderman, B., Tree visualization with tree-maps: 2-D space-filling approach, ACM Trans. Graph., 11, 1, 92-99 (1992) · Zbl 0791.68166
[37] Shneiderman, B., The eyes have it: a task by data type taxonomy for information visualizations, Proc. Symposium on Visual Languages, 336-343 (1996)
[38] Song, W.; Liu, M., A visualizer for high utility itemset mining, Proc. CSE, 244-248 (2014)
[39] Takigawa, I.; Mamitsuka, H., Efficiently mining δ-tolerance closed frequent subgraphs, Mach. Learn., 82, 2, 95-121 (2011) · Zbl 1237.68168
[40] Vanahalli, M. K.; Patil, N., An efficient parallel row enumerated algorithm for mining frequent colossal closed itemsets from high dimensional datasets, Inf. Sci., 496, 343-362 (2019)
[41] Wang, J.; Karypis, G., On efficiently summarizing categorical databases, Knowl. Inf. Syst., 9, 1, 19-37 (2006)
[42] Xiang, Y.; Jin, R.; Fuhry, D.; Dragan, F. F., Summarizing transactional databases with overlapped hyperrectangles, Data Min. Knowl. Discov., 23, 2, 215-251 (2011) · Zbl 1235.68068
[43] Xin, D.; Han, J.; Yan, X.; Cheng, H., Mining compressed frequent-pattern sets, Proc. VLDB, 709-720 (2005)
[44] Yan, X.; Cheng, H.; Han, J.; Xin, D., Summarizing itemset patterns: a profile-based approach, Proc. SIGKDD, 314-323 (2005)
[45] Yang, L., Pruning and visualizing generalized association rules in parallel coordinates, IEEE Trans. Knowl. Data Eng., 17, 1, 60-70 (2005)
[46] Yu, W., Discovering frequent movement paths from taxi trajectory data using spatially embedded networks and association rules, IEEE Trans. Intell. Transp. Syst., 20, 3, 855-866 (2019)
[47] Zaki, F. A.M.; Zulkurnain, N. F., RARE: mining colossal closed itemset in high dimensional data, Knowl. Based Syst., 161, 1-11 (2018)
[48] Zhang, S.; Jin, Z.; Lu, J., Summary queries for frequent itemsets mining, J. Syst. Softw., 83, 3, 405-411 (2010)
[49] Zhang, X.; Deng, Z., Mining summarization of high utility itemsets, Knowl. Based Syst., 84, 67-77 (2015)
[50] Zihayat, M.; An, A., Mining top-k high utility patterns over data streams, Inf. Sci., 285, 138-161 (2014) · Zbl 1355.68237
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.