×

Efficient top-\(k\) high utility itemset mining on massive data. (English) Zbl 1484.68211

Summary: In practical applications, top-\(k\) high utility itemset mining (top-\(k\) HUIM) is an interesting operation to find the \(k\) itemsets with the highest utilities. It is analyzed that, the existing algorithms only can deal with the small and medium-sized data, and their performance degrades significantly on massive data. This paper presents a novel top-\(k\) HUIM algorithm PTM which can mine top-\(k\) high utility itemsets on massive data efficiently. PTM maintains the transaction database as a set of prefix-based partitions, each of which can fit in the memory and keeps the transactions with the same prefix item. The utility of an itemset can be computed by the transactions in one particular partition. PTM processes the prefix-based partitions in the order of average transaction utility to raise utility threshold faster. By the concise assistant data structures, PTM can skip majority of the partitions directly. For the partitions to be processed, PTM utilizes depth-first search on the set enumeration tree of the promising items to find the required results. The full-suffix-utility-based subtree pruning rule is devised to reduce the exploration space of set enumeration tree effectively. The extensive experimental results show that PTM can discover the top-\(k\) high utility itemsets on massive data efficiently.

MSC:

68T09 Computational aspects of data analysis and big data
62R07 Statistical aspects of big data and data science

Software:

SPMF
Full Text: DOI

References:

[1] (Aggarwal, Charu C.; Han, Jiawei, Frequent Pattern Mining (2014), Springer) · Zbl 1297.68010
[2] Agrawal, Rakesh; Srikant, Ramakrishnan, Fast algorithms for mining association rules in large databases, (VLDB’94, Proceedings of 20th International Conference on Very Large Data Bases (1994)), 487-499
[3] Chowdhury Farhan Ahmed, Syed Khairuzzaman Tanbeer, Byeong-Soo Jeong, Young-Koo Lee. Efficient tree structures for high utility pattern mining in incremental databases. IEEE Trans. Knowl. Data Eng., 21(12):1708-1721, 2009. · Zbl 1170.68035
[4] Cheung and Ada Wai-Chee Fu, Yin-Ling; Fu, Ada Wai-Chee, Mining frequent itemsets without support threshold: With and without item constraints, IEEE Trans. Knowl. Data Eng., 16, 9, 1052-1069 (2004)
[5] Djenouri, Youcef; Djenouri, Djamel; Belhadi, Asma; Cano, Alberto, Exploiting GPU and cluster parallelism in single scan frequent itemset mining, Inf. Sci., 496, 363-377 (2019)
[6] Duong, Quang-Huy; Fournier-Viger, Philippe; Ramampiaro, Heri; Nørvåg, Kjetil; Dam, Thu-Lan, Efficient high utility itemset mining using buffered utility-lists, Appl. Intell., 48, 7, 1859-1877 (2018)
[7] Duong, Quang-Huy; Liao, Bo; Fournier-Viger, Philippe; Dam, Thu-Lan, An efficient algorithm for mining the top-k high utility itemsets, using novel threshold raising and pruning strategies, Knowl.-Based Syst., 104, 106-122 (2016)
[8] Philippe Fournier-Viger, Jerry Chun-Wei Lin, Antonio Gomariz, Ted Gueniche, Azadeh Soltani, Zhihong Deng, Hoang Thanh Lam. The SPMF open-source data mining library version 2, in: Proceedings of 27th European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2016, Part III, pages 36-40, 2016.
[9] Philippe Fournier-Viger, Jerry Chun-Wei Lin, Roger Nkambou, Bay Vo, and Vincent S. Tseng, editors. High-Utility Pattern Mining: Theory, Algorithms and Applications. Springer, 2019.
[10] Fournier-Viger, Philippe; Cheng-Wei, Wu.; Zida, Souleymane; Tseng, Vincent S., FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning, (Proceedings of 21st International Symposium Foundations of Intelligent Systems, ISMIS 2014 (2014)), 83-92
[11] Ada Wai-Chee, Fu.; Kwong, Renfrew W.-W.; Tang, Jian, Mining N-most interesting itemsets, (Proceedings of the 12th International Symposium Foundations of Intelligent Systems, ISMIS 2000 (2000)), 59-67 · Zbl 0983.68669
[12] Han, Xixian; Li, Jianzhong; Gao, Hong, Efficient top-k retrieval on massive data, IEEE Trans. Knowl. Data Eng., 27, 10, 2687-2699 (2015)
[13] Han, Xixian; Liu, Xianmin; Chen, Jian; Lai, Guojun; Gao, Hong; Li, Jianzhong, Efficiently mining frequent itemsets on massive data, IEEE Access, 7, 31409-31421 (2019)
[14] Krishnamoorthy, Srikumar, Pruning strategies for mining high utility itemsets, Expert Syst. Appl., 42, 5, 2371-2381 (2015)
[15] Krishnamoorthy, Srikumar, Mining top-k high utility itemsets with effective threshold raising strategies, Expert Syst. Appl., 117, 148-165 (2019)
[16] Li, Yu-Chiang; Yeh, Jieh-Shan; Chang, Chin-Chen, Isolated items discarding strategy for discovering high utility itemsets, Data Knowl. Eng., 64, 1, 198-217 (2008)
[17] Chun-Han Lin, Cheng-Wei Wu, JianTao Huang, Vincent S. Tseng. Parallel mining of top-k high utility itemsets in spark in-memory computing architecture. In Proceedings of 23rd Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2019, Part II, pages 253-265, 2019.
[18] Liu, Junqiang; Wang, Ke; Fung, Benjamin C. M., Mining high utility patterns in one phase without generating candidates, IEEE Trans. Knowl. Data Eng., 28, 5, 1245-1257 (2016)
[19] Liu, Junqiang; Zhang, Xingxing; Fung, Benjamin C. M.; Li, Jiuyong; Iqbal, Farkhund, Opportunistic mining of top-n high utility patterns, Inf. Sci., 441, 171-186 (2018) · Zbl 1440.68225
[20] Mengchi Liu, Jun-Feng Qu. Mining high utility itemsets without candidate generation. In Proceedings of 21st ACM International Conference on Information and Knowledge Management, CIKM’12, pages 55-64, 2012.
[21] Ying Liu, Wei-keng Liao, Alok N. Choudhary. A two-phase algorithm for fast discovery of high utility itemsets. In Proceedings of 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2005, pages 689-695, 2005.
[22] José María Luna, Philippe Fournier-Viger, Sebastián Ventura. Frequent itemset mining: A 25 years review. Wiley Interdiscip. Rev. Data Min. Knowl. Discov., 9(6), 2019.
[23] Mamoulis, Nikos; Yiu, Man Lung; Cheng, Kit Hung; Cheung, David W., Efficient top-k aggregation of ranked inputs, ACM Trans. Database Syst., 32(3):19 (2007)
[24] Alex Yuxuan Peng, Yun Sing Koh, Patricia Riddle. mhuiminer: A fast high utility itemset mining algorithm for sparse datasets, in: Proceedings of 21st Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2017, Part II, pages 196-207, 2017.
[25] Ryang, Heungmo; Yun, Unil, Top-k high utility pattern mining with effective threshold raising strategies, Knowl.-Based Syst., 76, 109-126 (2015)
[26] Tseng, Vincent S.; Shie, Bai-En; Cheng-Wei, Wu.; Philip, S. Yu., Efficient algorithms for mining high utility itemsets from transactional databases, IEEE Trans. Knowl. Data Eng., 25, 8, 1772-1786 (2013)
[27] Tseng, Vincent S.; Cheng-Wei, Wu.; Fournier-Viger, Philippe; Philip, S. Yu., Efficient algorithms for mining top-k high utility itemsets, IEEE Trans. Knowl. Data Eng., 28, 1, 54-67 (2016)
[28] Vincent S. Tseng, Cheng-Wei Wu, Bai-En Shie, Philip S. Yu. Up-growth: an efficient algorithm for high utility itemset mining, in: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’10, pages 253-262, 2010.
[29] Jason Tsong-Li Wang, Mohammed Javeed Zaki, Hannu Toivonen, Dennis E. Shasha, editors. Data Mining in Bioinformatics. Springer, 2005. · Zbl 1060.68026
[30] Cheng-Wei Wu, Bai-En Shie, Vincent S. Tseng, Philip S. Yu. Mining top-k high utility itemsets, in: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’12, pages 78-86, 2012.
[31] Yun, Ching-Huang; Chen, Ming-Syan, Mining mobile sequential patterns in a mobile commerce environment, IEEE Trans. Systems, Man, and Cybernetics, Part C, 37, 2, 278-295 (2007)
[32] Zaki, Mohammed Javeed, Scalable algorithms for association mining, IEEE Trans. Knowl. Data Eng., 12, 3, 372-390 (2000)
[33] Lin Zhou, Ying Liu, Jing Wang, Yong Shi. Utility-based web path traversal pattern mining. In Proceedings of the 7th IEEE International Conference on Data Mining (ICDM 2007), pages 373-380, 2007.
[34] Souleymane Zida, Philippe Fournier-Viger, Jerry Chun-Wei Lin, Cheng-Wei Wu, Vincent S. Tseng. EFIM: a fast and memory efficient algorithm for high-utility itemset mining. Knowl. Inf. Syst., 51(2):595-625, 2017.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.