×

Efficient high utility itemset mining without the join operation. (English) Zbl 1543.68104

Summary: The task of mining high-utility itemsets in a database given a minimum threshold is attracting more and more interest due to its many applications. Existing algorithms such as the vertical ones have the advantages of high scalability, efficiency and extensibility. However, they depend on a costly join operation to generate new itemsets. To overcome this limitation, this paper proposes a novel vertical algorithm, FOTH (Fast sOrted iTemset searcH), which employs a novel effective data structure, the IndexSet. The IndexSet self-propagates to produce sub-IndexSets, eliminating the need to perform join operations, which considerably reduces the memory and computation requirements. Experiments were conducted on eight benchmark databases to compare the performance of FOTH with four state-of-the-art list-based algorithms. The results show that FOTH outperforms the other algorithms on dense databases.

MSC:

68P15 Database theory

Software:

SPMF
Full Text: DOI

References:

[1] Agrawal, R.; Imieliński, T.; Swami, A., Mining association rules between sets of items in large databases, SIGMOD Rec., 22, 2, 207-216, 1993
[2] Fournier-Viger, P.; Lin, J. Chun-Wei; Truong-Chi, T.; Nkambou, R., A survey of high utility itemset mining, (High-Utility Pattern Mining: Theory, Algorithms and Applications, 2019), 1-45
[3] Yao, H.; Hamilton, H. J.; Butz, C. J., A foundational approach to mining itemset utilities from databases, (Proceedings of the 2004 SIAM International Conference on Data Mining, 2004, SIAM), 482-486
[4] Agrawal, R.; Srikant, R., Fast algorithms for mining association rules, (Proceedings of the 1994 International Conference on Very Large Data Bases (VLDB’94), vol. 1215, 1994), 487-499
[5] Han, J.; Pei, J.; Yin, Y., Mining frequent patterns without candidate generation, SIGMOD Rec., 29, 2, 1-12, 2000
[6] Zaki, M., Scalable algorithms for association mining, IEEE Trans. Knowl. Data Eng., 12, 3, 372-390, 2000
[7] Park, J. S.; Chen, M.-S.; Yu, P. S., An effective hash-based algorithm for mining association rules, SIGMOD Rec., 24, 2, 175-186, 1995
[8] Savasere, A.; Omiecinski, E.; Navathe, S. B., An efficient algorithm for mining association rules in large databases, (Proceedings of the 21th International Conference on Very Large Data Bases, VLDB ’95, 1995, Morgan Kaufmann Publishers Inc.: Morgan Kaufmann Publishers Inc. San Francisco, CA, USA), 432-444
[9] Toivonen, H., Sampling large databases for association rules, (Vldb, vol. 96, 1996), 134-145
[10] Agarwal, R. C.; Aggarwal, C. C.; Prasad, V., A tree projection algorithm for generation of frequent item sets, J. Parallel Distrib. Comput., 61, 3, 350-371, 2001 · Zbl 0990.68058
[11] Liu, Y.; Liao, W.-k.; Choudhary, A., A two-phase algorithm for fast discovery of high utility itemsets, (Ho, T. B.; Cheung, D.; Liu, H., Advances in Knowledge Discovery and Data Mining, 2005, Springer: Springer Berlin Heidelberg, Berlin, Heidelberg), 689-695
[12] Tseng, V. S.; Shie, B.-E.; Wu, C.-W.; Yu, P. S., Efficient algorithms for mining high utility itemsets from transactional databases, IEEE Trans. Knowl. Data Eng., 25, 8, 1772-1786, 2013
[13] Ahmed, C. F.; Tanbeer, S. K.; Jeong, B.-S.; Lee, Y.-K., Efficient tree structures for high utility pattern mining in incremental databases, IEEE Trans. Knowl. Data Eng., 21, 12, 1708-1721, 2009
[14] Yun, U.; Ryang, H.; Ryu, K. H., High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates, Expert Syst. Appl., 41, 8, 3861-3878, 2014
[15] Liu, M.; Qu, J., Mining high utility itemsets without candidate generation, (Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM ’12, 2012, Association for Computing Machinery: Association for Computing Machinery New York, NY, USA), 55-64
[16] Liu, J.; Wang, K.; Fung, B. C.M., Direct discovery of high utility itemsets without candidate generation, (Proceedings of the 2012 IEEE 12th International Conference on Data Mining, ICDM ’12, 2012, IEEE Computer Society: IEEE Computer Society USA), 984-989
[17] Zida, S.; Fournier-Viger, P.; Lin, J. C.-W.; Wu, C.-W.; Tseng, V. S., Efim: a highly efficient algorithm for high-utility itemset mining, (Sidorov, G.; Galicia-Haro, S. N., Advances in Artificial Intelligence and Soft Computing, 2015, Springer International Publishing: Springer International Publishing Cham), 530-546
[18] Fournier-Viger, P.; Wu, C.-W.; Zida, S.; Tseng, V. S., Fhm: faster high-utility itemset mining using estimated utility co-occurrence pruning, (Andreasen, T.; Christiansen, H.; Cubero, J.-C.; Raś, Z. W., Foundations of Intelligent Systems, 2014, Springer International Publishing: Springer International Publishing Cham), 83-92
[19] Krishnamoorthy, S., Hminer: efficiently mining high utility itemsets, Expert Syst. Appl., 90, 168-183, 2017
[20] Krishnamoorthy, S., Pruning strategies for mining high utility itemsets, Expert Syst. Appl., 42, 5, 2371-2381, 2015
[21] Qu, J.-F.; Liu, M.; Fournier-Viger, P., Efficient algorithms for high utility itemset mining without candidate generation, (High-Utility Pattern Mining: Theory, Algorithms and Applications, 2019), 131-160
[22] Wu, P.; Niu, X.; Fournier-Viger, P.; Huang, C.; Wang, B., Ubp-miner: an efficient bit based high utility itemset mining algorithm, Knowl.-Based Syst., 248, Article 108865 pp., 2022
[23] Ryang, H.; Yun, U., Indexed list-based high utility pattern mining with utility upper-bound reduction and pattern combination techniques, Knowl. Inf. Syst., 51, 2, 627-659, 2017
[24] Peng, A. Y.; Koh, Y. S.; Riddle, P., mhuiminer: a fast high utility itemset mining algorithm for sparse datasets, (Kim, J.; Shim, K.; Cao, L.; Lee, J.-G.; Lin, X.; Moon, Y.-S., Advances in Knowledge Discovery and Data Mining, 2017, Springer International Publishing: Springer International Publishing Cham), 196-207
[25] Qu, J.-F.; Fournier-Viger, P.; Liu, M.; Hang, B.; Wang, F., Mining high utility itemsets using extended chain structure and utility machine, Knowl.-Based Syst., 208, Article 106457 pp., 2020
[26] Qu, J.-F.; Fournier-Viger, P.; Liu, M.; Hang, B.; Hu, C., Mining high utility itemsets using prefix trees and utility vectors, IEEE Trans. Knowl. Data Eng., 1-14, 2023
[27] Duong, Q.-H.; Fournier-Viger, P.; Ramampiaro, H.; NØrvåg, K.; Dam, T.-L., Efficient high utility itemset mining using buffered utility-lists, Appl. Intell., 48, 7, 1859-1877, 2018
[28] Guibas, L. J.; Sedgewick, R., A dichromatic framework for balanced trees, (19th Annual Symposium on Foundations of Computer Science (sfcs 1978), 1978), 8-21
[29] Nawaz, M. S.; Fournier-Viger, P.; Yun, U.; Wu, Y.; Song, W., Mining high utility itemsets with hill climbing and simulated annealing, ACM Trans. Manage. Inf. Syst., 13, 1, oct 2021
[30] Lin, J. C.-W.; Yang, L.; Fournier-Viger, P.; Hong, T.-P.; Voznak, M., A binary pso approach to mine high-utility itemsets, Soft Comput., 21, 17, 5103-5121, 2017
[31] Song, W.; Li, J.; Huang, C., Artificial fish swarm algorithm for mining high utility itemsets, (Advances in Swarm Intelligence: 12th International Conference, ICSI 2021. Advances in Swarm Intelligence: 12th International Conference, ICSI 2021, Qingdao, China, July 17-21, 2021, Proceedings, Part II, 2021, Springer-Verlag: Springer-Verlag Berlin, Heidelberg), 407-419
[32] Lin, C.-W.; Hong, T.-P.; Lu, W.-H., An effective tree structure for mining high utility itemsets, Expert Syst. Appl., 38, 6, 7419-7424, 2011
[33] Dam, T.-L.; Li, K.; Fournier-Viger, P.; Duong, Q.-H., Cls-miner: efficient and effective closed high-utility itemset mining, Front. Comput. Sci., 13, 2, 357-381, 2019
[34] Pramanik, S.; Goswami, A., Discovery of closed high utility itemsets using a fast nature-inspired ant colony algorithm, Appl. Intell., 52, 8, 8839-8855, 2022
[35] Hidouri, A.; Jabbour, S.; Raddaoui, B.; Yaghlane, B. Ben, Mining closed high utility itemsets based on propositional satisfiability, Data Knowl. Eng., 136, Article 101927 pp., 2021
[36] Wu, C.-W.; Fournier-Viger, P.; Gu, J.-Y.; Tseng, V. S., Mining closed+ high utility itemsets without candidate generation, (2015 Conference on Technologies and Applications of Artificial Intelligence (TAAI), 2015), 187-194
[37] Hidouri, A.; Jabbour, S.; Dlala, I. O.; Raddaoui, B., On minimal and maximal high utility itemsets mining using propositional satisfiability, (2021 IEEE International Conference on Big Data (Big Data), 2021), 622-628
[38] Duong, H.; Hoang, T.; Tran, T.; Truong, T.; Le, B.; Fournier-Viger, P., Efficient algorithms for mining closed and maximal high utility itemsets, Knowl.-Based Syst., 257, Article 109921 pp., 2022
[39] Fournier-Viger, P.; Wu, C.-W.; Tseng, V. S., Novel concise representations of high utility itemsets using generator patterns, (Luo, X.; Yu, J. X.; Li, Z., Advanced Data Mining and Applications, 2014, Springer International Publishing: Springer International Publishing Cham), 30-43
[40] Sahoo, J.; Das, A. K.; Goswami, A., An efficient approach for mining association rules from high utility itemsets, Expert Syst. Appl., 42, 13, 5754-5778, 2015
[41] Han, X.; Liu, X.; Li, J.; Gao, H., Efficient top-k high utility itemset mining on massive data, Inf. Sci., 557, 382-406, 2021 · Zbl 1484.68211
[42] Singh, K.; Singh, S. S.; Kumar, A.; Biswas, B., Tkeh: an efficient algorithm for mining top-k high utility itemsets, Appl. Intell., 49, 3, 1078-1097, 2019
[43] Krishnamoorthy, S., Mining top-k high utility itemsets with effective threshold raising strategies, Expert Syst. Appl., 117, 148-165, 2019
[44] Fournier-Viger, P.; Lin, J. C.-W.; Gomariz, A.; Gueniche, T.; Soltani, A.; Deng, Z.; Lam, H. T., The spmf open-source data mining library version 2, (Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2016. Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2016, Riva del Garda, Italy, September 19-23, 2016, Proceedings, Part III 16, 2016, Springer), 36-40
[45] Zhang, Z.; Huang, J.; Hao, J.; Gong, J.; Chen, H., Extracting relations of crime rates through fuzzy association rules mining, Appl. Intell., 50, 2, 448-467, 2020
[46] Krishna, G. J.; Ravi, V., High utility itemset mining using binary differential evolution: an application to customer segmentation, Expert Syst. Appl., 181, Article 115122 pp., 2021
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.