×

An algebraic semigroup method for discovering maximal frequent itemsets. (English) Zbl 1511.68226

Summary: Discovering maximal frequent itemsets is an important issue and key technique in many data mining problems such as association rule mining. In the literature, generating maximal frequent itemsets proves either to be NP-hard or to have \(O({l}^3{4}^l(m+n))\) complexity in the worst case from the perspective of generating maximal complete bipartite graphs of a bipartite graph, where \(m\), \(n\) are the item number and the transaction number, respectively, and \(l\) denotes the maximum of \(|C| | \Psi(C)| / (| C| +| \Psi (C)| -1)\), with the maximum taken over all maximal frequent itemsets \(C\). In this article, we put forward a method for discovering maximal frequent itemsets, whose complexity is \(O(3mn{2}^{\beta }+{4}^{\beta }n)\), lower than the known complexity both in the worst case, from the perspective of semigroup algebra, where \(\beta\) is the number of items whose support is more than the minimum support threshold. Experiments also show that an algorithm based on the algebraic method performs better than the other three well-known algorithms. Meanwhile, we explore some algebraic properties with respect to items and transactions, prove that the maximal frequent itemsets are exactly the simplified generators of frequent itemsets, give a necessary and sufficient condition for a maximal \(i+1\)-frequent itemset being a subset of a closed \(i\)-frequent itemset, and provide a recurrence formula of maximal frequent itemsets.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
20M99 Semigroups
68Q25 Analysis of algorithms and problem complexity
68W40 Analysis of algorithms

References:

[1] R. Agrawal, T. Imieliński, and A. Swami, Mining association rules between sets of items in large databases, ACM SIGMOD Record 22 (1993), no. 2, 207-216, . · doi:10.1145/170036.170072
[2] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo, Fast Discovery of Association Rules: Advances in Knowledge Discovery and Data Mining, MIT Press, California, 1996, pp. 307-328.
[3] J. Han and Y. Fu, Discovery of multiple-level association rules from large databases, in: VLDB ’95 Proceedings of the 21th International Conference on Very Large Data Bases, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1995, pp. 420-431.
[4] W. Hwang and D. Kim, Improved association rule mining by modified trimming, in: The Sixth IEEE International Conference on Computer and Information Technology (CIT’06), IEEE Computer Society, Los Alamitos, CA, USA, 2006, pp. 24-24, . · doi:10.1109/CIT.2006.101
[5] H. Mannila, H. Toivonen, and A. I. Verkamo, Discovering frequent episodes in sequences, in: Proceedings of First ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), AAAI Press, Palo Alto, CA, USA, 1995, pp. 210-215.
[6] D. Gunopulos, H. Mannila, and S. Saluja, Discovering all most specific sentences by randomized algorithm, in: F. Afrati, P. Kolaitis (eds), Database Theory - ICDT ’97, Lecture Notes in Computer Science, Vol 1186. Springer, Berlin, Heidelberg, 1997.
[7] R. J. Bayardo, Efficiently mining long patterns from databases, ACM SIGMOD Record 27 (1998), no. 2, 85-93, DOI: https://doi.org/10.1145/276305.276313. · doi:10.1145/276305.276313
[8] D. Eppstein, Arboricity and bipartite subgraph listing algorithms, Inform. Process. Lett. 51 (1994), no. 4, 207-211. · Zbl 0813.68115
[9] D. Lin and Z. M. Kedem, Pincer-search: an efficient algorithm for discovering the maximum frequent set, IEEE Trans. Knowl. Data Eng. 14 (2002), no. 3, 553-566, . · doi:10.1109/TKDE.2002.1000342
[10] E. Boros, V. Gurvich, L. Khachiyan, and K. Makino, On maximal frequent and minimal infrequent sets in binary matrices, Ann. Math. Artif. Intell. 39 (2003), 211-221, . · Zbl 1038.68041 · doi:10.1023/A:1024605820527
[11] M. M. Dhabu and P. S. Deshpande, Cardinality statistics based maximal frequent itemsets mining, in: S. Dua, A. Gangopadhyay, P. Thulasiraman, U. Straccia, M. Shepherd, B. Stein (eds), Information Systems, Technology and Management. Communications in Computer and Information Science, Vol. 285, Springer, Berlin, Heidelberg, 2021, . · doi:10.1007/978-3-642-29166-1_3
[12] M. M. J. Kabir, S. Xu, B. H. Kang, and Z. Zhao, Comparative analysis of genetic based approach and Apriori algorithm for mining maximal frequent item sets, in: 2015 IEEE Congress on Evolutionary Computation (CEC), 2015, pp. 39-45, . · doi:10.1109/CEC.2015.7256872
[13] M. R. Karim, M. Cochez, O. D. Beyan, C. F. Ahmed, and S. Decker, Mining maximal frequent patterns in transactional databases and dynamic data streams: A spark-based approach, Inf. Sci. 432 (2018), 278-300, . · Zbl 1436.68089 · doi:10.1016/j.ins.2017.11.064
[14] Z. Halim, O. Ali, and M. G. Khan, On the efficient representation of datasets as graphs to mine maximal frequent itemsets, IEEE Trans. Knowl. Data Eng. 33 (2021), no. 4, 1674-1691, . · doi:10.1109/TKDE.2019.2945573
[15] S. M. Fatemi, S. M. Hosseini, A. Kamandi, and M. Shabankhah, CL-MAX: a clustering-based approximation algorithm for mining maximal frequent itemsets, Int. J. Mach. Learn. Cybern. 12 (2021), no. 2, 365-383, . · doi:10.1007/s13042-020-01177-5
[16] Y. Zhang, W. Yu, X. Ma, H. Ogura, and D. Ye, Multi-objective optimization for high-dimensional maximal frequent itemset mining, Appl. Sci. 11 (2021), no. 19, 8971, . · doi:10.3390/app11198971
[17] D. Wu, D. Luo, C. S. Jensen, and J. Z. Huang, Efficiently mining maximal diverse frequent itemsets, in: G. Li, J. Yang, J. Gama, J. Natwichai, Y. Tong (eds), Database Systems for Advanced Applications. Lecture Notes in Computer Science, Vol 11447, Springer, Cham, 2019, . · doi:10.1007/978-3-030-18579-4_12
[18] A. H. Clifford and G. B. Preston, The Algebraic Theory of Semigroups, American Mathematical Society, Providence, Rhode Island, 1961. · Zbl 0111.03403
[19] J. M. Luna, P. Fournier-Viger, and S. Ventura, Frequent itemset mining: A 25 years review, Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 9 (2019), no. 6, e1329, . · doi:10.1002/widm.1329
[20] M. J. Zaki, Scalable algorithms for association mining, IEEE Trans. Knowl. Data. Eng. 12 (2000), no. 3, 372-390, . · doi:10.1109/69.846291
[21] R. Agrawal and R. Srikant, Fast algorithms for mining association rules, in: Proceedings of the 20th International Conference on Very Large Data Bases (VLDB ’94), Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1994, pp. 487-499.
[22] J. Han, J. Pei, and Y. Yin, Mining frequent patterns without candidate generation, SIGMOD Rec. 29 (2000), no. 2, 1-12, . · doi:10.1145/335191.335372
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.