Abstract
Mining frequent itemsets has emerged as a fundamental problem in data mining and plays an essential role in many important data mining tasks. In this paper, we propose a novel vertical data representation called N-list, which originates from an FP-tree-like coding prefix tree called PPC-tree that stores crucial information about frequent itemsets. Based on the N-list data structure, we develop an efficient mining algorithm, PrePost, for mining all frequent itemsets. Efficiency of PrePost is achieved by the following three reasons. First, N-list is compact since transactions with common prefixes share the same nodes of the PPC-tree. Second, the counting of itemsets’ supports is transformed into the intersection of N-lists and the complexity of intersecting two N-lists can be reduced to O(m + n) by an efficient strategy, where m and n are the cardinalities of the two N-lists respectively. Third, PrePost can directly find frequent itemsets without generating candidate itemsets in some cases by making use of the single path property of N-list. We have experimentally evaluated PrePost against four state-of-the-art algorithms for mining frequent itemsets on a variety of real and synthetic datasets. The experimental results show that the PrePost algorithm is the fastest in most cases. Even though the algorithm consumes more memory when the datasets are sparse, it is still the fastest one.
Similar content being viewed by others
References
Han J W, Pei J, Yin Y W. Mining frequent itemsets without candidate generation. In: The 2000 ACM SIGMOD International Conference on Management of data (SIGMOD’00), New York, 2000. 1–12
Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases. In: The 1993 ACM SIGMOD International Conference on Management of Data (SIGMOD’93), Washington, 1993. 207–216
Han J, Cheng H, Xin D, et al. Frequent itemset mining: current status and future directions. Data Min Knowl Discov, 2007, 15: 55–86
Baralis E, Cerquitelli T, Chiusano S. IMine: index support for item set mining. IEEE TKDE J, 2009, 21: 493–506
Zaki M J, Gouda K. Fast vertical mining using diffsets, In: The 9th ACM SIGKDD International Conference on. Knowledge Discovery and Data Mining (SIGKDD’03), Washington, 2003. 326–335
Deng Z H, Wang Z H. A new fast vertical method for mining frequent itemsets. Int J Comput Intell Syst, 2010, 3: 733–744
Agrawal R, Srikant R. Fast algorithm for mining Association rules. In: The 20th International Conference on Very Large Data Bases (VLDB’94), Santiago de Chile, 1994. 487–499
Savasere A, Omiecinski E, Navathe S. An efficient algorithm for mining association rules in large databases. In: The 21th International Conference on Very Large Data Bases (VLDB’95), Zurich, 1995. 432–443
Shenoy P, Haritsa J R, Sundarshan S, et al. Turbo-charging vertical mining of large databases. In: ACM International Conference on Management of Data and Symposium on Principles of Database Systems (SIGMOD’00), Dallas, 2000. 22–33
Zaki M J. Scalable algorithms for association mining. IEEE TKDE J, 2000, 12: 372–390
Pei J, Han J, Lu H, et al. H-mine: Hyper-structure mining of frequent itemsets in large databases. In: IEEE International Conference on Data Mining (ICDM’01), San Jose, 2001. 441–448
Liu G, Lu H, Lou W, et al. Efficient mining of frequent itemsets using ascending frequency ordered prefix-tree. DMKD J, 2004, 9: 249–274
Grahne G, Zhu J. Fast algorithms for frequent itemset mining using FP-Trees. IEEE TKDE J, 2005, 17: 1347–1362
Woon Y K, Ng W K, Lim E P. A support-ordered trie for fast frequent itemset discovery. IEEE TKDE J, 2004, 16: 875–879
Ananthanarayana V S, Murty N M, Subramanian D K. An incremental data mining algorithm for compact realization of prototypes. Pattern Recognit, 2001, 34: 2249–2251
Grust T. Accelerating xpath location steps, In: The 2002 ACM SIGMOD International Conference on Management of Data (SIGMOD’02), Madison, 2002. 109–120
Kleinberg J, Tardos E. Algorithm design. Boston: Addison Wesley, 2005
Bayardo Jr R J. Efficiently mining long itemsets from databases. In: ACM SIGMOD International Conference on Management of Data (SIGMOD’98), Seattle, 1998. 85–93
Burdick D, Calimlim M, Flannick J, et al. Mafia: A maximal frequent itemset algorithm. IEEE TKDE J, 2005, 17: 1490–1504
Wang J Y, Han J, Pei J. CLOSET+: Searching for the Best Strate-gies for Mining frequent closed itemsets. In: The 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD’03), Washington, 2003. 236–245
Lee A J T, Wang C S, Weng W Y, et al. An efficient algorithm for mining closed inter-transaction itemsets. Data Knowle Eng, 2008, 66: 68–91
Xin D, Han J, Yan X, et al. On compressing frequent itemsets. Data Knowl Eng, 2007, 60: 5–29
Li H, Chen H. Mining non-derivable frequent itemsets over data stream. Data Knowl Eng, 2009, 68: 481–498
Yao H, Hamilton H J, Butz C J. A foundational approach to mining itemset utilities from databases. In: The SIAM Data Mining (SDM’04), Florida, 2004. 482–486
Yao H, Hamilton H J. Mining itemset utilities from transaction databases. Data Knowl Eng, 2006, 59: 603–626
Cao L B, Zhao Y C, Zhang C Q. Mining impact-targeted activity itemsets in imbalanced data. IEEE TKDE, 2008, 20: 1053–1066
Chang J H, Lee W S. Finding recent frequent itemsets adaptively over online data streams. In: The 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD’03), Washington, 2003. 487–492
Li X, Deng Z H. Mining frequent itemsets from network flows for monitoring network. Expert Syst Appl, 2010, 37: 8850–8860
Agrawal R, Srikant R. Mining sequential itemsets. In: The 11th International Conference on Data Engineering (ICDE’95), Taiwan, 2003. 3–14
Yan X, Han J. gSpan: Graph-based substructure pattern mining. In: The 2002 IEEE International Conference on Data Mining (ICDM’02), Maebashi, 2002. 721–724
Cao L B, Zhang H F, Zhao Y C, et al. Combined mining: Discovering informative knowledge in complex data. IEEE Trans Syst Man Cybern Part B-Cybern, 2011, 41: 699–712
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Deng, Z., Wang, Z. & Jiang, J. A new algorithm for fast mining frequent itemsets using N-lists. Sci. China Inf. Sci. 55, 2008–2030 (2012). https://doi.org/10.1007/s11432-012-4638-z
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11432-012-4638-z