A new algorithm for fast mining frequent itemsets using N-lists

719 Accesses
103 Citations
6 Altmetric
Explore all metrics

Abstract

Mining frequent itemsets has emerged as a fundamental problem in data mining and plays an essential role in many important data mining tasks. In this paper, we propose a novel vertical data representation called N-list, which originates from an FP-tree-like coding prefix tree called PPC-tree that stores crucial information about frequent itemsets. Based on the N-list data structure, we develop an efficient mining algorithm, PrePost, for mining all frequent itemsets. Efficiency of PrePost is achieved by the following three reasons. First, N-list is compact since transactions with common prefixes share the same nodes of the PPC-tree. Second, the counting of itemsets’ supports is transformed into the intersection of N-lists and the complexity of intersecting two N-lists can be reduced to O(m + n) by an efficient strategy, where m and n are the cardinalities of the two N-lists respectively. Third, PrePost can directly find frequent itemsets without generating candidate itemsets in some cases by making use of the single path property of N-list. We have experimentally evaluated PrePost against four state-of-the-art algorithms for mining frequent itemsets on a variety of real and synthetic datasets. The experimental results show that the PrePost algorithm is the fastest in most cases. Even though the algorithm consumes more memory when the datasets are sparse, it is still the fastest one.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

Han J W, Pei J, Yin Y W. Mining frequent itemsets without candidate generation. In: The 2000 ACM SIGMOD International Conference on Management of data (SIGMOD’00), New York, 2000. 1–12
Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases. In: The 1993 ACM SIGMOD International Conference on Management of Data (SIGMOD’93), Washington, 1993. 207–216
Han J, Cheng H, Xin D, et al. Frequent itemset mining: current status and future directions. Data Min Knowl Discov, 2007, 15: 55–86
Article MathSciNet Google Scholar
Baralis E, Cerquitelli T, Chiusano S. IMine: index support for item set mining. IEEE TKDE J, 2009, 21: 493–506
Google Scholar
Zaki M J, Gouda K. Fast vertical mining using diffsets, In: The 9th ACM SIGKDD International Conference on. Knowledge Discovery and Data Mining (SIGKDD’03), Washington, 2003. 326–335
Deng Z H, Wang Z H. A new fast vertical method for mining frequent itemsets. Int J Comput Intell Syst, 2010, 3: 733–744
Article Google Scholar
Agrawal R, Srikant R. Fast algorithm for mining Association rules. In: The 20th International Conference on Very Large Data Bases (VLDB’94), Santiago de Chile, 1994. 487–499
Savasere A, Omiecinski E, Navathe S. An efficient algorithm for mining association rules in large databases. In: The 21th International Conference on Very Large Data Bases (VLDB’95), Zurich, 1995. 432–443
Shenoy P, Haritsa J R, Sundarshan S, et al. Turbo-charging vertical mining of large databases. In: ACM International Conference on Management of Data and Symposium on Principles of Database Systems (SIGMOD’00), Dallas, 2000. 22–33
Zaki M J. Scalable algorithms for association mining. IEEE TKDE J, 2000, 12: 372–390
Google Scholar
Pei J, Han J, Lu H, et al. H-mine: Hyper-structure mining of frequent itemsets in large databases. In: IEEE International Conference on Data Mining (ICDM’01), San Jose, 2001. 441–448
Liu G, Lu H, Lou W, et al. Efficient mining of frequent itemsets using ascending frequency ordered prefix-tree. DMKD J, 2004, 9: 249–274
MathSciNet Google Scholar
Grahne G, Zhu J. Fast algorithms for frequent itemset mining using FP-Trees. IEEE TKDE J, 2005, 17: 1347–1362
Google Scholar
Woon Y K, Ng W K, Lim E P. A support-ordered trie for fast frequent itemset discovery. IEEE TKDE J, 2004, 16: 875–879
Google Scholar
Ananthanarayana V S, Murty N M, Subramanian D K. An incremental data mining algorithm for compact realization of prototypes. Pattern Recognit, 2001, 34: 2249–2251
Article MATH Google Scholar
Grust T. Accelerating xpath location steps, In: The 2002 ACM SIGMOD International Conference on Management of Data (SIGMOD’02), Madison, 2002. 109–120
Kleinberg J, Tardos E. Algorithm design. Boston: Addison Wesley, 2005
Google Scholar
Bayardo Jr R J. Efficiently mining long itemsets from databases. In: ACM SIGMOD International Conference on Management of Data (SIGMOD’98), Seattle, 1998. 85–93
Burdick D, Calimlim M, Flannick J, et al. Mafia: A maximal frequent itemset algorithm. IEEE TKDE J, 2005, 17: 1490–1504
Google Scholar
Wang J Y, Han J, Pei J. CLOSET+: Searching for the Best Strate-gies for Mining frequent closed itemsets. In: The 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD’03), Washington, 2003. 236–245
Lee A J T, Wang C S, Weng W Y, et al. An efficient algorithm for mining closed inter-transaction itemsets. Data Knowle Eng, 2008, 66: 68–91
Article Google Scholar
Xin D, Han J, Yan X, et al. On compressing frequent itemsets. Data Knowl Eng, 2007, 60: 5–29
Article Google Scholar
Li H, Chen H. Mining non-derivable frequent itemsets over data stream. Data Knowl Eng, 2009, 68: 481–498
Article Google Scholar
Yao H, Hamilton H J, Butz C J. A foundational approach to mining itemset utilities from databases. In: The SIAM Data Mining (SDM’04), Florida, 2004. 482–486
Yao H, Hamilton H J. Mining itemset utilities from transaction databases. Data Knowl Eng, 2006, 59: 603–626
Article Google Scholar
Cao L B, Zhao Y C, Zhang C Q. Mining impact-targeted activity itemsets in imbalanced data. IEEE TKDE, 2008, 20: 1053–1066
Google Scholar
Chang J H, Lee W S. Finding recent frequent itemsets adaptively over online data streams. In: The 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD’03), Washington, 2003. 487–492
Li X, Deng Z H. Mining frequent itemsets from network flows for monitoring network. Expert Syst Appl, 2010, 37: 8850–8860
Article Google Scholar
Agrawal R, Srikant R. Mining sequential itemsets. In: The 11th International Conference on Data Engineering (ICDE’95), Taiwan, 2003. 3–14
Yan X, Han J. gSpan: Graph-based substructure pattern mining. In: The 2002 IEEE International Conference on Data Mining (ICDM’02), Maebashi, 2002. 721–724
Cao L B, Zhang H F, Zhao Y C, et al. Combined mining: Discovering informative knowledge in complex data. IEEE Trans Syst Man Cybern Part B-Cybern, 2011, 41: 699–712
Article Google Scholar

Download references

Author information

Authors and Affiliations

Key Laboratory of Machine Perception (Ministry of Education), School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, China
ZhiHong Deng, ZhongHui Wang & JiaJian Jiang

Authors

ZhiHong Deng
View author publications
You can also search for this author in PubMed Google Scholar
ZhongHui Wang
View author publications
You can also search for this author in PubMed Google Scholar
JiaJian Jiang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to ZhiHong Deng.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Deng, Z., Wang, Z. & Jiang, J. A new algorithm for fast mining frequent itemsets using N-lists. Sci. China Inf. Sci. 55, 2008–2030 (2012). https://doi.org/10.1007/s11432-012-4638-z

Download citation

Received: 18 January 2012
Accepted: 07 March 2012
Published: 19 July 2012
Issue Date: September 2012
DOI: https://doi.org/10.1007/s11432-012-4638-z

A new algorithm for fast mining frequent itemsets using N-lists

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Study of Effective Mining Algorithms for Frequent Itemsets

Mining Frequent Sequences Using Itemset-Based Extension

Efficiently mining frequent itemsets with weight and recency constraints

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

A new algorithm for fast mining frequent itemsets using N-lists

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Study of Effective Mining Algorithms for Frequent Itemsets

Mining Frequent Sequences Using Itemset-Based Extension

Efficiently mining frequent itemsets with weight and recency constraints

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation