×

MAX-FISM: mining (recently) maximal frequent itemsets over data streams using the sliding window model. (English) Zbl 1268.68071

Summary: Frequent itemset mining from data streams is an important data mining problem with broad applications such as retail market data analysis, network monitoring, web usage mining, and stock market prediction. However, it is also a difficult problem due to the unbounded, high-speed and continuous characteristics of streaming data. Therefore, extracting frequent itemsets from more recent data can enhance the analysis of stream data. In this paper, we propose an efficient algorithm, called Max-FISM (Maximal-Frequent Itemsets Mining), for mining recent maximal frequent itemsets from a high-speed stream of transactions within a sliding window. According to our algorithm, whenever a new transaction is inserted in the current window only its maximum itemset should be inserted into a prefix tree-based summary data structure called Max-Set for maintaining the number of independent appearance of each transaction in the current window. Finally, the set of recent maximal frequent itemsets is obtained from the current Max-Set. Experimental studies show that the proposed Max-FISM algorithm is highly efficient in terms of memory and time complexity for mining recent maximal frequent itemsets over high-speed data streams.

MSC:

68P15 Database theory
68T10 Pattern recognition, speech recognition

Software:

MAX-FISM; CLOSET
Full Text: DOI

References:

[1] M.N. Garofalakis, J. Gehrke, R. Rastogi, Querying and mining data streams: you only get one look a tutorial, in: Proc. 2002 ACM-SIGMOD Int. Conf. On Management-of Data, SIGMOD’02, Madison, WI, June 2002, p. 635.; M.N. Garofalakis, J. Gehrke, R. Rastogi, Querying and mining data streams: you only get one look a tutorial, in: Proc. 2002 ACM-SIGMOD Int. Conf. On Management-of Data, SIGMOD’02, Madison, WI, June 2002, p. 635.
[2] H.J. Woo, W.S. Lee, EstMax: tracing maximal frequent itemsets over online data streams, in: Proc. ICDM, 2007, pp. 709-714.; H.J. Woo, W.S. Lee, EstMax: tracing maximal frequent itemsets over online data streams, in: Proc. ICDM, 2007, pp. 709-714.
[3] Y. Zhu, D. Shasha, Statstream: statistical monitoring of thousands of data streams in real time, in: Proc. VLDB, 2002, pp. 358-369.; Y. Zhu, D. Shasha, Statstream: statistical monitoring of thousands of data streams in real time, in: Proc. VLDB, 2002, pp. 358-369.
[4] Y. Chi, H. Wang, P.S. Yu, R.R. Muntz, Catch the moment: maintaining closed frequent itemsets over a data stream sliding window, in: Proceedings of Knowl. Inf. Syst. 2006, pp. 265-294.; Y. Chi, H. Wang, P.S. Yu, R.R. Muntz, Catch the moment: maintaining closed frequent itemsets over a data stream sliding window, in: Proceedings of Knowl. Inf. Syst. 2006, pp. 265-294.
[5] H. Li, S. Lee, Mining frequent itemsets over data streams using efficient window sliding techniques, presented at Expert Syst. Appl., 2009, pp. 1466-1477.; H. Li, S. Lee, Mining frequent itemsets over data streams using efficient window sliding techniques, presented at Expert Syst. Appl., 2009, pp. 1466-1477.
[6] R. Agrawal, T. Imielinski, A.N. Swami, Mining association rules between sets of items in large databases, in: Proc. SIGMOD Conference, 1993, pp. 207-216.; R. Agrawal, T. Imielinski, A.N. Swami, Mining association rules between sets of items in large databases, in: Proc. SIGMOD Conference, 1993, pp. 207-216.
[7] N. Pasquier, Y. Bastide, R. Taouil, L. Lakhal, Discovering frequent closed itemsets for association rules, in: Proc. ICDT, 1999, pp. 398-416.; N. Pasquier, Y. Bastide, R. Taouil, L. Lakhal, Discovering frequent closed itemsets for association rules, in: Proc. ICDT, 1999, pp. 398-416. · Zbl 0983.68511
[8] J. Pei, J. Han, R. Mao, CLOSET: an efficient algorithm for mining frequent closed itemsets, in: Proc. ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2000, pp. 21-30.; J. Pei, J. Han, R. Mao, CLOSET: an efficient algorithm for mining frequent closed itemsets, in: Proc. ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2000, pp. 21-30.
[9] M.J. Zaki, C. Hsiao, CHARM: an efficient algorithm for closed itemset mining, in: Proc. SDM, 2002, pp. 457-473.; M.J. Zaki, C. Hsiao, CHARM: an efficient algorithm for closed itemset mining, in: Proc. SDM, 2002, pp. 457-473.
[10] J. Han, J. Pei, Y. Yin, Mining frequent patterns without candidate generation, in: Proc. SIGMOD Conference, 2000, pp. 1-12.; J. Han, J. Pei, Y. Yin, Mining frequent patterns without candidate generation, in: Proc. SIGMOD Conference, 2000, pp. 1-12.
[11] H. Li, M. Shan, S. Lee, DSM-FI: an efficient algorithm for mining frequent itemsets in data streams, presented at Knowl. Inf. Syst., 2008, pp. 79-97.; H. Li, M. Shan, S. Lee, DSM-FI: an efficient algorithm for mining frequent itemsets in data streams, presented at Knowl. Inf. Syst., 2008, pp. 79-97.
[12] G.S. Manku, R. Motwani, Approximate frequency counts over data streams, in: Proc. VLDB, 2002, pp. 346-357.; G.S. Manku, R. Motwani, Approximate frequency counts over data streams, in: Proc. VLDB, 2002, pp. 346-357.
[13] J.X. Yu, Z. Chong, H. Lu, Z. Zhang, A. Zhou, A false negative approach to mining frequent itemsets from high speed transactional data streams, presented at Information Science, 2006, pp. 1986-2015.; J.X. Yu, Z. Chong, H. Lu, Z. Zhang, A. Zhou, A false negative approach to mining frequent itemsets from high speed transactional data streams, presented at Information Science, 2006, pp. 1986-2015.
[14] C. Lee, C. Lin, M. Chen, Sliding-window filtering: an efficient algorithm for incremental mining, in: Proc. CIKM, 2001, pp. 263-270.; C. Lee, C. Lin, M. Chen, Sliding-window filtering: an efficient algorithm for incremental mining, in: Proc. CIKM, 2001, pp. 263-270.
[15] J.H. Chang, W.S. Lee, Finding recent frequent itemsets adaptively over online data streams, in: Proc. KDD, 2003, pp. 487-492.; J.H. Chang, W.S. Lee, Finding recent frequent itemsets adaptively over online data streams, in: Proc. KDD, 2003, pp. 487-492.
[16] Mao, G.; Wu, X.; Zhu, X.; Chen, G., Mining maximal frequent itemsets from data streams, J. Inform. Sci., 33, 3, 251-262 (2007)
[17] R. Agrawal, R. Srikant, Fast algorithms for mining association rules in large databases, in: Proc. VLDB, 1994, pp. 487-499.; R. Agrawal, R. Srikant, Fast algorithms for mining association rules in large databases, in: Proc. VLDB, 1994, pp. 487-499.
[18] Z. Zheng, R. Kohavi, L. Mason, Real world performance of association rule algorithms, in: Proc. KDD, 2001, pp. 401-406.; Z. Zheng, R. Kohavi, L. Mason, Real world performance of association rule algorithms, in: Proc. KDD, 2001, pp. 401-406.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.