×

CP-tree: an adaptive synopsis structure for compressing frequent itemsets over online data streams. (English) Zbl 1354.68075

Summary: Due to the characteristics of a data stream, it is very important to confine the memory usage of a data mining process. This paper proposes a CP-tree (Compressible-prefix tree) that can be effectively employed in finding frequent itemsets over an online data stream. Unlike a prefix tree, a node of a CP-tree can maintain a concise synopsis that can be used to trace the supports of several itemsets together. As the number of itemsets that are traced by a node of a CP-tree is increased, the size of a CP-tree becomes smaller. However, the result of a CP-tree becomes less accurate since the estimated supports of those itemsets that are traced together by a node of a CP-tree may contain possible false positive or negative errors. Based on this characteristic, the size of a CP-tree can be controlled by merging or splitting the nodes of a CP-tree, which allows the utilization of a confined memory space as much as possible. Therefore, the accuracy of a CP-tree is maximized at all times for a confined memory space. Furthermore, a CP-tree can trace a concise set of representative frequent itemsets that can collectively represent the set of original frequent itemsets.

MSC:

68P30 Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science)
68T05 Learning and adaptive systems in artificial intelligence

Software:

CLOSET
Full Text: DOI

References:

[14] Gouda, K.; Zaki, M., GenMax: an efficient algorithm for mining maximal frequent itemsets, Data Min. Knowl. Disc., 11, 1-20 (2005)
[20] Lin, D.; Kedem, Z. M., Pincer-search: an efficient algorithm for discovering the maximum frequent set, IEEE Trans. Knowl. Data Eng. (2002)
[27] Cheng, J.; Ke, Y.; Ng, W., A survey on algorithms for mining frequent itemsets over data streams, Appeared Knowledge Inf. Syst., 16, 1, 1-27 (2008)
[30] Liu, X.; Guan, J.; Hu, P., Mining frequent closed itemsets from a landmark window over online data streams, Comput. Math. Appl., 57, 6, 927-936 (2009) · Zbl 1186.68379
[31] Gama, J., Knowledge Discovery from Data Streams (2010), Chapman & Hall/CRC: Chapman & Hall/CRC Boca Raton, Florida · Zbl 1230.68017
[32] Sayed-Mouchaweh, M.; Lughofer, E., Learning in Non-Stationary Environments: Methods and Applications (2012), Springer: Springer New York · Zbl 1451.68028
[33] Lughofer, E., Evolving Fuzzy Systems: Methodologies, Advanced Concepts and Applications (2011), Springer: Springer Berlin Heidelberg · Zbl 1216.68013
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.