×

Effective approximation of parametrized closure systems over transactional data streams. (English) Zbl 1497.68440

Summary: Strongly closed itemsets, defined by a parameterized closure operator, are a generalization of ordinary closed itemsets. Depending on the strength of closedness, the family of strongly closed itemsets typically forms a tiny subfamily of ordinary closed itemsets that is stable against changes in the input. In this paper we consider the problem of mining strongly closed itemsets from transactional data streams. Utilizing their algebraic and algorithmic properties, we propose an algorithm based on reservoir sampling for approximating this type of itemsets in the landmark streaming setting, prove its correctness, and show empirically that it yields a considerable speed-up over a straightforward naive algorithm without any significant loss in precision and recall. We motivate the problem setting considered by two practical applications. In particular, we first experimentally demonstrate that the above properties, i.e., compactness and stability, make strongly closed itemsets an excellent indicator of certain types of concept drifts in transactional data streams. As a second application we consider computer-aided product configuration, a real-world problem raised by an industrial project. For this problem, which is essentially exact concept identification, we propose a learning algorithm based on a certain type of subset queries formed by strongly closed itemsets and show on real-world datasets that it requires significantly less query evaluations than a naive algorithm based on membership queries.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68P15 Database theory
68W27 Online algorithms; streaming algorithms

Software:

UCI-ml; StreamKrimp
Full Text: DOI

References:

[1] Angluin, D., Queries and concept learning, Machine Learning, 4, 2, 319-342 (1987) · Zbl 1470.68050
[2] Bai, S. P., & Kumar, G. K. R. (2016). A survey on closed frequent itemset mining on data streams. In Proceedings of the 2nd International Conference on Contemporary Computing and Informatics (IC3I) (pp. 542-547).
[3] Boley, M.; Horváth, T.; Poigné, A.; Wrobel, S., Listing closed sets of strongly accessible set systems with applications to data mining, Theoretical Computer Science, 411, 3, 691-700 (2010) · Zbl 1191.68346 · doi:10.1016/j.tcs.2009.10.024
[4] Boley, M.; Horváth, T.; Wrobel, S., Efficient discovery of interesting patterns based on strong closedness, Statistical Analysis and Data Mining, 2, 5-6, 346-360 (2009) · doi:10.1002/sam.10057
[5] Boros, E.; Gurvich, V.; Khachiyan, L.; Makino, K., On maximal frequent and minimal infrequent sets in binary matrices, Annals of Mathematics and Artificial Intelligence, 39, 3, 211-221 (2003) · Zbl 1038.68041 · doi:10.1023/A:1024605820527
[6] Chi, Y., Wang, H., Yu, P. S., Muntz, R. R. (2004). Moment: maintaining closed frequent itemsets over a stream sliding window. In Proceedings of 4th International Conference on Data Mining (ICDM) (pp. 59-66).
[7] Dua, D., & Graff, C. (2019). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science. http://archive.ics.uci.edu/ml/index.php.
[8] Gama, J.; Žliobaitė, I.; Bifet, A.; Pechenizkiy, M.; Bouchachia, A., A survey on concept drift adaptation, ACM Computing Surveys, 46, 4, 44:1-44:37 (2014) · Zbl 1305.68141 · doi:10.1145/2523813
[9] Ganter, B.; Reuter, K., Finding all closed sets: A general approach, Order, 8, 3, 283-290 (1991) · Zbl 0754.06003 · doi:10.1007/BF00383449
[10] Gély, A., A generic algorithm for generating closed sets of a binary relation, 223-234 (2005), Berlin: Springer, Berlin · Zbl 1078.68773
[11] Gupta, A., Bhatnagar, V., & Kumar, N. (2010). Mining closed itemsets in data stream using formal concept analysis. In Proceedings of 12th International Conference on Data Warehousing and Knowledge Discovery, DaWaK (pp. 285-296).
[12] Hoeffding, W., Probability inequalities for sums of bounded random variables, Journal of the American Statistical Association, 58, 301, 13-30 (1963) · Zbl 0127.10602 · doi:10.1080/01621459.1963.10500830
[13] Iwanuma, K., Yamamoto, Y., & Fukuda, S. (2016). An on-line approximation algorithm for mining frequent closed itemsets based on incremental intersection. In Proceedings of the 19th International Conference on Extending Database Technology, (pp. 704-705).
[14] Jiang, N., & Gruenwald, L. (2006). CFI-Stream: Mining closed frequent itemsets in data streams. In Proceedings of the 12th ACM SIGKDD, (pp. 592-597).
[15] Kifer, D., Ben-David, S., & Gehrke, J. (2004). Detecting change in data streams. In Proceedings of the 13th International Conference on Very Large Data Bases (VLDB), (pp. 180-191).
[16] Knuth, DE, The art of computer programming. Vol. 2: Seminumerical algorithms (1997), Reading: Addison-Wesley, Reading · Zbl 0895.68055
[17] Li, C-W; Jea, K-F, An approach of support approximation to discover frequent patterns from concept-drifting data streams based on concept learning, Knowledge and Information Systems, 40, 3, 639-671 (2014) · doi:10.1007/s10115-013-0656-4
[18] Liu, X.; Guan, J.; Hu, P., Mining frequent closed itemsets from a landmark window over online data streams, Computers & Mathematics with Applications, 57, 6, 927-936 (2009) · Zbl 1186.68379 · doi:10.1016/j.camwa.2008.10.060
[19] Loglisci, C.; Ceci, M.; Impedovo, A.; Malerba, D., Mining microscopic and macroscopic changes in network data streams, Knowledge-Based Systems, 161, 294-312 (2018) · doi:10.1016/j.knosys.2018.07.011
[20] Manku, G. S., & Motwani, R. (2002) Approximate frequency counts over data streams. In Proceedings of the 28th International Conference on Very Large Data Bases, (VLDB), (pp. 346-357). VLDB Endowment.
[21] Pasquier, N.; Bastide, Y.; Taouil, R.; Lakhal, L., Efficient mining of association rules using closed itemset lattices, Information Systems, 24, 1, 25-46 (1999) · doi:10.1016/S0306-4379(99)00003-4
[22] Serfling, RJ, Probability inequalities for the sum in sampling without replacement, Annals Statistics, 2, 1, 39-48 (1974) · Zbl 0288.62005 · doi:10.1214/aos/1176342611
[23] Trabold, D., & Horváth, T. (2017). Mining strongly closed itemsets from data streams. In A. Yamamoto, T. Kida, T. Uno, and T. Kuboyama (Eds.), Discovery science. DS 2017. Lecture Notes in Computer Science (Vol. 10558, pp. 251-266). Cham: Springer.
[24] Tsai, PSM, Mining frequent itemsets in data streams using the weighted sliding window model, Expert Systems with Applications, 36, 9, 11617-11625 (2009) · doi:10.1016/j.eswa.2009.03.025
[25] van Leeuwen, M., Siebes, A. (2008). StreamKrimp: Detecting Change in Data Streams. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases 2008, (ECML PKDD), (pp. 672-687).
[26] Vitter, JS, Random sampling with a reservoir, ACM Transactions on Mathematical Software, 11, 1, 37-57 (1985) · Zbl 0562.68028 · doi:10.1145/3147.3165
[27] Yen, S. J., Wu, C. W., Lee, Y. S., Tseng, V. S., Hsieh, C. H. (2011). A fast algorithm for mining frequent closed itemsets over stream sliding window. In IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), (pp. 996-1002)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.