×

Dual incremental fuzzy schemes for frequent itemsets discovery in streaming numeric data. (English) Zbl 1457.68245

Summary: Discovering frequent itemsets is essential for finding association rules, yet too computational expensive using existing algorithms. It is even more challenging to find frequent itemsets upon streaming numeric data. The streaming characteristic leads to a challenge that streaming numeric data cannot be scanned repetitively. The numeric characteristic requires that streaming numeric data should be pre-processed into itemsets, e.g., fuzzy-set methods can transform numeric data into itemsets with non-integer membership values. This leads to a challenge that the frequency of itemsets are usually not integer. To overcome such challenges, fast methods and stream processing methods have been applied. However, the existing algorithms usually either still need to re-visit some previous data multiple times, or cannot count non-integer frequencies. Those existing algorithms re-visiting some previous data have to sacrifice large memory spaces to cache those previous data to avoid repetitive scanning. When dealing with big streaming data nowadays, such large-memory requirement often goes beyond the capacity of many computers. Those existing algorithms unable to count non-integer frequencies would be very inaccurate in estimating the non-integer frequencies of frequent itemsets if used with integer approximation of frequency-counting. To solve the aforementioned issues, in this paper we propose two incremental schemes for frequent itemsets discovery that are capable to work efficiently with streaming numeric data. In particular, they are able to count non-integer frequency without re-visiting any previous data. The key of our schemes to the benefits in efficiency is to extract essential statistics that would occupy much less memory than the raw data do for the ongoing streaming data. This grants the advantages of our schemes 1) allowing non-integer counting and thus natural integration with a fuzzy-set discretization method to boost robustness and anti-noise capability for numeric data, 2) enabling the design of a decay ratio for different data distributions, which can be adapted for three general stream models: landmark, damped and sliding windows, and 3) achieving highly-accurate fuzzy-item-sets discovery with efficient stream-processing. Experimental studies demonstrate the efficiency and effectiveness of our dual schemes with both synthetic and real-world datasets.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62L86 Fuzziness and sequential statistical methods
Full Text: DOI

References:

[1] Agrawal, R.; Srikant, R., Fast algorithms for mining association rules, Proc. 20th int. conf. very large data bases, VLDB, 1215, 487-499 (1994)
[2] Cai, X.; Niu, Y.; Geng, S.; Zhang, J.; Cui, Z.; Li, J.; Chen, J., An under-sampled software defect prediction method based on hybrid multi-objective cuckoo search, Concurrency and Computation: Practice and Experience, e5478 (2019)
[3] GG. WangX. Cai, Z. C.; Chen, J., High performance computing for cyber physical social system by using evolutionary multi-objective optimization algorithm, IEEE Transactions on Emerging Topics in Computing (2017)
[4] DOI: https://doi.org/10.1002/cpe.5464
[5] Li, C.-W.; Jea, K.-F., An adaptive approximation method to discover frequent itemsets over sliding-window-based data streams, Expert Systems with Applications, 38, 10, 13386-13404 (2011)
[6] Moustafa, A.; Abuelnasr, B.; Abougabal, M. S., Efficient mining fuzzy association rules from ubiquitous data streams, Alexandria Engineering Journal, 54, 2, 163-174 (2015)
[7] Ghosh, S.; Biswas, S.; Sarkar, D.; Sarkar, P. P., Mining frequent itemsets using genetic algorithm, arXiv preprint arXiv:1011.0328 (2010)
[8] Cui, Z.; Sun, B.; Wang, G.; Xue, Y.; Chen, J., A novel oriented cuckoo search algorithm to improve dv-hop performance for cyber-physical systems, Journal of Parallel and Distributed Computing, 103, 42-52 (2017)
[9] Wang, Y.; Cao, Y.; Cai, X.; Zhang, W.; Chen, J., A pigeon-inspired optimization algorithm for many-objective optimization problems, Science China Information Sciences, 62, 7, 70-212 (2019)
[10] Manku, G. S.; Motwani, R., Approximate frequency counts over data streams, VLDB’02: Proceedings of the 28th International Conference on Very Large Databases, 346-357 (2002), Elsevier
[11] Ruiz, E.; Casillas, J., Adaptive fuzzy partitions for evolving association rules in big data stream, International Journal of Approximate Reasoning, 93, 463-486 (2018) · Zbl 1452.68173
[12] Han, J.; Pei, J.; Yin, Y., Mining frequent patterns without candidate generation, ACM SIGMOD Record, 29, 1-12 (2000), ACM
[13] Zaki, M. J., Scalable algorithms for association mining, IEEE Transactions on Knowledge and Data Engineering, 12, 3, 372-390 (2000)
[14] Cheung, D. W.; Han, J.; Ng, V. T.; Wong, C., Maintenance of discovered association rules in large databases: An incremental updating technique, Data Engineering, 1996. Proceedings of the Twelfth International Conference on, 106-114 (1996), IEEE
[15] D.W. Cheung, S.D. Lee, B. Kao, A general incremental technique for maintaining discovered association rules (1997).
[16] Hong, T.-P.; Wang, C.-Y.; Tao, Y.-H., A new incremental data mining algorithm using pre-large itemsets, Intelligent Data Analysis, 5, 2, 111-129 (2001) · Zbl 1088.68568
[17] Zhou, Z.; Ezeife, C., A low-scan incremental association rule maintenance method based on the apriori property, Advances in Artificial Intelligence, 26-35 (2001), Springer · Zbl 0984.68673
[18] Gharib, T. F.; Nassar, H.; Taha, M.; Abraham, A., An efficient algorithm for incremental mining of temporal association rules, Data & Knowledge Engineering, 69, 8, 800-815 (2010)
[19] Song, C.; Liu, X.; Ge, T.; Ge, Y., Top-k frequent items and item frequency tracking over sliding windows of any size, Information Sciences, 475, 100-120 (2019) · Zbl 1442.68215
[20] Yang, W. C.; Qu, Q. Y.; Ma, P. F., An improved incremental queue association rules for mining mass text, Advanced Materials Research, 962, 2687-2690 (2014), Trans Tech Publ
[21] Albert, D. W.; Fayaz, K.; Babu, D. V., Hsapriori: High speed association rule mining using apriori based algorithm for gpu, Int. J. of Multidisciplinary and Current research (2014)
[22] He, B., Fast mining algorithm of association rules base on cloud computing, 2nd International Conference on Electronic & Mechanical Engineering and Information Technology (2012), Atlantis Press
[23] Wang, L.; Cheung, D.-L.; Cheng, R.; Lee, S. D.; Yang, X. S., Efficient mining of frequent item sets on large uncertain databases, Knowledge and Data Engineering, IEEE Transactions on, 24, 12, 2170-2183 (2012)
[24] Tanbeer, S. K.; Ahmed, C. F.; Jeong, B.-S.; Lee, Y.-K., Cp-tree: a tree structure for single-pass frequent pattern mining, Pacific-Asia Conference on Knowledge Discovery and Data Mining, 1022-1027 (2008), Springer
[25] Jyotikumari, P.; Singh, Y.; Mahanta, S.; Phalke, D.; Kalia, A., An enriched framework for outsourced computation evaluation of frequent item set in the cloud, International Journal, 5, 5 (2017)
[26] R.M. Bittmann, P. Nemery, X. Shi, M. Kemelmakher, M. Wang, Frequent item-set mining without ubiquitous items, arXiv preprint arXiv:1803.11105(2018).
[27] M. Hassan M. Rehmani, R. K.J. Z.; Chen, J., Differential privacy for renewable energy resources based smart metering, Journal of Parallel and Distributed Computing (2019)
[28] Cui, Z.; Cao, Y.; Cai, X.; Cai, J.; Chen, J., Optimal leach protocol with modified bat algorithm for big data sensing systems in internet of things, Journal of Parallel and Distributed Computing (2018)
[29] Y. Li, B. S.; Chen, J., Image encryption based on a single-round dictionary and chaotic sequences in cloud computing, Concurrency and Computation: Practice and Experience (2019)
[30] Zhang, M.; Wang, H.; Cui, Z.; Chen, J., Hybrid multi-objective cuckoo search with dynamical local search, Memetic Computing, 10, 2, 199-208 (2018)
[31] Cai, X.; Cao, Y.; Wang, G.; Chen, J.; Cui, Z.; Xue, F., Detection of malicious code variants based on deep learning, IEEE Trans. Industrial Informatics, 14, 7, 3187-3196 (2018)
[32] Berry, M. J.; Linoff, G., Data mining techniques: for marketing, sales, and customer support (1997), John Wiley & Sons, Inc.
[33] Chi, Y.; Wang, H.; Yu, P. S.; Muntz, R. R., Moment: Maintaining closed frequent itemsets over a stream sliding window, Data Mining, 2004. ICDM’04. Fourth IEEE International Conference on, 59-66 (2004), IEEE
[34] Jiang, N.; Gruenwald, L., Research issues in data stream association rule mining, ACM Sigmod Record, 35, 1, 14-19 (2006)
[35] Zheng, H.; He, J.; Huang, G.; Zhang, Y., Optimized fuzzy association rule mining for quantitative data, Fuzzy Systems (FUZZ-IEEE), 2014 IEEE International Conference on, 396-403 (2014), IEEE
[36] Kaytoue, M.; Kuznetsov, S. O.; Napoli, A., Revisiting numerical pattern mining with formal concept analysis, Twenty-Second International Joint Conference on Artificial Intelligence (2011)
[37] Zhihua, C.; Yechuang, W.; Xingjuan, C., A pigeon-inspired optimization algorithm for many-objective optimization problems, Science China Information Sciences (2019)
[38] Cui, Z.; Cao, Y.; Cai, X.; Cai, J.; Chen, J., Optimal leach protocol with modified bat algorithm for big data sensing systems in internet of things, Journal of Parallel and Distributed Computing, 132, 217-229 (2019)
[39] Cui, Z.; Sun, B.; Wang, G.; Xue, Y.; Chen, J., A novel oriented cuckoo search algorithm to improve dv-hop performance for cyber-physical systems, Journal of Parallel and Distributed Computing, 103, 42-52 (2017)
[40] Wang, P.; Xue, F.; Li, H.; Cui, Z.; Xie, L.; Chen, J., A multi-objective dv-hop localization algorithm based on nsga-ii in internet of things, Mathematics, 7, 2, 184 (2019)
[41] Wang, Y.; Wang, P.; Zhang, J.; Cui, Z.; Cai, X.; Zhang, W.; Chen, J., A novel bat algorithm with multiple strategies coupling for numerical optimization, Mathematics, 7, 2, 135 (2019)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.