×

A joint matrix factorization and clustering scheme for irregular time series data. (English) Zbl 07837655

Summary: Key Performance Indicator (KPI) clustering plays an important role in Artificial Intelligence for IT Operations (AIOps) when the number of KPIs is large. This approach can effectively reduce the overhead by dividing KPIs into several classes, then applying the same anomaly detection or prediction model to all KPIs in a class. However, KPI sampling strategies vary depending on the environment in question, which leads to the production of irregular KPIs. Few existing works have considered the clustering of KPIs with irregular sampling. Matrix factorization (MF) is widely applied in low-rank data recovery and can be used to align and fill irregular KPIs. However, the clustering performance after recovering and filling by MF remains unknown. These two problems interact with each other and should therefore be solved together. Accordingly, we formulate the joint MF and clustering problem for irregular KPIs and design an iterative clustering scheme based on MF. This iterative clustering scheme comprises alignment and pre-filling, the loop of clustering, and subclass filling by MF, and can work with two pre-filling methods. Extensive experiments on two real-world datasets show that the iterative clustering scheme can obtain higher normalized mutual information (NMI) than non-iterative clustering, while also consuming less computational time than Dynamic Time Warping (DTW). The two kinds of pre-filling methods each have their advantages on different datasets.

MSC:

62-XX Statistics
68-XX Computer science

Software:

hdbscan; AS 136
Full Text: DOI

References:

[1] Aghabozorgi, S.; Seyed Shirkhorshidi, A.; Ying Wah, T., Time-series clustering – a decade review, Inf. Sci., 53, 16-38 (2015)
[2] Bu, J.; Liu, Y.; Zhang, S.; Meng, W.; Liu, Q.; Zhu, X.; Pei, D., Rapid deployment of anomaly detection models for large number of emerging kpi streams, (Proc. IEEE 37th International Performance Computing and Communications Conference (2018)), 1-8
[3] Cheng, Y., Mean shift, mode seeking, and clustering, IEEE Trans. Pattern Anal. Mach. Intell., 17, 790-799 (1995)
[4] Dang, Y.; Lin, Q.; Huang, P., Aiops: real-world challenges and research innovations, (2019 IEEE/ACM 41st International Conference on Software Engineering: Companion Proceedings (ICSE-Companion) (2019), IEEE), 4-5
[5] Dau, H. A.; Begum, N.; Keogh, E., Semi-supervision dramatically improves time series clustering under dynamic time warping, (Proceedings of the 25th ACM International on Conference on Information and Knowledge Management (2016)), 999-1008
[6] Dau, H. A.; Keogh, E.; Kamgar, K.; Yeh, C. C.M.; Zhu, Y.; Gharghabi, S.; Ratanamahatana, C. A.; Chen, Y.; Hu, B.; Begum, N.; Bagnall, A.; Mueen, A.; Batista, G.; Hexagon-ML, The ucr time series classification archive (2018)
[7] Degirmenci, A.; Karal, O., Efficient density and cluster based incremental outlier detection in data streams, Inf. Sci., 607, 901-920 (2022)
[8] Ester, M.; Kriegel, H. P.; Sander, J.; Xu, X., A density-based algorithm for discovering clusters in large spatial databases with noise, (The Second International Conference on Knowledge Discovery and Data Mining (KDD) (1996)), 226-231
[9] Gharghabi, S.; Imani, S.; Bagnall, A.; Darvishzadeh, A.; Keogh, E., Matrix profile xii: Mpdist: a novel time series distance measure to allow data mining in more challenging scenarios, (2018 IEEE International Conference on Data Mining (ICDM) (2018), IEEE), 965-970
[10] Guijo-Rubio, D.; Durán-Rosal, A. M.; Gutiérrez, P. A.; Troncoso, A.; Hervás-Martínez, C., Time-series clustering based on the characterization of segment typologies, IEEE Trans. Cybern., 51, 5409-5422 (2020)
[11] Hallac, D.; Vare, S.; Boyd, S.; Leskovec, J., Toeplitz inverse covariance-based clustering of multivariate time series data, (Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2017)), 215-223
[12] Hartigan, J. A.; Wong, M. A., Algorithm as 136: a k-means clustering algorithm, J. R. Stat. Soc., Ser. C, Appl. Stat., 28, 100-108 (1979) · Zbl 0447.62062
[13] Hartuv, E.; Shamir, R., A clustering algorithm based on graph connectivity, Inf. Process. Lett., 76, 175-181 (2000) · Zbl 0996.68525
[14] He, S.; Guo, M.; Yang, B.; Alfarraj, O.; Tolba, A.; Sharma, P. K.; Yan, X., Fine-grained multivariate time series anomaly detection in iot, Comput. Mater. Continua, 75, 5027-5047 (2023)
[15] He, S.; Li, Z.; Wang, J.; Xiong, N. N., Intelligent detection for key performance indicators in industrial-based cyber-physical systems, IEEE Trans. Ind. Inform., 17, 5799-5809 (2021)
[16] He, S.; Tuo, D.; Chen, B.; Sherratt, R. S.; Wang, J., Unsupervised log anomaly detection method based on multi-feature, Comput. Mater. Continua, 99, 1-20 (2023)
[17] Johnpaul, C. I.; Munaga, V. P.; Nickolas, S.; Gangadharan, G., Trendlets: a novel probabilistic representational structures for clustering the time series data, Expert Syst. Appl., 145, 113-119 (2020)
[18] Kowalski, P. A.; Jeczmionek, E., Parallel complete gradient clustering algorithm and its properties, Inf. Sci., 600, 155-169 (2022) · Zbl 07810845
[19] Lei, Q.; Yi, J.; Vaculin, R.; Wu, L.; Dhillon, I. S., Similarity preserving representation learning for time series clustering, (International Joint Conference on Artificial Intelligence (IJCAI) (2019)), 2845-2851
[20] Li, H.; Liu, J.; Yang, Z.; Liu, R. W.; Wu, K.; Wan, Y., Adaptively constrained dynamic time warping for time series classification and clustering, Inf. Sci., 534, 97-116 (2020) · Zbl 1465.62151
[21] Li, Z.; Zhao, Y.; Liu, R.; Pei, D., Robust and rapid clustering of kpis for large-scale anomaly detection, (Proc. IEEE/ACM 26th International Symposium on Quality of Service (IWQoS) (2018)), 1-10
[22] Liao, N.; Li, X., Traffic anomaly detection model using k-means and active learning method, Int. J. Fuzzy Syst., 24, 2264-2282 (2022)
[23] Liao, T. W., A clustering procedure for exploratory mining of vector time series, Pattern Recognit., 40, 2550-2562 (2007) · Zbl 1118.68632
[24] Ma, M.; Zhang, S.; Chen, J.; Xu, J.; Li, H.; Lin, Y.; Nie, X.; Zhou, B.; Wang, Y.; Pei, D., Jump-starting multivariate time series anomaly detection for online service systems, (2021 USENIX Annual Technical Conference (USENIX ATC 21) (2021)), 413-426
[25] Ma, Q.; Zheng, J.; Li, S.; Cottrell, G. W., Learning representations for time series clustering, (Proceedings of the 33rd International Conference on Neural Information Processing Systems (2019)), 3781-3791
[26] Ma, R.; Angryk, R., Distance and density clustering for time series data, (2017 IEEE International Conference on Data Mining Workshops (ICDMW) (2017)), 25-32
[27] McInnes, L.; Healy, J.; Astels, S., hdbscan: hierarchical density based clustering, J. Open Sour. Softw., 2, 205 (2017)
[28] Mestres, A.; Rodriguez-Natal, A.; Carner, J.; Barlet-Ros, P.; Alarcón, E.; Solé, M.; Muntés-Mulero, V.; Meyer, D.; Barkai, S.; Hibbett, M. J., Knowledge-defined networking, Comput. Commun. Rev., 47, 2-10 (2017)
[29] Nie, L.; Zhao, L.; Li, K., Robust anomaly detection using reconstructive adversarial network, IEEE Trans. Netw. Serv. Manag., 18, 1899-1912 (2021)
[30] Ouyang, T.; Shen, X., Online structural clustering based on dbscan extension with granular descriptors, Inf. Sci., 607, 688-704 (2022)
[31] Paparrizos, J.; Gravano, L., K-shape: efficient and accurate clustering of time series, (Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (2015)), 1855-1870
[32] Paparrizos, J.; Gravano, L., Fast and accurate time-series clustering, ACM Trans. Database Syst., 42, 1-49 (2017) · Zbl 1474.62241
[33] Pei, S.; Shen, T.; Wang, X.; Gu, C.; Ning, Z.; Ye, X.; Xiong, N., 3dacn: 3d augmented convolutional network for time series data, Inf. Sci., 513, 17-29 (2020)
[34] Perdices, D.; de Vergara, J. E.L.; Ramos, J., Deep-fda: using functional data analysis and neural networks to characterize network services time series, IEEE Trans. Netw. Serv. Manag., 18, 986-999 (2021)
[35] Rasmussen, C., The infinite Gaussian mixture model, Adv. Neural Inf. Process. Syst., 12, 554-560 (1999)
[36] Ren, H.; Xu, B.; Wang, Y.; Yi, C.; Huang, C.; Kou, X.; Xing, T.; Yang, M.; Tong, J.; Zhang, Q., Time-series anomaly detection service at Microsoft, (Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2019)), 3009-3017
[37] Sakoe, H.; Chiba, S.; Waibel, A.; Lee, K., Dynamic programming algorithm optimization for spoken word recognition, IEEE Trans. Acoust. Speech Signal Process., 26, 43-49 (1978) · Zbl 0371.68035
[38] Tiano, D.; Bonifati, A.; Ng, R., Featts: feature-based time series clustering, (Proceedings of the 2021 International Conference on Management of Data (2021)), 2784-2788
[39] Wang, C.; Wu, K.; Zhou, T.; Yu, G.; Cai, Z., Tsagen: synthetic time series generation for kpi anomaly detection, IEEE Trans. Netw. Serv. Manag., 19, 130-145 (2022)
[40] Wang, J.; Zhao, C.; He, S.; Gu, Y.; Alfarraj, O.; Abugabah, A., Loguad: log unsupervised anomaly detection based on word2vec, Comput. Syst. Sci. Eng., 41, 1207 (2022)
[41] Wang, S.; Long, Y.; Ruby, R.; Fu, X., Clustering and power optimization in mmwave massive mimo-noma systems, Phys. Commun., 49, Article 101469 pp. (2021)
[42] Xiang, L.; Zhao, G.; Li, Q.; Kim, G. J.K.; Alfarraj, O.; Tolba, A., A fast and effective multiple kernel clustering method on incomplete data, Comput. Mater. Continua, 67, 267-284 (2021)
[43] Xie, K.; Peng, C.; Wang, X.; Xie, G.; Wen, J.; Cao, J.; Zhang, D.; Qin, Z., Accurate recovery of Internet traffic data under variable rate measurements, IEEE/ACM Trans. Netw., 26, 1137-1150 (2018)
[44] Yang, J.; Ning, C.; Deb, C.; Zhang, F.; Cheong, D.; Lee, S. E.; Sekhar, C.; Tham, K. W., K-shape clustering algorithm for building energy usage patterns analysis and forecasting model accuracy improvement, Energy Build., 146, 27-37 (2017)
[45] Yu, G.; Cai, Z.; Wang, S.; Chen, H.; Liu, F.; Liu, A., Unsupervised online anomaly detection with parameter adaptation for kpi abrupt changes, IEEE Trans. Netw. Serv. Manag., 17, 1294-1308 (2020)
[46] Zhang, J.; Wang, W.; Lu, C.; Wang, J.; Sangaiah, A. K., Lightweight deep network for traffic sign classification, Ann. Télécommun., 75, 369-379 (2020)
[47] Zhang, S.; Li, D.; Zhong, Z.; Zhu, J.; Liang, M.; Luo, J.; Sun, Y.; Su, Y.; Xia, S.; Hu, Z.; Zhang, Y.; Pei, D.; Sun, J.; Liu, Y., Robust system instance clustering for large-scale web services, (Proceedings of the ACM Web Conference 2022 (2022)), 1785-1796
[48] Zhang, X.; Lin, Q.; Xu, Y.; Qin, S.; Zhang, H.; Qiao, B.; Dang, Y.; Yang, X.; Cheng, Q.; Chintalapati, M.; Wu, Y.; Hsieh, K.; Sui, K.; Meng, X.; Xu, Y.; Zhang, W.; Shen, F.; Zhang, D., Cross-dataset time series anomaly detection for cloud systems, (2019 USENIX Annual Technical Conference (USENIX ATC 19) (2019)), 1063-1076
[49] Zhao, N.; Zhu, J.; Wang, Y.; Ma, M.; Zhang, W.; Liu, D.; Zhang, M.; Pei, D., Automatic and generic periodicity adaptation for kpi anomaly detection, IEEE Trans. Netw. Serv. Manag., 16, 1170-1183 (2019)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.