Boundary-aware local density-based outlier detection. (English) Zbl 07735628

Summary: Outlier detection is crucial for improving the performance of machine learning algorithms and is particularly vital in data sets with a small number of points. While existing outlier detection methods deliver good results on certain data sets, they perform noticeably worse on others. There is also a need for an algorithm that processes high-dimensional data sets quickly. To satisfy these requirements, we propose an unsupervised local outlier detection method that draws the neighborhood boundaries of the data points via Chebyshev's inequality. The proposed method sets the boundaries of the points through the so-called deviation parameter, which correlates with the standard deviation of the data distribution, and then detects outliers by quantifying their neighborhood densities. The experimental results on real-world and synthetic data sets show the efficacy of the proposed method in comparison to state-of-the-art methods. The source code of the proposed algorithm and the data sets are available at https://github.com/fatihaydin1/BLDOD.
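
The summary does not spell out the algorithm, so the following is only a minimal sketch of the general idea it names (a Chebyshev-style neighborhood boundary followed by a neighborhood-density count), not the authors' BLDOD implementation. It assumes the boundary radius is the mean nearest-neighbour distance plus k standard deviations, where k stands in for the deviation parameter; by Chebyshev's inequality at most a fraction 1/k^2 of the nearest-neighbour distances can lie k or more standard deviations from their mean. The function names, the choice of nearest-neighbour distances, and the zero-density rule are all hypothetical choices made for illustration.

```python
import math
import statistics


def chebyshev_radius(points, k=2.0):
    """Assumed boundary: mean + k * std of nearest-neighbour distances.

    Chebyshev's inequality bounds the share of distances that deviate
    from the mean by k standard deviations or more by 1 / k**2.
    """
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    mu = statistics.fmean(nearest)
    sigma = statistics.pstdev(nearest)
    return mu + k * sigma


def neighbourhood_density(points, radius):
    """Count, for each point, the other points inside its boundary."""
    return [
        sum(1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def flag_outliers(points, k=2.0):
    """Flag points whose neighbourhood is empty (illustrative rule only)."""
    radius = chebyshev_radius(points, k)
    return [d == 0 for d in neighbourhood_density(points, radius)]


# Toy data: a 5 x 5 unit grid plus one distant point.
grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
points = grid + [(50.0, 50.0)]
flags = flag_outliers(points, k=2.0)
print([p for p, f in zip(points, flags) if f])
```

On this toy data only the distant point ends up with an empty neighborhood. A larger k widens the boundary and flags fewer points; the published method may define both the boundary and the density score differently.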

MSC:

68T05 Learning and adaptive systems in artificial intelligence

Software:

LOF
Full Text: DOI
