×

An incremental density-based clustering framework using fuzzy local clustering. (English) Zbl 1479.62043

Summary: This paper presents a novel incremental density-based clustering framework using the one-pass scheme, named Fuzzy Incremental Density-based Clustering (FIDC). Employing one-pass clustering in which each data point is processed once and discarded, FIDC can process large datasets with less computation time and memory, compared to its density-based clustering counterparts. Fuzzy local clustering is employed in local clusters assignment process to reduce clustering inconsistencies from one-pass clustering. To improve the clustering performance and simplify the parameter choosing process, the modified valley seeking algorithm is used to adaptively determine the outlier thresholds for generating the final clusters. FIDC can operate in both traditional and stream data clustering. The experimental results show that FIDC outperforms state-of-the-art algorithms in both clustering modes.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62H86 Multivariate analysis and fuzziness

Software:

KDD Cup; k-means++
Full Text: DOI

References:

[1] Ao Geng, Y.; Li, Q.; Zheng, R.; Zhuang, F.; He, R.; Xiong, N., Recome: a new density-based clustering algorithm using relative knn kernel density, Inf. Sci., 436-437, 13-30 (2018), URL:http://www.sciencedirect.com/science/article/pii/S0020025518300215
[2] D. Arthur, S. Vassilvitskii, K-means++: The advantages of careful seeding, in: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’07, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2007. URL:http://dl.acm.org/citation.cfm?id=1283383.1283494. · Zbl 1190.68001
[3] Bezdek, J. C., A convergence theorem for the fuzzy isodata clustering algorithms, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-2, 1, 1-8 (1980) · Zbl 0441.62055
[4] Campello, R. J.G. B.; Moulavi, D.; Zimek, A.; Sander, J., Hierarchical density estimates for data clustering, visualization, and outlier detection, ACM Trans. Knowl. Discov. Data, 10, 1, 5:1-5:51 (2015)
[5] F. Cao, M. Ester, W. Qian, A. Zhou, Density-based clustering over an evolving data stream with noise, in: SIAM International Conference on Data Mining, 2004.
[6] Chen, Y.; Tu, L., Density-based clustering for real-time stream data, in, (Proceeding of the 13th ACM SIGKDD International Conference on Knowledge discovery and Data Mining (2007))
[7] Demšar, J., Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res., 7, 1-30 (2006) · Zbl 1222.68184
[8] Du, M.; Ding, S.; Jia, H., Study on density peaks clustering based on k-nearest neighbors and principal component analysis, Knowl.-Based Syst., 99, 135-145 (2016), URL:http://www.sciencedirect.com/science/article/pii/S0950705116000794
[9] Ester, M.; Kriegel, H.; Sander, J.; Wimmer, M., Incremental clustering for mining in a data warehousing environment, in, (Proceedings of the 24th International Conference on Very Large Data Bases (1998))
[10] Ester, M.; Kriegel, H.; Sander, J.; Xu, X., A density-based algorithm for discovering clusters in large spatial databases with noise, in, (2nd Inernational Conference on Knowledge Discovery and Data Mining (1996))
[11] García, S.; Fernández, A.; Luengo, J.; Herrera, F., Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power, Inf. Sci., 180, 10, 2044-2064 (2010), special Issue on Intelligent Distributed Information Systems. URL:http://www.sciencedirect.com/science/article/pii/S0020025509005404
[12] Ghosh, A.; Mishra, N. S.; Ghosh, S., Fuzzy clustering algorithms for unsupervised change detection in remote sensing images, Inf. Sci., 181, 4, 699-715 (2011), URL:http://www.sciencedirect.com/science/article/pii/S0020025510005153
[13] Hubert, L.; Arabie, P., Comparing partitions, J. Classif., 2, 193-218 (1985)
[14] Hyde, R.; Angelov, P.; MacKenzie, A., Fully online clustering of evolving data streams into arbitrarily shaped clusters, Inf. Sci., 382-383, 96-114 (2017), URL:http://www.sciencedirect.com/science/article/pii/S0020025516319247
[15] Jain, A., Data clustering: 50 years beyond k-means, Pattern Recogn. Lett., 31, 651-666 (2010)
[16] Karypis, G.; Han, E.-H. S.; Kumar, V., Chameleon: hierarchical clustering using dynamic modeling, Computer, 32, 8, 68-75 (1999)
[17] Laohakiat, S.; Phimoltares, S.; Lursinsap, C., Hyper-cylindrical micro-clustering for streaming data with unscheduled data removals, Knowl.-Based Syst., 99, 183-200 (2016)
[18] Laohakiat, S.; Phimoltares, S.; Lursinsap, C., A clustering algorithm for stream data with lda-based unsupervised localized dimension reduction, Inf. Sci., 381, 104-123 (2017)
[19] Liu, R.; Wang, H.; Yu, X., Shared-nearest-neighbor-based clustering by fast search and find of density peaks, Inform. Sci., 850, C, 200-226 (2018)
[20] Manning, C.; Raghavan, P.; Schutze, H., Introduction to Information Retrieval (2008), Cambridge University Press Inc: Cambridge University Press Inc New York · Zbl 1160.68008
[21] Meng, Y.; Shang, R.; Jiao, L.; Zhang, W.; Yuan, Y.; Yang, S., Feature selection based dual-graph sparse non-negative matrix factorization for local discriminative clustering, Neurocomputing, 290, 87-99 (2018), URL:http://www.sciencedirect.com/science/article/pii/S0925231218301796
[22] J. Nayak, B. Naik, D.H. Behera, Fuzzy C-Means (FCM) Clustering Algorithm: A Decade Review from 2000 to 2014, vol. 32, 2015, pp. 133-149.
[23] Nguyen, X. V.; Bailey, J., Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance, J. Mach. Learn. Res., 11, 2837-2854 (2010) · Zbl 1242.62062
[24] M. Peikari, S. Salama, S. Nofech-Mozes, A. Martel, A cluster-then-label semi-supervised learning approach for pathology image classification, Sci. Rep. 8.
[25] Rand, W., Objective criteria for the evaluation of clustering methods, J. Am. Stat. Assoc., 66, 846-850 (1971)
[26] Rodriguez, A.; Laio, A., Clustering by fast search and find of density peaks, Science, 344, 6191, 1492-1496 (2014), URL:https://science.sciencemag.org/content/344/6191/1492
[27] (Ross, T. J.; Booker, J. M.; Parkinson, W. J., Fuzzy Logic and Probability Applications: Bridging the Gap, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA (2002)) · Zbl 1005.00012
[28] Shang, R.; Tian, P.; Jiao, L.; Stolkin, R.; Feng, J.; Hou, B.; Zhang, X., A spatial fuzzy clustering algorithm with kernel metric based on immune clone for sar image segmentation, IEEE J. Selected Top. Appl. Earth Observ. Remote Sens., 9, 4, 1640-1652 (2016)
[29] Shang, R.; Wang, W.; Stolkin, R.; Jiao, L., Subspace learning-based graph regularized feature selection, Knowl.-Based Syst., 112, 152-165 (2016), URL:http://www.sciencedirect.com/science/article/pii/S0950705116303227
[30] Shang, R.; Xu, K.; Shang, F.; Jiao, L., Sparse and low-redundant subspace learning-based dual-graph regularized robust feature selection, Knowl.-Based Syst., 187, Article 104830 pp. (2020), URL:http://www.sciencedirect.com/science/article/pii/S0950705119303053
[31] Shang, R.; Zhang, Z.; Jiao, L.; Wang, W.; Yang, S., Global discriminative-based nonnegative spectral clustering, Pattern Recogn., 55, C, 172-182 (2016) · Zbl 1412.62085
[32] M. Tavallaee, E. Bagheri, W. Lu, A.A. Ghorbani, A detailed analysis of the kdd cup 99 data set, in: Proceedings of the Second IEEE International Conference on Computational Intelligence for Security and Defense Applications, CISDA’09, IEEE Press, Piscataway, NJ, USA, 2009.
[33] Theodoridis, S.; Koutroumbas, K., Pattern Recognition, Fourth Edition (2008), Academic Press Inc: Academic Press Inc Orlando, FL, USA
[34] Tian, D.; Gong, M., A novel edge-weight based fuzzy clustering method for change detection in sar images, Inf. Sci., 467, 415-430 (2018), URL: http://www.sciencedirect.com/science/article/pii/S0020025518306200 · Zbl 1441.94025
[35] Xie, J.; Gao, H.; Xie, W.; Liu, X.; Grant, P. W., Robust clustering by detecting density peaks and assigning points based on fuzzy weighted k-nearest neighbors, Inf. Sci., 354, 19-40 (2016), URL:http://www.sciencedirect.com/science/article/pii/S002002551630158X
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.