
Clustering ensemble based on sample’s stability. (English) Zbl 1478.68312

Summary: The objective of clustering ensemble is to find the underlying structure of data based on a set of clustering results. It has been observed that samples can change cluster membership across different clustering results, which indicates that samples may contribute differently to the detection of the underlying structure. However, existing clustering ensemble methods treat all samples equally. To address this deficiency, we introduce the stability of a sample to quantify its contribution and present a methodology to determine this stability. We propose two formulas, consistent with this methodology, for calculating a sample's stability. We then develop a clustering ensemble algorithm based on sample stability. With either formula, the algorithm divides a data set into two classes: the cluster core and the cluster halo. It first discovers a clear structure from the samples in the cluster core and then gradually assigns the samples in the cluster halo to that structure. Experiments on eight synthetic data sets illustrate how the proposed algorithm works. The algorithm statistically outperforms twelve state-of-the-art clustering ensemble algorithms on ten real data sets from the UCI repository and six document data sets. An experimental analysis of an image segmentation case shows that the cluster cores discovered by the stability measure are reasonable.
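
The summary describes the workflow (estimate a per-sample stability from an ensemble of clusterings, split the data into a cluster core and a cluster halo, cluster the core, then gradually absorb the halo) but not the paper's two stability formulas. The Python sketch below is a minimal illustration under stated assumptions: stability is approximated by a co-association-based proxy, the core/halo cut-off is simply the mean stability, and the gradual halo assignment is reduced to a single greedy pass. None of the function names (co_association, sample_stability, core_halo_split, ensemble_cluster) come from the paper.

# Minimal sketch (not the paper's exact formulas): estimate a per-sample
# "stability" from a clustering ensemble via the co-association matrix,
# split samples into a cluster core and a cluster halo by thresholding,
# cluster the core, then assign halo samples to the nearest core cluster.
# All names and the stability proxy below are illustrative assumptions.
import numpy as np

def co_association(labelings):
    """labelings: list of 1-D integer label arrays, one per base clustering."""
    n = len(labelings[0])
    ca = np.zeros((n, n))
    for labels in labelings:
        labels = np.asarray(labels)
        ca += (labels[:, None] == labels[None, :]).astype(float)
    return ca / len(labelings)

def sample_stability(ca):
    """Proxy stability: pairwise co-association values near 0 or 1 are
    decisive (stable), values near 0.5 are ambiguous (unstable)."""
    n = ca.shape[0]
    off_diag = ca[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    return 1.0 - 2.0 * np.mean(np.minimum(off_diag, 1.0 - off_diag), axis=1)

def core_halo_split(stability, threshold=None):
    """Split samples into a cluster core (stable) and a cluster halo (unstable)."""
    if threshold is None:
        threshold = np.mean(stability)   # simple data-driven cut-off
    core = np.where(stability >= threshold)[0]
    halo = np.where(stability < threshold)[0]
    return core, halo

def ensemble_cluster(labelings, n_clusters):
    """Toy consensus: average-linkage clustering of the core on co-association
    distance, then attach each halo sample to its most co-associated core cluster."""
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    ca = co_association(labelings)
    core, halo = core_halo_split(sample_stability(ca))

    dist = 1.0 - ca[np.ix_(core, core)]
    np.fill_diagonal(dist, 0.0)
    z = linkage(squareform(dist, checks=False), method="average")
    core_labels = fcluster(z, t=n_clusters, criterion="maxclust")

    labels = np.full(ca.shape[0], -1)
    labels[core] = core_labels
    for i in halo:   # gradual assignment approximated by one greedy pass
        best = max(set(core_labels),
                   key=lambda c: ca[i, core[core_labels == c]].mean())
        labels[i] = best
    return labels

Usage would be along the lines of labels = ensemble_cluster([run_1, run_2, ...], n_clusters=3), where each element is a label vector from one base clustering; the paper's own stability formulas and gradual assignment schedule should be substituted for the proxies used here.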

MSC:

68T09 Computational aspects of data analysis and big data
62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T05 Learning and adaptive systems in artificial intelligence

Software:

UCI-ml
Full Text: DOI

References:

[1] Xu, R.; Wunsch, D., Survey of clustering algorithms, IEEE Trans. Neural Netw., 16, 3, 645-678 (2005)
[2] Vega-Pons, S.; Ruiz-Shulcloper, J., A survey of clustering ensemble algorithms, Int. J. Pattern Recognit. Artif. Intell., 25, 3, 337-372 (2011)
[3] He, Z.; Xu, X.; Deng, S., A cluster ensemble method for clustering categorical data, Inf. Fusion, 6, 2, 143-151 (2005)
[4] Iam-On, N.; Boongeon, T.; Garrett, S.; Price, C., A link-based cluster ensemble approach for categorical data clustering, IEEE Trans. Knowl. Data Eng., 24, 3, 413-425 (2012)
[5] Jing, L.; Tian, K.; Huang, J. Z., Stratified feature sampling method for ensemble clustering of high dimensional data, Pattern Recognit., 48, 11, 3688-3702 (2015)
[6] Yu, Z.; Li, L.; Liu, J.; Zhang, J.; Han, G., Adaptive noise immune cluster ensemble using affinity propagation, IEEE Trans. Knowl. Data Eng., 27, 12, 3176-3189 (2015)
[7] Yang, Y.; Chen, K., Temporal data clustering via weighted clustering ensemble with different representations, IEEE Trans. Knowl. Data Eng., 23, 2, 307-320 (2011)
[8] Elghazel, H.; Aussem, A., Unsupervised feature selection with ensemble learning, Mach. Learn., 98, 1-2, 157-180 (2015) · Zbl 1321.68402
[9] Strehl, A. L.; Ghosh, J., Cluster ensembles — a knowledge reuse framework for combining multiple partitions, J. Mach. Learn. Res., 3, 3, 583-617 (2003) · Zbl 1084.68759
[10] Vega-Pons, S.; Correa-Morris, J.; Ruiz-Shulcloper, J., Weighted partition consensus via kernels, Pattern Recognit., 43, 8, 2712-2724 (2010) · Zbl 1207.68324
[11] Yu, Z.; Wong, H.; You, J.; Yu, G.; Han, G., Hybrid cluster ensemble framework based on the random combination of data transformation operators, Pattern Recognit., 45, 5, 1826-1837 (2012) · Zbl 1233.68198
[12] Gionis, A.; Mannila, H.; Tsaparas, P., Clustering aggregation, ACM Trans. Knowl. Discov. Data, 1, 1, 4 (2007)
[13] Topchy, A.; Jain, A. K.; Punch, W. F., Clustering ensembles: models of consensus and weak partitions, IEEE Trans. Pattern Anal. Mach. Intell., 27, 12, 1866-1881 (2005)
[14] Fred, A. L.; Jain, A. K., Combining multiple clusterings using evidence accumulation, IEEE Trans. Pattern Anal. Mach. Intell., 27, 6, 835-850 (2005)
[15] Hu, J.; Li, T.; Wang, H.; Fujita, H., Hierarchical cluster ensemble model based on knowledge granulation, Knowl.-Based Syst., 91, 179-188 (2016)
[16] Huang, S.; Wang, H.; Li, D.; Yang, Y.; Li, T., Spectral co-clustering ensemble, Knowl.-Based Syst., 84, 46-55 (2015)
[17] Wu, J.; Liu, H.; Xiong, H.; Cao, J.; Chen, J., K-means-based consensus clustering: a unified view, IEEE Trans. Knowl. Data Eng., 27, 1, 155-169 (2015)
[18] Huang, D.; Lai, J.; Wang, C.-D., Ensemble clustering using factor graph, Pattern Recognit., 50, 131-142 (2016) · Zbl 1395.62157
[19] Fern, X. Z.; Brodley, C. E., Solving cluster ensemble problems by bipartite graph partitioning, (Proceedings of the Twenty-First International Conference on Machine Learning (2004), ACM), 36
[20] Carpineto, C.; Romano, G., Consensus clustering based on a new probabilistic rand index with application to subtopic retrieval, IEEE Trans. Pattern Anal. Mach. Intell., 34, 12, 2315-2326 (2012)
[21] Du, L.; Shen, Y.-D.; Shen, Z.; Wang, J.; Xu, Z., A self-supervised framework for clustering ensemble, (Proceedings of the International Conference on Web-Age Information Management (2013), Springer), 253-264
[22] Huang, D.; Lai, J.-H.; Wang, C.-D., Robust ensemble clustering using probability trajectories, IEEE Trans. Knowl. Data Eng., 28, 5, 1312-1326 (2016)
[23] Lu, Z.; Peng, Y.; Xiao, J., From comparing clusterings to combining clusterings, (Proceedings of the Twenty-Third National Conference on Artificial Intelligence (2008)), 665-670
[24] Singh, V.; Mukherjee, L.; Peng, J.; Xu, J., Ensemble clustering using semidefinite programming with applications, Mach. Learn., 79, 1-2, 177-200 (2010) · Zbl 1470.62096
[25] Rodriguez, A.; Laio, A., Clustering by fast search and find of density peaks, Science, 344, 6191, 1492-1496 (2014)
[26] Jain, B. J., The mean partition theorem in consensus clustering, Pattern Recognit., 79, 427-439 (2018)
[27] Li, F.; Qian, Y.; Wang, J.; Liang, J., Multigranulation information fusion: a Dempster-Shafer evidence theory-based clustering ensemble method, Inf. Sci., 378, 389-409 (2017) · Zbl 1429.68295
[28] Domeniconi, C.; Alrazgan, M., Weighted cluster ensembles: methods and analysis, ACM Trans. Knowl. Discov. Data, 2, 4, 17 (2009)
[29] Fern, X. Z.; Lin, W., Cluster ensemble selection, Stat. Anal. Data Min., 1, 3, 128-141 (2008)
[30] Kuncheva, L. I.; Vetrov, D. P., Evaluation of stability of k-means cluster ensembles with respect to random initialization, IEEE Trans. Pattern Anal. Mach. Intell., 28, 11, 1798-1808 (2006)
[31] Kuncheva, L. I.; Hadjitodorov, S. T., Using diversity in cluster ensembles, (Proceedings of the 2004 IEEE International Conference on Systems, Man and Cybernetics, vol. 2 (2004), IEEE), 1214-1219
[32] Fischer, B.; Buhmann, J. M., Bagging for path-based clustering, IEEE Trans. Pattern Anal. Mach. Intell., 25, 11, 1411-1415 (2003)
[33] Yang, F.; Li, X.; Li, Q.; Li, T., Exploring the diversity in cluster ensemble generation: random sampling and random projection, Expert Syst. Appl., 41, 10, 4844-4866 (2014)
[34] Ayad, H. G.; Kamel, M. S., Cumulative voting consensus method for partitions with variable number of clusters, IEEE Trans. Pattern Anal. Mach. Intell., 30, 1, 160-173 (2008)
[35] Ayad, H. G.; Kamel, M. S., On voting-based consensus of cluster ensembles, Pattern Recognit., 43, 5, 1943-1953 (2010) · Zbl 1191.68552
[36] Blum, A. L.; Langley, P., Selection of relevant features and examples in machine learning, Artif. Intell., 97, 1, 245-271 (1997) · Zbl 0904.68142
[37] Qian, Y.; Liang, J.; Pedrycz, W.; Dang, C., Positive approximation: an accelerator for attribute reduction in rough set theory, Artif. Intell., 174, 9-10, 597-618 (2010) · Zbl 1205.68310
[38] Fern, X. Z.; Brodley, C. E., Random projection for high dimensional data clustering: a cluster ensemble approach, (Proceedings of the Twentieth International Conference on International Conference on Machine Learning (2003)), 186-193
[39] Hadjitodorov, S. T.; Kuncheva, L. I.; Todorova, L. P., Moderate diversity for better cluster ensembles, Inf. Fusion, 7, 3, 264-275 (2006)
[40] Duarte, F. J.; Fred, A. L. N.; Lourenco, A.; Rodrigues, M. F., Weighting cluster ensembles in evidence accumulation clustering, (Proceedings of the Portuguese Conference on Artificial Intelligence (2007)), 159-167
[41] Akbari, E.; Dahlan, H. M.; Ibrahim, R.; Alizadeh, H., Hierarchical cluster ensemble selection, Eng. Appl. Artif. Intell., 39, 146-156 (2015)
[42] Jia, J.; Xiao, X.; Liu, B.; Jiao, L., Bagging-based spectral clustering ensemble selection, Pattern Recognit. Lett., 32, 10, 1456-1467 (2011)
[43] Li, F.; Qian, Y.; Wang, J.; Dang, C.; Liu, B., Cluster’s quality evaluation and selective clustering ensemble, ACM Trans. Knowl. Discov. Data, 12, 5, 60 (2018)
[44] Rastin, P.; Kanawati, R., A multiplex-network based approach for clustering ensemble selection, (Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (2015), IEEE), 1332-1339
[45] Iam-On, N.; Boongoen, T., Comparative study of matrix refinement approaches for ensemble clustering, Mach. Learn., 98, 1-2, 269-300 (2015) · Zbl 1322.62170
[46] Dimitriadou, E.; Weingessel, A.; Hornik, K., A combination scheme for fuzzy clustering, Int. J. Pattern Recognit. Artif. Intell., 16, 7, 901-912 (2002)
[47] Zhou, Z.-H.; Tang, W., Clusterer ensemble, Knowl.-Based Syst., 19, 1, 77-83 (2006)
[48] Filkov, V.; Skiena, S., Integrating microarray data by consensus clustering, Int. J. Artif. Intell. Tools, 13, 4, 863-880 (2004)
[49] Franek, L.; Jiang, X., Ensemble clustering by means of clustering embedding in vector spaces, Pattern Recognit., 47, 2, 833-842 (2014) · Zbl 1326.62137
[50] Iam-On, N.; Boongoen, T.; Garrett, S.; Price, C., A link-based approach to the cluster ensemble problem, IEEE Trans. Pattern Anal. Mach. Intell., 33, 12, 2396-2409 (2011)
[51] Qian, Y.; Li, F.; Liang, J.; Liu, B.; Dang, C., Space structure and clustering of categorical data, IEEE Trans. Neural Netw. Learn. Syst., 27, 10, 2047-2059 (2016)
[52] Zhong, C.; Yue, X.; Zhang, Z.; Lei, J., A clustering ensemble: two-level-refined co-association matrix with path-based transformation, Pattern Recognit., 48, 8, 2699-2709 (2015) · Zbl 1395.62179
[53] Otsu, N., A threshold selection method from gray-level histograms, IEEE Trans. Syst. Man Cybern., 9, 1, 62-66 (1979)
[54] Sezgin, M.; Sankur, B., Survey over image thresholding techniques and quantitative performance evaluation, J. Electron. Imaging, 13, 1, 146-168 (2004)
[55] Johnson, S. C., Hierarchical clustering schemes, Psychometrika, 32, 3, 241-254 (1967) · Zbl 1367.62191
[56] Lichman, M., UCI machine learning repository (2013)
[57] Steinbach, M.; Karypis, G.; Kumar, V., A comparison of document clustering techniques, (Proceedings of the World Text Mining Conference Workshop, Boston (2000)), 525-526
[58] Chan, T. F.; Vese, L. A., Active contours without edges, IEEE Trans. Image Process., 10, 2, 266-277 (2001) · Zbl 1039.68779
[59] Osher, S.; Sethian, J. A., Fronts propagating with curvature-dependent speed: algorithms based on Hamilton-Jacobi formulations, J. Comput. Phys., 79, 1, 12-49 (1988) · Zbl 0659.65132
[60] Arbelaez, P.; Maire, M.; Fowlkes, C. C.; Malik, J., Contour detection and hierarchical image segmentation, IEEE Trans. Pattern Anal. Mach. Intell., 33, 5, 898-916 (2011)
[61] Ultsch, A., Clustering with SOM: U*C, (Proceedings of the 5th Workshop on Self-Organizing Maps, vol. 2 (2005)), 75-82
[62] Fu, L.; Medico, E., FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data, BMC Bioinform., 8, 1, 3 (2007)
[63] Jain, A. K.; Law, M. H., Data clustering: a user’s dilemma, (Proceedings of the International Conference on Pattern Recognition and Machine Intelligence (2005), Springer), 1-10
[64] Shi, J.; Malik, J., Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell., 22, 8, 888-905 (2000)
[65] Yang, Y., An evaluation of statistical approaches to text categorization, Inf. Retr., 1, 1-2, 69-90 (1999)
[66] Demšar, J., Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res., 7, 1-30 (2006) · Zbl 1222.68184