Multi-view spectral clustering for uncertain objects. (English) Zbl 1479.62042

Summary: Clustering uncertain data is an important task in machine learning and pattern recognition, and the uncertainty itself makes clustering more difficult. Multi-view clustering has recently attracted attention for certain (deterministic) data because it often yields better results than clustering from a single view. In uncertain data clustering, the similarity measure plays a central role, yet state-of-the-art similarity measures have several limitations. For example, when the distributions of two uncertain objects overlap heavily in location, a geometric similarity measure alone is not sufficient. Conversely, a similarity measure based on probability distributions is inadequate when two uncertain objects are far apart or completely separated. In this study, the induced kernel distance and the Jeffrey divergence are fused according to the degree of overlap in each view of a dataset to construct a self-adaptive mixture similarity measure (SAM). SAM is then used with pairwise co-regularization in multi-view spectral clustering to group uncertain data. A proof of convergence of the objective function of the proposed clustering algorithm is also presented. The experiments are carried out on nine real-world deterministic datasets, three real-life uncertain datasets, and one synthetic uncertain dataset; the nine deterministic datasets are converted into uncertain datasets before the clustering algorithms are run. Experimental results, compared using five clustering evaluation metrics, show that the proposed algorithm outperforms nine state-of-the-art methods. The proposed method is also evaluated using null hypothesis significance tests.
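
The idea of weighting a distribution-based measure against a geometric one by the degree of overlap can be illustrated with a minimal sketch. It is not the paper's SAM: the summary does not give the fusion formula, so the histogram-intersection overlap, the Gaussian kernel with parameter `gamma`, the linear blend, and all function names below are assumptions chosen for illustration. Only the Jeffrey divergence (symmetric Kullback-Leibler divergence) and the kernel-induced distance follow their standard definitions.

```python
import numpy as np

def jeffrey_divergence(p, q, eps=1e-12):
    """Symmetric KL divergence: KL(p||q) + KL(q||p) = sum((p - q) * log(p / q))."""
    p = np.asarray(p, dtype=float) + eps  # eps avoids log(0) on empty bins
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum((p - q) * np.log(p / q)))

def induced_kernel_distance(x, y, gamma=1.0):
    """Distance induced by a Gaussian kernel: sqrt(k(x,x) - 2*k(x,y) + k(y,y))."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    k_xy = np.exp(-gamma * np.sum((x - y) ** 2))
    return float(np.sqrt(max(2.0 - 2.0 * k_xy, 0.0)))  # k(x,x) = k(y,y) = 1

def overlap_degree(p, q):
    """Histogram intersection in [0, 1]; a hypothetical stand-in for the overlap measure."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.minimum(p / p.sum(), q / q.sum()).sum())

def mixed_dissimilarity(loc_x, loc_y, p, q, gamma=1.0):
    """Assumed linear blend: heavy overlap favours the distribution-based term,
    little overlap favours the geometric (kernel-induced) term."""
    w = overlap_degree(p, q)
    return (w * jeffrey_divergence(p, q)
            + (1.0 - w) * induced_kernel_distance(loc_x, loc_y, gamma))
```

Here `p` and `q` are histograms of two uncertain objects over shared bins and `loc_x`, `loc_y` are their representative locations (for example, sample means). A pairwise matrix of such dissimilarities, computed per view, would still need to be converted into an affinity before being passed to a spectral clustering routine.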

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62M15 Inference from stochastic processes and spectral analysis
68T05 Learning and adaptive systems in artificial intelligence
