×

Fast semi-supervised clustering with enhanced spectral embedding. (English) Zbl 1248.68407

Summary: In recent years, semi-supervised clustering (SSC) has aroused considerable interests from the machine learning and data mining communities. In this paper we propose a novel SSC approach with enhanced spectral embedding (ESE), which not only considers the geometric structure information contained in data sets, but also can make use of the given side information such as pairwise constraints. Specially, we first construct a symmetry-favored \(k\)-NN graph, which is highly robust to noise and outliers, and can reflect the underlying manifold structures of data sets. Then we learn the enhanced spectral embedding towards an ideal data representation as consistent with the given pairwise constraints as possible. Finally, by using the regularization of spectral embedding we formulate learning the new data representation as a semidefinite-quadratic-linear programming (SQLP) problem, which can be efficiently solved. Experimental results on a variety of synthetic and real-world data sets show that our ESE approach outperforms the state-of-the-art SSC algorithms in terms of speed and quality on both vector-based and graph-based clustering.

MSC:

68T05 Learning and adaptive systems in artificial intelligence

Software:

COIL-20; MNIST; SDPT3
Full Text: DOI

References:

[1] Chapelle, O.; Schölkopf, B.; Zien, A., Semi-Supervised Learning (2006), The MIT Press: The MIT Press Cambridge, MA
[2] X. Zhu, Semi-supervised learning literature survey, Technical Report, Computer Sciences, University of Wisconsin-Madison, 2007.; X. Zhu, Semi-supervised learning literature survey, Technical Report, Computer Sciences, University of Wisconsin-Madison, 2007.
[3] K. Wagstaff, C. Cardie, S. Rogers, S. Schroedl, Constrained k-means clustering with background knowledge, in: Proceedings of the 18th International Conference on Machine Learning (ICML), 2001, pp. 577-584.; K. Wagstaff, C. Cardie, S. Rogers, S. Schroedl, Constrained k-means clustering with background knowledge, in: Proceedings of the 18th International Conference on Machine Learning (ICML), 2001, pp. 577-584.
[4] D. Klein, S. Kamvar, C. Manning, From instance-level constraints to space-level constraints: making the most of prior knowledge in data clustering, in: Proceedings of the 19th International Conference on Machine Learning (ICML), 2002, pp. 307-314.; D. Klein, S. Kamvar, C. Manning, From instance-level constraints to space-level constraints: making the most of prior knowledge in data clustering, in: Proceedings of the 19th International Conference on Machine Learning (ICML), 2002, pp. 307-314.
[5] Kulis, B.; Basu, S.; Dhillon, I.; Mooney, R., Semi-supervised graph clustering: a kernel approach, Machine Learning, 74, 1, 1-22 (2009) · Zbl 1472.68144
[6] Xing, E.; Ng, A.; Jordan, M.; Russell, S., Distance metric learning, with application to clustering with side-information, Advances in Neural Information Processing Systems (NIPS), 15, 505-512 (2003)
[7] S. Kamvar, D. Klein, C. Manning, Spectral learning, in: Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI), 2003, pp. 561-566.; S. Kamvar, D. Klein, C. Manning, Spectral learning, in: Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI), 2003, pp. 561-566.
[8] Globerson, A.; Roweis, S., Metric learning by collapsing classes, Advances in Neural Information Processing Systems (NIPS), 18, 451-458 (2006)
[9] S. Basu, M. Bilenko, R. Mooney, A probabilistic framework for semi-supervised clustering, in: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2004, pp. 59-68.; S. Basu, M. Bilenko, R. Mooney, A probabilistic framework for semi-supervised clustering, in: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2004, pp. 59-68.
[10] M. Bilenko, S. Basu, R. Mooney, Integrating constraints and metric learning in semi-supervised clustering, in: Proceedings of the 21st International Conference on Machine Learning (ICML), 2004, pp. 81-88.; M. Bilenko, S. Basu, R. Mooney, Integrating constraints and metric learning in semi-supervised clustering, in: Proceedings of the 21st International Conference on Machine Learning (ICML), 2004, pp. 81-88.
[11] A. Bar-Hillel, T. Hertz, N. Shental, D. Weinshall, Learning distance functions using equivalence relations, in: Proceedings of the 20th International Conference on Machine Learning (ICML), 2003, pp. 11-18.; A. Bar-Hillel, T. Hertz, N. Shental, D. Weinshall, Learning distance functions using equivalence relations, in: Proceedings of the 20th International Conference on Machine Learning (ICML), 2003, pp. 11-18. · Zbl 1222.68140
[12] S. Hoi, R. Jin, M. Lyu, Learning nonparametric kernel matrices from pairwise constraints, in: Proceedings of the 24th International Conference on Machine Learning (ICML), 2007, pp. 361-368.; S. Hoi, R. Jin, M. Lyu, Learning nonparametric kernel matrices from pairwise constraints, in: Proceedings of the 24th International Conference on Machine Learning (ICML), 2007, pp. 361-368.
[13] J. Davis, B. Kulis, P. Jain, S. Sra, I. S. Dhillon, Information theoretic metric learning, in: Proceedings of the 24th International Conference on Machine Learning (ICML), 2007, pp. 209-216.; J. Davis, B. Kulis, P. Jain, S. Sra, I. S. Dhillon, Information theoretic metric learning, in: Proceedings of the 24th International Conference on Machine Learning (ICML), 2007, pp. 209-216.
[14] Z. Li, J. Liu, X. Tang, Constrained clustering via spectral regularization, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 421-428.; Z. Li, J. Liu, X. Tang, Constrained clustering via spectral regularization, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 421-428.
[15] Z. Li, J. Liu, X. Tang, Pairwise constraint propagation by semidefinite programming for semi-supervised classification, in: Proceedings of the 25th International Conference on Machine Learning (ICML), 2008, pp. 576-583.; Z. Li, J. Liu, X. Tang, Pairwise constraint propagation by semidefinite programming for semi-supervised classification, in: Proceedings of the 25th International Conference on Machine Learning (ICML), 2008, pp. 576-583.
[16] K. Wagstaff, C. Cardie, Clustering with instance-level constraints, in: Proceedings of the 17th International Conference on Machine Learning (ICML), 2000, pp. 1103-1110.; K. Wagstaff, C. Cardie, Clustering with instance-level constraints, in: Proceedings of the 17th International Conference on Machine Learning (ICML), 2000, pp. 1103-1110.
[17] X. Wang, I. Davidson, Flexible constrained spectral clustering, in: Proceedings of the 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2010, pp. 563-572.; X. Wang, I. Davidson, Flexible constrained spectral clustering, in: Proceedings of the 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2010, pp. 563-572.
[18] Donath, W. E.; Hofmann, A. J., Lower bounds for the partitioning of graphs, IBM Journal of Research and Development, 17, 420-425 (1973) · Zbl 0259.05112
[19] Shang, F.; Jiao, L. C.; Shi, J.; Wang, F.; Gong, M., Fast affinity propagation clustering: a multilevel approach, Pattern recognition, 45, 1, 474-486 (2012)
[20] Shi, J.; Malik, J., Normalized cuts and image segmentation, IEEE, Transactions on Pattern Analysis and Machine Intelligence, 22, 8, 888-905 (2000)
[21] Hagen, L.; Kahng, A., New spectral methods for ratio cut partitioning and clustering,, IEEE, Transactions on Computer-Aided Design of Integrated Circuits and Systems, 11, 9, 1074-1085 (1992)
[22] C. H. Q. Ding, X. He, H. Zha, M. Gu, H. D. Simon, A min-max cut algorithm for graph partitioning and data clustering, in: Proceedings of the IEEE International Conference on Data Mining (ICDM), 2001, pp. 107-114.; C. H. Q. Ding, X. He, H. Zha, M. Gu, H. D. Simon, A min-max cut algorithm for graph partitioning and data clustering, in: Proceedings of the IEEE International Conference on Data Mining (ICDM), 2001, pp. 107-114.
[23] S. White, P. Smyth, A spectral clustering approach to finding communities in graph, in: Proceedings of the Fifth SIAM International Conference on Data Mining (SDM), 2005, pp. 76-84.; S. White, P. Smyth, A spectral clustering approach to finding communities in graph, in: Proceedings of the Fifth SIAM International Conference on Data Mining (SDM), 2005, pp. 76-84.
[24] Z. Lu, M. Á. Carreira-Perpiñán, Constrained spectral clustering through affinity propagation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 848-855.; Z. Lu, M. Á. Carreira-Perpiñán, Constrained spectral clustering through affinity propagation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 848-855.
[25] F. Wang, C.H.Q. Ding, T. Li, Integrated KL (K-means-Laplacian) clustering: a new clustering approach by combining attribute data and pairwise relations, in: Proceedings of the Fifth SIAM International Conference on Data Mining (SDM), 2009, pp. 38-48.; F. Wang, C.H.Q. Ding, T. Li, Integrated KL (K-means-Laplacian) clustering: a new clustering approach by combining attribute data and pairwise relations, in: Proceedings of the Fifth SIAM International Conference on Data Mining (SDM), 2009, pp. 38-48.
[26] Yu, S. X.; Shi, J., Segmentation given partial grouping constraints, IEEE, Transactions of the Pattern Analysis and Machine Intelligence, 26, 2, 173-183 (2004)
[27] F. Shang, Y. Liu, F. Wang, Learning spectral embedding for semisupervised clustering, in: Proceedings of the IEEE International Conference on Data Mining (ICDM), 2011, pp. 597-606.; F. Shang, Y. Liu, F. Wang, Learning spectral embedding for semisupervised clustering, in: Proceedings of the IEEE International Conference on Data Mining (ICDM), 2011, pp. 597-606.
[28] Zelnik-Manor, L.; Perona, P., Self-tuning spectral clustering, Advances in Neural Information Processing Systems (NIPS), 16, 1601-1608 (2004)
[29] Shang, F.; Jiao, L. C.; Shi, J.; Gong, M.; Shang, R. H., Fast density-weighted low-rank approximation spectral clustering, Data Mining and Knowledge Discovery, 23, 2, 345-378 (2011) · Zbl 1235.68187
[30] W. Liu, S. Chang, Robust multi-class transductive learning with graphs, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 381-388.; W. Liu, S. Chang, Robust multi-class transductive learning with graphs, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 381-388.
[31] Chung, F., Spectral Graph Theory (1997), American Mathematical Society · Zbl 0867.05046
[32] von Luxburg, U., A tutorial on spectral clustering, Statistics and Computing, 17, 4, 395-416 (2007)
[33] J.T. Kwok, I.W. Tsang, Learning with idealized kernels, in: Proceedings of the 20th International Conference on Machine Learning (ICML), 2003, pp. 400-407.; J.T. Kwok, I.W. Tsang, Learning with idealized kernels, in: Proceedings of the 20th International Conference on Machine Learning (ICML), 2003, pp. 400-407.
[34] Wu, X. -M.; So, A.; Li, Z.; Li, S., Fast graph Laplacian regularized kernel learning via semidefinite-quadratic-linear programming, Advances in Neural Information Processing Systems (NIPS), 21, 1964-1972 (2009)
[35] Golub, G. H.; Loan, C. F.V., Matrix Computations (1996), The Johns Hopkins University Press: The Johns Hopkins University Press Baltimore, Maryland: · Zbl 0865.65009
[36] Tütüncü, R. H.; Toh, K. C.; Todd, M. J., Solving semidefinite-quadratic-linear programs using SDPT3, Mathematical Programming, 95, 189-217 (2003) · Zbl 1030.90082
[37] T. Coleman, J. Saunderson, A. Wirth, Spectral clustering with inconsistent advice, in: Proceedings of the 25th International Conference on Machine Learning (ICML), 2008, pp. 152-159.; T. Coleman, J. Saunderson, A. Wirth, Spectral clustering with inconsistent advice, in: Proceedings of the 25th International Conference on Machine Learning (ICML), 2008, pp. 152-159.
[38] LeCun, Y.; Cortes, C., The MNIST Database of Handwritten Digits (2009), 〈http://yann.lecun.com/exdb/mnist/〉
[39] S.A. Nene, S.K. Nayar, J. Murase, Columbia object image library (COIL-20), Technical Report CUCS-005-96, Columbia University, February 1996.; S.A. Nene, S.K. Nayar, J. Murase, Columbia object image library (COIL-20), Technical Report CUCS-005-96, Columbia University, February 1996.
[40] Georghiades, A. S.; Belhumeur, P. N.; Kriegman, D. J., From few to many: illumination cone models for face recognition under variable lighting and pose,, IEEE Transactions on PatternAnalysis and Machine Intelligence, 23, 6, 643-660 (2001)
[41] L. Fei-Fei, R. Fergus, P. Perona, Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Workshop on Generative Model Based Vision, 2004.; L. Fei-Fei, R. Fergus, P. Perona, Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Workshop on Generative Model Based Vision, 2004.
[42] Oliva, A.; Torralba, A., Modeling the shape the scene: a holistic representation of the spatial envelope, International Journal of Computer Vision, 42, 3, 145-175 (2001) · Zbl 0990.68601
[43] K. Grauman, T. Darrell, The pyramid match kernel: discriminative classification with sets of image features, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2005, pp. 1458-1465.; K. Grauman, T. Darrell, The pyramid match kernel: discriminative classification with sets of image features, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2005, pp. 1458-1465.
[44] Chai, J.; Liu, H.; Bao, Z., Generalized re-weighting local sampling mean discriminant analysis, Pattern recognition, 43, 10, 3422-3432 (2010) · Zbl 1213.68518
[45] Shang, F.; Jiao, L. C.; Wang, F., Graph dual regularization non-negative matrix factorization for co-clustering, Pattern Recognition, 45, 6, 2237-2250 (2012) · Zbl 1234.68356
[46] Xing, M.; Bao, Z.; Pei, B., Properties of high-resolution range profiles, Optical Engineering, 41, 2, 493-504 (2002)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.