×

Spectral analysis of weighted Laplacians arising in data clustering. (English) Zbl 07432352

Summary: Graph Laplacians computed from weighted adjacency matrices are widely used to identify geometric structure in data, and clusters in particular; their spectral properties play a central role in a number of unsupervised and semi-supervised learning algorithms. When suitably scaled, graph Laplacians approach limiting continuum operators in the large data limit. Studying these limiting operators, therefore, sheds light on learning algorithms. This paper is devoted to the study of a parameterized family of divergence form elliptic operators that arise as the large data limit of graph Laplacians. The link between a three-parameter family of graph Laplacians and a three-parameter family of differential operators is explained. The spectral properties of these differential operators are analyzed in the situation where the data comprises of two nearly separated clusters, in a sense which is made precise. In particular, we investigate how the spectral gap depends on the three parameters entering the graph Laplacian, and on a parameter measuring the size of the perturbation from the perfectly clustered case. Numerical results are presented which exemplify the analysis and which extend it in the following ways: the computations study situations in which there are two nearly separated clusters, but which violate the assumptions used in our theory; situations in which more than two clusters are present, also going beyond our theory; and situations which demonstrate the relevance of our studies of differential operators for the understanding of finite data problems via the graph Laplacian. The findings provide insight into parameter choices made in learning algorithms which are based on weighted adjacency matrices; they also provide the basis for analysis of the consistency of various unsupervised and semi-supervised learning algorithms, in the large data limit.

MSC:

47A75 Eigenvalue problems for linear operators
62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T10 Pattern recognition, speech recognition
35B20 Perturbations in context of PDEs
05C50 Graphs and linear algebra (matrices, eigenvalues, etc.)

Software:

PETSc; FEniCS; SyFi

References:

[1] Adams, R. A.; Fournier, J. J., Sobolev Spaces, vol. 140 (2003), Elsevier · Zbl 1098.46001
[2] Bakry, D.; Gentil, I.; Ledoux, M., Analysis and Geometry of Markov Diffusion Operators, vol. 348 (2013), Springer Science & Business Media: Springer Science & Business Media New York
[3] Balay, S.; Abhyankar, S.; Adams, M. F.; Brown, J.; Brune, P.; Buschelman, K.; Dalcin, L.; Dener, A.; Eijkhout, V.; Gropp, W. D.; Karpeyev, D.; Kaushik, D.; Knepley, M. G.; May, D. A.; McInnes, L. C.; Mills, R. T.; Munson, T.; Rupp, K.; Sanan, P.; Smith, B. F.; Zampini, S.; Zhang, H.; Zhang, H. (2019), Argonne National Laboratory, PETSc users manual. Technical Report ANL-95/11 - Revision 3.11
[4] Belkin, M.; Niyogi, P., Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput., 15, 6, 1373-1396 (2003) · Zbl 1085.68119
[5] Belkin, M.; Niyogi, P., Convergence of Laplacian eigenmaps, (NIPS (2006))
[6] Belkin, M.; Niyogi, P., Towards a theoretical foundation for Laplacian-based manifold methods, J. Comput. Syst. Sci., 74, 8, 1289-1308 (2008) · Zbl 1157.68056
[7] Berry, T.; Harlim, J., Variable bandwidth diffusion kernels, Appl. Comput. Harmon. Anal., 40, 1, 68-96 (2016) · Zbl 1343.94020
[8] Bertozzi, A. L.; Flenner, A., Diffuse interface models on graphs for classification of high dimensional data, Multiscale Model. Simul., 10, 3, 1090-1118 (2012) · Zbl 1259.68215
[9] Bertozzi, A. L.; Luo, X.; Stuart, A. M.; Zygalakis, K. C., Uncertainty quantification in graph-based classification of high dimensional data, SIAM/ASA J. Uncertain. Quantificat., 6, 2, 568-595 (2018) · Zbl 1394.62083
[10] Bovier, A.; Eckhoff, M.; Gayrard, V.; Klein, M., Metastability in reversible diffusion processes i: sharp asymptotics for capacities and exit times, J. Eur. Math. Soc., 6, 4, 399-424 (2004) · Zbl 1076.82045
[11] Bovier, A.; Gayrard, V.; Klein, M., Metastability in reversible diffusion processes ii: precise asymptotics for small eigenvalues, J. Eur. Math. Soc., 7, 1, 69-99 (2005) · Zbl 1105.82025
[12] Calder, J., The game theoretic p-Laplacian and semi-supervised learning with few labels, Nonlinearity, 32, 1, 301 (2018) · Zbl 1408.35048
[13] Calder, J.; Trillos, N. G., Improved spectral convergence rates for graph Laplacians on ϵ-graphs and k-NN graphs (2019)
[14] Coifman, R. R.; Lafon, S., Diffusion maps, Appl. Comput. Harmon. Anal., 21, 1, 5-30 (2006) · Zbl 1095.68094
[15] de Kergorlay, H.-L.; Higham, D. J., Consistency of anchor-based spectral clustering (2020), arXiv preprint
[16] Deuflhard, P.; Dellnitz, M.; Junge, O.; Schütte, C., Computation of essential molecular dynamics by subdivision techniques, (Computational Molecular Dynamics: Challenges, Methods, Ideas (1999), Springer), 98-115 · Zbl 0966.81067
[17] Deuflhard, P.; Huisinga, W.; Fischer, A.; Schütte, C., Identification of almost invariant aggregates in reversible nearly uncoupled Markov chains, Linear Algebra Appl., 315, 1-3, 39-59 (2000) · Zbl 0963.65008
[18] Dunlop, M. M.; Slepčev, D.; Stuart, A. M.; Thorpe, M., Large data and zero noise limits of graph-based semi-supervised learning algorithms, Appl. Comput. Harmon. Anal. (2019) · Zbl 1442.62768
[19] Evans, L. C., Partial Differential Equations, Graduate Studies in Mathematics, vol. 19 (2010), AMS: AMS Providence, RI · Zbl 1194.35001
[20] García Trillos, N.; Gerlach, M.; Hein, M.; Slepčev, D., Error estimates for spectral convergence of the graph Laplacian on random geometric graphs toward the Laplace-Beltrami operator, Found. Comput. Math., 20, 4, 827-887 (2020) · Zbl 1447.62141
[21] García Trillos, N.; Hoffmann, F.; Hosseini, B., Geometric structure of graph Laplacian embeddings, J. Mach. Learn. Res., 22, Article 63 pp. (2021), 55 pp. · Zbl 1540.62083
[22] Garcia Trillos, N.; Slepčev, D., Continuum limit of total variation on point clouds, Arch. Ration. Mech. Anal., 220, 1, 193-241 (2016) · Zbl 1336.68215
[23] García Trillos, N.; Slepčev, D., A variational approach to the consistency of spectral clustering, Appl. Comput. Harmon. Anal., 45, 2, 239-281 (2018) · Zbl 1396.49013
[24] Giné, E.; Koltchinskii, V., Empirical graph Laplacian approximation of Laplace-Beltrami operators: large sample results, (High Dimensional Probability (2006), Institute of Mathematical Statistics), 238-259 · Zbl 1124.60030
[25] Hafiene, Y.; Fadili, J. M.; Chesneau, C.; Elmoataz, A., Continuum limit of the nonlocal p-Laplacian evolution problem on random inhomogeneous graphs, ESAIM: Math. Model. Numer. Anal., 54, 2 (2020) · Zbl 1442.65212
[26] Hafiene, Y.; Fadili, J. M.; Elmoataz, A., Continuum limits of nonlocal p-Laplacian variational problems on graphs, SIAM J. Imaging Sci., 12, 4, 1772-1807 (2019) · Zbl 1447.65056
[27] Hoffmann, F.; Hosseini, B.; Ren, Z.; Stuart, A. M., Consistency of semi-supervised learning algorithms on graphs: probit and one-hot methods, J. Mach. Learn. Res., 21, Article 186 pp. (2020), 55 pp. · Zbl 1527.68179
[28] Huisinga, W.; Meyn, S.; Schütte, C., Phase transitions and metastability in Markovian and molecular systems, Ann. Appl. Probab., 14, 1, 419-458 (2004) · Zbl 1041.60026
[29] Kato, T., Perturbation Theory for Linear Operators, Classics in Mathematics (1995), Springer: Springer New York · Zbl 0836.47009
[30] Loftsgaarden, D. O.; Quesenberry, C. P., A nonparametric estimate of a multivariate density function, Ann. Math. Stat., 36, 3, 1049-1051 (1965) · Zbl 0132.38905
[31] Logg, A.; Mardal, K.-A.; Wells, G., Automated Solution of Differential Equations by the Finite Element Method: The FEniCS Book, Lecture Notes in Computational Science and Engineering, vol. 84 (2012), Springer Science & Business Media · Zbl 1247.65105
[32] McLean, W., Strongly Elliptic Systems and Boundary Integral Equations (2000), Cambridge University Press: Cambridge University Press Cambridge · Zbl 0948.35001
[33] A.Y. Ng, M.I. Jordan, Y. Weiss, On spectral clustering: analysis and an algorithm, in: Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic.
[34] Ng, A. Y.; Jordan, M. I.; Weiss, Y., On spectral clustering: analysis and an algorithm, (Advances in Neural Information Processing Systems (2002)), 849-856
[35] Pavliotis, G. A., Stochastic Processes and Applications: Diffusion Processes, the Fokker-Planck and Langevin Equations, Texts in Applied Mathematics, vol. 60 (2014), Springer: Springer New York · Zbl 1318.60003
[36] Schiebinger, G.; Wainwright, M. J.; Yu, B., The geometry of kernelized spectral clustering, Ann. Stat., 43, 2, 819-846 (2015) · Zbl 1312.62082
[37] Schütte, C.; Huisinga, W.; Deuflhard, P., Transfer operator approach to conformational dynamics in biomolecular systems, (Bernold, F., Ergodic Theory, Analysis, and Efficient Simulation of Dynamical Systems (2001), Springer: Springer Berlin), 191-223 · Zbl 0996.92012
[38] Shi, J.; Malik, J., Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell., 22, 8, 888-905 (Aug. 2000)
[39] Shi, T.; Belkin, M.; Yu, B., Data spectroscopy: eigenspaces of convolution operators and clustering, Ann. Stat., 37, 6B, 3960-3984 (2009) · Zbl 1191.62114
[40] Slepčev, D.; Thorpe, M., Analysis of p-Laplacian regularization in semisupervised learning, SIAM J. Math. Anal., 51, 3, 2085-2120 (2019) · Zbl 1422.49020
[41] Spielmat, D. A.; Teng, S.-H., Spectral partitioning works: planar graphs and finite element meshes, (Proceedings of 37th Conference on Foundations of Computer Science (1996), IEEE), 96-105
[42] Terrell, G. R.; Scott, D. W., Variable kernel density estimation, Ann. Stat., 20, 3, 1236-1265 (1992) · Zbl 0763.62024
[43] von Luxburg, U., A tutorial on spectral clustering, Stat. Comput., 17, 4, 395-416 (2007)
[44] von Luxburg, U.; Belkin, M.; Bousquet, O., Consistency of spectral clustering, Ann. Stat., 36, 2, 555-586 (2008) · Zbl 1133.62045
[45] C.L. Wormell, S. Reich, Spectral convergence of diffusion maps: improved error bounds and an alternative normalisation, 2020. · Zbl 1486.65237
[46] Zelnik-Manor, L.; Perona, P., Self-tuning spectral clustering, (Advances in Neural Information Processing Systems (2005)), 1601-1608
[47] Zhu, X.; Ghahramani, Z.; Lafferty, J. D., Semi-supervised learning using Gaussian fields and harmonic functions, (Proceedings of the 20th International Conference on Machine Learning (2003)), 912-919
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.