×

Diffusion-based kernel methods on Euclidean metric measure spaces. (English) Zbl 1382.42014

Summary: Diffusion-based kernel methods are commonly used for analyzing massive high dimensional datasets. These methods utilize a non-parametric approach to represent the data by using an affinity kernel that represents similarities, distances or correlations between data points. The kernel is based on a Markovian diffusion process, whose transition probabilities are determined by local distances between data points. Spectral analysis of this kernel provides a representation of the data, where Euclidean distances correspond to diffusion distances between data points. When the data lies on a low dimensional manifold, these diffusion distances encompass the geometry of the manifold. In this paper, we present a generalized approach for defining diffusion-based kernels by incorporating measure-based information, which represents the density or distribution of the data, together with its local distances. The generalized construction does not require an underlying manifold to provide a meaningful kernel interpretation but assumes a more relaxed assumption that the measure and its support are related to a locally low dimensional nature of the analyzed phenomena. This kernel is shown to satisfy the necessary spectral properties that are required in order to provide a low dimensional embedding of the data. The associated diffusion process is analyzed via its infinitesimal generator and the provided embedding is demonstrated in two geometric scenarios.

MSC:

42B37 Harmonic analysis and PDEs
94A08 Image processing (compression, reconstruction, etc.) in information and communication theory
Full Text: DOI

References:

[1] Belkin, M.; Niyogi, P., Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput., 15, 6, 1373-1396 (2003) · Zbl 1085.68119
[2] Bermanis, A.; Averbuch, A.; Coifman, R. R., Multiscale data sampling and function extension, Appl. Comput. Harmon. Anal., 34, 15-29 (2013) · Zbl 1262.65016
[3] Bermanis, A.; Salhov, M.; Wolf, G.; Averbuch, A., Measure-based diffusion grid construction and high-dimensional data discretization, Appl. Comput. Harmon. Anal., 40, 2, 207-228 (2016) · Zbl 1335.94008
[4] Coifman, R. R.; Hirn, M. J., Bi-stochastic kernels via asymmetric affinity functions, Appl. Comput. Harmon. Anal., 35, 1, 177-180 (2013) · Zbl 1359.62237
[5] Coifman, R. R.; Kevrekidis, I. G.; Lafon, S.; Maggioni, M.; Nadler, B., Diffusion maps, reduction coordinates, and low dimensional representation of stochastic systems, (Multiscale Modeling and Simulation (2008)), 842-864 · Zbl 1175.60058
[6] Coifman, R. R.; Lafon, S., Diffusion maps, Appl. Comput. Harmon. Anal., 21, 1, 5-30 (2006) · Zbl 1095.68094
[7] Coifman, R. R.; Lafon, S., Geometric harmonics: a novel tool for multiscale out-of-sample extension of empirical functions, Appl. Comput. Harmon. Anal., 21, 1, 31-52 (2006) · Zbl 1095.68095
[8] Coifman, R. R.; Lafon, S.; Lee, A. B.; Maggioni, M.; Nadler, B.; Warner, F.; Zucker, S. W., Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps, Proc. Natl. Acad. Sci. USA, 102, 21, 7426-7431 (2005) · Zbl 1405.42043
[9] Cox, T.; Cox, M., Multidimensional Scaling (1994), Chapman and Hall: Chapman and Hall London, UK · Zbl 0853.62047
[10] David, G., Anomaly detection and classification via diffusion processes in hyper-networks (March 2009), School of Computer Science, Tel Aviv University, PhD thesis
[11] David, G.; Averbuch, A., Hierarchical data organization, clustering and denoising via localized diffusion folders, Appl. Comput. Harmon. Anal., 33, 1, 1-23 (2011) · Zbl 1239.68060
[12] Donoho, D. L.; Grimes, C., Hessian eigenmaps: new locally linear embedding techniques for high dimensional data, Proc. Natl. Acad. Sci. USA, 100, 5591-5596 (May 2003) · Zbl 1130.62337
[13] Engel, Y.; Mannor, S.; Meir, R., The kernel recursive least-squares algorithm, IEEE Trans. Signal Process., 52, 8, 2275-2285 (2004) · Zbl 1369.68280
[14] Haddad, A.; Kushnir, D.; Coifman, R. R., Filtering via a reference set (2011), Yale University: Yale University New Haven, CT, Tech. rep. YALEU/DCS/TR-1441
[15] Haddad, A.; Kushnir, D.; Coifman, R. R., Texture separation via a reference set, Appl. Comput. Harmon. Anal., 36, 2, 335-347 (2014) · Zbl 1357.94021
[16] Hotelling, H., Analysis of a complex of statistical variables into principal components, J. Educ. Psychol., 24 (1933) · JFM 59.1182.04
[17] Jolliffe, I. T., Principal Component Analysis (1986), Springer: Springer New York, NY · Zbl 1011.62064
[18] Kruskal, J. B., Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis, Psychometrika, 29, 1-27 (1964) · Zbl 0123.36803
[19] Kushnir, D.; Haddad, A.; Coifman, R. R., Anisotropic diffusion on sub-manifolds with application to earth structure classification, Appl. Comput. Harmon. Anal., 32, 2, 280-294 (2012) · Zbl 1316.62067
[20] Lafon, S., Diffusion maps and geometric harmonics (May 2004), Yale University, PhD thesis
[21] Lian, W.; Talmon, R.; Zaveri, H.; Carin, L.; Coifman, R. R., Multivariate time-series analysis and diffusion maps, Signal Process., 116, 13-28 (2015)
[22] Nadler, B.; Lafon, S.; Coifman, R. R.; Kevrekidis, I. G., Diffusion maps, spectral clustering and eigenfunctions of Fokker-Planck operators, (Weiss, Y.; Schölkopf, B.; Platt, J., Advances in Neural Information Processing Systems, vol. 18 (2006), MIT Press: MIT Press Cambridge, MA), 955-962
[23] Nadler, B.; Lafon, S.; Coifman, R. R.; Kevrekidis, I. G., Diffusion maps, spectral clustering and reaction coordinates of dynamical systems, Appl. Comput. Harmon. Anal., 21, 1, 113-127 (2006) · Zbl 1103.60069
[24] Rabin, N.; Coifman, R. R., Heterogeneous datasets representation and learning using diffusion maps and Laplacian pyramids, (SDM (2012)), 189-199
[25] Roweis, S. T.; Saul, L. K., Nonlinear dimensionality reduction by locally linear embedding, Science, 290, 2323-2326 (December 2000)
[26] Salhov, M.; Bermanis, A.; Wolf, G.; Averbuch, A., Approximately-isometric diffusion maps, Appl. Comput. Harmon. Anal., 38, 3, 399-419 (2015) · Zbl 1328.68172
[27] Talmon, R.; Cohen, I.; Gannot, S.; Coifman, R. R., Supervised graph-based processing for sequential transient interference suppression, IEEE Trans. Audio, Speech Language Process., 20, 9, 2528-2538 (2012)
[28] Tenenbaum, J. B.; de Silva, V.; Langford, J. C., A global geometric framework for nonlinear dimensionality reduction, Science, 290, 5500, 2319-2323 (2000)
[30] Zass, R.; Shashua, A., A unifying approach to hard and probabilistic clustering, (Tenth IEEE International Conference on Computer Vision, vol. 1. Tenth IEEE International Conference on Computer Vision, vol. 1, ICCV 2005 (2005), IEEE), 294-301
[31] Zhang, Z.; Zha, H., Principal manifolds and nonlinear dimension reduction via local tangent space alignment (2002), Department of Computer Science and Engineering, Pennsylvania State University, Technical report CSE-02-019
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.