×

Approximately-isometric diffusion maps. (English) Zbl 1328.68172

Summary: Diffusion Maps (DM), and other kernel methods, are utilized for the analysis of high dimensional datasets. The DM method uses a Markovian diffusion process to model and analyze data. A spectral analysis of the DM kernel yields a map of the data into a low dimensional space, where Euclidean distances between the mapped data points represent the diffusion distances between the corresponding high dimensional data points. Many machine learning methods, which are based on the Euclidean metric, can be applied to the mapped data points in order to take advantage of the diffusion relations between them. However, a significant drawback of the DM is the need to apply spectral decomposition to a kernel matrix, which becomes infeasible for large datasets.
In this paper, we present an efficient approximation of the DM embedding. The presented approximation algorithm produces a dictionary of data points by identifying a small set of informative representatives. Then, based on this dictionary, the entire dataset is efficiently embedded into a low dimensional space. The Euclidean distances in the resulting embedded space approximate the diffusion distances. The properties of the presented embedding and its relation to DM method are analyzed and demonstrated.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62H25 Factor analysis and principal components; correspondence analysis

References:

[1] Achlioptas, D.; McSherry, F., Fast computation of low rank matrix approximations, (Proceedings of the Thirty-Third Annual ACM Symposium on Theory of Computing. Proceedings of the Thirty-Third Annual ACM Symposium on Theory of Computing, STOC ’01 (2001), ACM), 611-618 · Zbl 1311.94032
[2] Alon, N.; Kahale, N., A spectral technique for coloring random 3-colorable graphs, SIAM J. Comput., 26, 6, 1733-1748 (1997) · Zbl 0884.05042
[3] Aspvall, B.; Gilbert, J. R., Graph coloring using eigenvalue decomposition (1983), Cornell University, Technical report
[4] Aspvall, B.; Gilbert, J. R., Graph coloring using eigenvalue decomposition, SIAM J. Algebr. Discrete Methods, 5, 4, 526-538 (1984) · Zbl 0554.05048
[5] Belkin, M.; Niyogi, P., Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput., 15, 6, 1373-1396 (2003) · Zbl 1085.68119
[6] Bermanis, A.; Averbuch, A.; Coifman, R. R., Multiscale data sampling and function extension, Appl. Comput. Harmon. Anal., 34, 1, 15-29 (2013) · Zbl 1262.65016
[7] Bermanis, A.; Wolf, G.; Averbuch, A., Cover-based bounds on the numerical rank of gaussian kernels, Appl. Comput. Harmon. Anal., 36, 2, 302-315 (2014) · Zbl 1294.65058
[8] Brin, S.; Page, L., The anatomy of a large-scale hypertextual web search engine, Comput. Netw. ISDN Syst., 30, 1-7, 107-117 (1998)
[9] Bronstein, M. M.; Bronstein, A. M., Shape recognition with spectral distances, IEEE Trans. Pattern Anal. Mach. Intell., 33, 5, 1065-1071 (2011)
[10] Chung, F., Spectral Graph Theory, vol. 92 (May 1997), CBMS-AMS · Zbl 0867.05046
[11] Coifman, R. R.; Lafon, S., Diffusion maps, Appl. Comput. Harmon. Anal., 21, 1, 5-30 (2006) · Zbl 1095.68094
[12] Cullum, J. K.; Willoughby, R. A., Lanczos Algorithms for Large Symmetric Eigenvalue Computations: Volume 1, Theory, Classics Appl. Math., vol. 41 (2002), SIAM · Zbl 1013.65033
[13] David, G., Anomaly detection and classification via diffusion processes in hyper-networks (March 2009), School of Computer Science, Tel Aviv University, PhD thesis
[14] David, G.; Averbuch, A., Hierarchical data organization, clustering and denoising via localized diffusion folders, Appl. Comput. Harmon. Anal., 33, 1, 1-23 (2012) · Zbl 1239.68060
[15] Donoho, D. L.; Grimes, C., Hessian eigenmaps: new locally linear embedding techniques for high dimensional data, Proc. Natl. Acad. Sci. USA, 100, 5591-5596 (May 2003) · Zbl 1130.62337
[16] Felzenszwalb, P. F.; Huttenlocher, D. P., Efficient graph-based image segmentation, Int. J. Comput. Vis., 59, 2, 167-181 (2004) · Zbl 1477.68505
[17] Fowlkes, C.; Belongie, S.; Chung, F.; Malik, J., Spectral grouping using the Nyström method, IEEE Trans. Pattern Anal. Mach. Intell., 26, 2, 214-225 (2004)
[18] Halko, N.; Martinsson, P.-G.; Tropp, J. A., Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions, SIAM Rev., 53, 2, 217-288 (2011) · Zbl 1269.65043
[19] Jingen, L.; Yang, Y.; Shah, M., Learning semantic visual vocabularies using diffusion distance, (IEEE Conference on Computer Vision and Pattern Recognition, 2009. IEEE Conference on Computer Vision and Pattern Recognition, 2009, CVPR 2009 (2009)), 461-468
[20] Keller, Y.; Coifman, R. R.; Lafon, S.; Zucker, S. W., Audio-visual group recognition using diffusion maps, IEEE Trans. Signal Process., 58, 1, 403-413 (2010) · Zbl 1392.94627
[21] Kruskal, J. B., Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis, Psychometrika, 29, 1-27 (1964) · Zbl 0123.36803
[22] Lafon, S., Diffusion maps and geometric harmonics (May 2004), Yale University, PhD thesis
[23] Lafon, S.; Keller, Y.; Coifman, R. R., Data fusion and multicue data matching by diffusion maps, IEEE Trans. Pattern Anal. Mach. Intell., 28, 11, 1784-1797 (2006)
[24] Roweis, S. T.; Saul, L. K., Nonlinear dimensionality reduction by locally linear embedding, Science, 290, 2323-2326 (December 2000)
[25] Rui, X.; Damelin, S.; Wunsch, D. C., Applications of diffusion maps in gene expression data-based cancer diagnosis analysis, (29th Annual International Conference of the IEEE, Engineering in Medicine and Biology Society, 2007. 29th Annual International Conference of the IEEE, Engineering in Medicine and Biology Society, 2007, EMBS 2007 (2007)), 4613-4616
[26] Schclar, A.; Averbuch, A.; Rabin, N.; Zheludev, V.; Hochman, K., A diffusion framework for detection of moving vehicles, Digital Signal Process., 20, 1, 111-122 (2010)
[27] Shi, J.; Malik, J., Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell., 22, 8, 888-905 (2000)
[28] Stewart, G. W.; Sun, J., Matrix Perturbation Theory (1990), Academic Press, INC · Zbl 0706.65013
[29] Talmon, R.; Cohen, I.; Gannot, S., Supervised source localization using diffusion kernels, (2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA (2011)), 245-248
[30] Talmon, R.; Cohen, I.; Gannot, S., Single-channel transient interference suppression with diffusion maps, IEEE Trans. Audio, Speech Language Process., 21, 1-2, 132-144 (2013)
[31] Talmon, R.; Kushnir, D.; Coifman, R. R.; Cohen, I.; Gannot, S., Parametrization of linear systems using diffusion kernels, IEEE Trans. Signal Process., 60, 3, 1159-1173 (2012) · Zbl 1391.93278
[32] Tenenbaum, J. B.; de Silva, V.; Langford, J. C., A global geometric framework for nonlinear dimensionality reduction, Science, 290, 5500, 2319-2323 (2000)
[33] von Luxburg, U., A tutorial on spectral clustering, Stat. Comput., 17, 4, 395-416 (2007)
[34] Yang, G.; Xu, X.; Zhang, J., Manifold alignment via local tangent space alignment, SIAM J. Sci. Comput., 26, 1, 313-338 (2005) · Zbl 1077.65042
[35] Zhang, Z.; Zha, H., Principal manifolds and nonlinear dimension reduction via local tangent space alignment (2002), Department of Computer Science and Engineering, Pennsylvania State University, Technical Report CSE-02-019
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.