×

Measure-based diffusion grid construction and high-dimensional data discretization. (English) Zbl 1335.94008

Summary: The diffusion maps framework is a kernel-based method for manifold learning and data analysis that models a Markovian process over data. Analysis of this process provides meaningful information concerning inner geometric structures in the data. Recently, it was suggested to replace the standard kernel by a measure-based kernel, which incorporates information about the density of the data. Thus, the manifold assumption is replaced by a more general measure assumption.
The measure-based diffusion kernel utilizes two separate independent datasets. The first is the set by which the measure is determined. This measure correlates with a density that represents normal behaviors and patterns in the data. The second set consists of the analyzed data points that are embedded by the metastable states of the underlying diffusion process. This set can either be contiguous or discrete.
In this paper, we present a data discretization methodology for analyzing a contiguous domain. The obtained discretization is achieved by constructing a uniform grid over this domain. This discretization is designed to approximate the continuous measure-based diffusion process by a discrete random walk process. This paper provides a proved criterion to determine the grid resolution that ensures a controllable approximation error for the continuous steady states by the discrete ones. Finally, the presented methodology is demonstrated on analytically generated data.

MSC:

94A08 Image processing (compression, reconstruction, etc.) in information and communication theory
94A12 Signal theory (characterization, reconstruction, filtering, etc.)
62H30 Classification and discrimination; cluster analysis (statistical aspects)
65C60 Computational problems in statistics (MSC2010)
62-07 Data analysis (statistics) (MSC2010)

Software:

MapReduce; Hadoop
Full Text: DOI

References:

[1] Belkin, M.; Niyogi, P., Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput., 15, 6, 1373-1396 (2003) · Zbl 1085.68119
[3] Bermanis, A.; Wolf, G.; Averbuch, A., Measure-based diffusion kernel methods, (SampTA 2013: 10th International Conference on Sampling Theory and Applications. SampTA 2013: 10th International Conference on Sampling Theory and Applications, Bremen, Germany (2013))
[4] Coifman, R. R.; Lafon, S., Diffusion maps, Appl. Comput. Harmon. Anal., 21, 1, 5-30 (2006) · Zbl 1095.68094
[5] Cox, T.; Cox, M., Multidimensional Scaling (1994), Chapman and Hall: Chapman and Hall London, UK · Zbl 0853.62047
[6] Dean, Jeffrey; Ghemawat, Sanjay, Mapreduce: simplified data processing on large clusters, Commun. ACM, 51, 1, 107-113 (2008)
[7] Djurdjevac, N.; Sarich, M.; Schütte, C., Estimating the eigenvalue error of Markov state models, Multiscale Model. Simul., 10, 1, 61-81 (2012) · Zbl 1253.65010
[8] Donoho, D. L.; Grimes, C., Hessian eigenmaps: new locally linear embedding techniques for high dimensional data, Proc. Natl. Acad. Sci. USA, 100, 5591-5596 (May 2003) · Zbl 1130.62337
[9] Guillard, H.; Janka, A.; Vaněk, P., Analysis of an Algebraic Petrov-Galerkin Smoothed Aggregation Multigrid Method, Appl. Numer. Math., 58, 12, 1861-1874 (December 2008) · Zbl 1158.65082
[10] Horn, M., Optimal algorithms for global optimization in case of unknown Lipschitz constant, J. Complexity, 22, 1, 50-70 (2006), Special Issue on Algorithms and Complexity for Continuous Problems · Zbl 1094.65057
[11] Horton, G.; Leutenegger, S. T., A multi-level solution algorithm for steady-state Markov chains, (Proceedings of the ACM SIGMETRICS 1994 Conference on Measurement and Modeling of Computer Systems (1994)), 191-200
[12] Huisinga, W., Metastability of Markovian Systems (2001), Freie Universität: Freie Universität Berlin, PhD thesis
[13] Killian, M., Algebraic multigrid for Markov chains and tensor decomposition (2012), University of Waterloo, PhD thesis
[14] Kruskal, J. B., Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis, Psychometrika, 29, 1-27 (1964) · Zbl 0123.36803
[15] Lafon, S., Diffusion maps and geometric harmonics (May 2004), Yale University, PhD thesis
[16] Lin, J., Mapreduce is good enough? If all you have is a hammer, throw away everything that’s not a nail! (2012)
[17] Nadler, B.; Lafon, S.; Coifman, R. R.; Kevrekidis, I. G., Diffusion maps, spectral clustering and eigenfunctions of Fokker-Planck operators, (Weiss, Y.; Schölkopf, B.; Platt, J., Advances in Neural Information Processing Systems, vol. 18 (2006), MIT Press: MIT Press Cambridge, MA), 955-962
[18] Nadler, B.; Lafon, S.; Coifman, R. R.; Kevrekidis, I. G., Diffusion maps, spectral clustering and reaction coordinates of dynamical systems, Appl. Comput. Harmon. Anal., 21, 1, 113-127 (2006) · Zbl 1103.60069
[19] Roweis, S. T.; Saul, L. K., Nonlinear dimensionality reduction by locally linear embedding, Science, 290, 2323-2326 (December 2000)
[20] Sarich, Marco; Noé, Frank; Schütte, Christof, On the approximation quality of Markov state models, Multiscale Model. Simul., 8, 4, 1154-1177 (2010) · Zbl 1210.60082
[21] Tenenbaum, J. B.; de Silva, V.; Langford, J. C., A global geometric framework for nonlinear dimensionality reduction, Science, 290, 5500, 2319-2323 (2000)
[22] Treister, E.; Yavneh, I., On-the-fly adaptive smoothed aggregation multigrid for Markov chains, SIAM J. Sci. Comput., 33, 5, 2927-2949 (2011) · Zbl 1232.65167
[23] Fayyad, U., Big data analytics: applications and opportunities in on-line predictive modeling (2012)
[24] Vaněk, P.; Van Ek, Y.; Janka, A.; Guillard, H., Convergence of algebraic multigrid based on smoothed aggregation ii: Extension to a Petrov-Galerkin method, Computing, 56, 179-196 (1998)
[25] White, T., Hadoop: The Definitive Guide (2009), O’Reilly Media
[26] Yang, G.; Xu, X.; Zhang, J., Manifold alignment via local tangent space alignment, (International Conference on Computer Science and Software Engineering (December 2008))
[27] Zhang, Z.; Zha, H., Principal manifolds and nonlinear dimension reduction via local tangent space alignment (2002), Department of Computer Science and Engineering, Pennsylvania State University, Technical report CSE-02-019
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.