Document Zbl 1493.62649

Luo, Hengrui; Patania, Alice; Kim, Jisu; Vejdemo-Johansson, Mikael

Generalized penalty for circular coordinate representation. (English) Zbl 1493.62649

Found. Data Sci. 3, No. 4, 729-767 (2021).

Summary: Topological Data Analysis (TDA) provides novel approaches that allow us to analyze the geometrical shapes and topological structures of a dataset. As one important application, TDA can be used for data visualization and dimension reduction. We follow the framework of circular coordinate representation, which allows us to perform dimension reduction and visualization for high-dimensional datasets on a torus using persistent cohomology. In this paper, we propose a method to adapt the circular coordinate framework to take into account the roughness of circular coordinates in change-point and high-dimensional applications. To do that, we use a generalized penalty function instead of an \(L_2\) penalty in the traditional circular coordinate algorithm. We provide simulation experiments and real data analyses to support our claim that circular coordinates with generalized penalty will detect the change in high-dimensional datasets under different sampling schemes while preserving the topological structures.

Cited in 2 Documents

MSC:

62R40	Topological data analysis
55N31	Persistent homology and applications, topological data analysis
68T09	Computational aspects of data analysis and big data

Keywords:

topological data analysis; persistent cohomology; high-dimensional data; nonlinear dimension reduction

Software:

Adam; Ripser; javaPlex; t-SNE; Rtsne; Dionysus; coRanking; dimRed; UMAP; GitHub

Cite Review PDF

Full Text: DOI arXiv

References:

[1]	H. Adams, A. Tausz and M. Vejdemo-Johansson, JavaPlex: A research software package for persistent (co) homology, in International Congress on Mathematical Software, Lecture Notes in Computer Science, 8592, Springer, Berlin, Heidelberg, (2014), 129-136. · Zbl 1402.65186
[2]	C. C. Aggarwal, A. Hinneburg and D. A. Keim, On the surprising behavior of distance metrics in high dimensional space, in International Conference on Database Theory, Lecture Notes in Computer Science, 1973, Springer, Berlin, Heidelberg, (2001), 420-434. · Zbl 1047.68038
[3]	J. Alman and V. Vassilevska Williams, A refined laser method and faster matrix multiplication, in Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), SIAM, Philadelphia, PA, (2021), 522-539. · Zbl 07788370
[4]	M. Basseville and I. V. Nikiforov, Detection of Abrupt Changes: Theory and Application, Prentice Hall Information and System Sciences Series, Prentice Hall, Inc., Englewood Cliffs, NJ, 1993. · Zbl 1407.62012
[5]	U. Bauer, Ripser: Efficient computation of Vietoris-Rips persistence barcodes, J. Appl. Comput. Topol., 5, 391-423 (2021) · Zbl 1476.55012 · doi:10.1007/s41468-021-00071-5
[6]	M. Belkin; P. Niyogi, Laplacian Eigenmaps for dimensionality reduction and data representation, Neural Computation, 15, 1373-1396 (2003) · Zbl 1085.68119 · doi:10.1162/089976603321780317
[7]	E. Berberich and M. Kerber, Exact arrangements on tori and Dupin cyclides, in Proceedings of the 2008 ACM Symposium on Solid and Physical Modeling, ACM, (2008), 59-66.
[8]	T. Caliński; J. Harabasz, A dendrite method for cluster analysis, Comm. Statist., 3, 1-27 (1974) · Zbl 0273.62010 · doi:10.1080/03610927408827101
[9]	E. J. Candès, Mathematics of sparsity (and a few other things), in Proceedings of the International Congress of Mathematicians—Seoul 2014. Vol. 1, Kyung Moon Sa, Seoul, 2014,235-258. · Zbl 1376.65093
[10]	G. Carlsson, Topology and data, Bull. Amer. Math. Soc. (N.S.), 46, 255-308 (2009) · Zbl 1172.62002 · doi:10.1090/S0273-0979-09-01249-X
[11]	D. Coppersmith; S. Winograd, Matrix multiplication via arithmetic progressions, J. Symbolic Comput., 9, 251-280 (1990) · Zbl 0702.65046 · doi:10.1016/S0747-7171(08)80013-2
[12]	D. L. Davies; D. W. Bouldin, A cluster separation measure, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1, 224-227 (1979) · doi:10.1109/TPAMI.1979.4766909
[13]	R. C. de Amorim; B. Mirkin, Minkowski metric, feature weighting and anomalous cluster initializing in K-Means clustering, Pattern Recognition, 45, 1061-1075 (2012) · doi:10.1016/j.patcog.2011.08.012
[14]	V. de Silva; D. Morozov; M. Vejdemo-Johansson, Persistent cohomology and circular coordinates, Discrete Comput. Geom., 45, 737-759 (2011) · Zbl 1216.68322 · doi:10.1007/s00454-011-9344-x
[15]	David L. Donoho; Ca rrie Grimes, Hessian eigenmaps: Locally linear em-bedding techniques for high-dimensional data, Proceedings of the National Academy of Sciences, 100, 5591-5596 (2003) · Zbl 1130.62337 · doi:10.1073/pnas.1031596100
[16]	M. Elad, Sparse and Redundant Representations. From Theory to Applications in Signal and Image Processing, Springer, New York, 2010. · Zbl 1211.94001
[17]	B. T. Fasy; F. Lecci; A. Rinaldo; L. Wasserman; S. Balakrishnan; A. Singh, Confidence sets for persistence diagrams, Ann. Statist., 42, 2301-2339 (2014) · Zbl 1310.62059 · doi:10.1214/14-AOS1252
[18]	A. Gupta and R. Bowden, Evaluating dimensionality reduction techniques for visual category recognition using rényi entropy, 19th European Signal Processing Conference, IEEE, Barcelona, Spain, 2011.
[19]	S. Holmes, Personal communication, 2020.
[20]	D. P. Kingma and J. L. Ba, Adam: A method for stochastic optimization, preprint, arXiv: 1412.6980.
[21]	G. Kraemer; M. Reichstein; M. D. Mahecha, dimRed and coRanking-Unifying dimensionality reduction in R, The R Journal, 10, 342-358 (2018) · doi:10.32614/RJ-2018-039
[22]	J. H. Krijthe, Rtsne: T-Distributed Stochastic Neighbor Embedding Using Barnes-Hut Implementation, 2015., Available from: https://github.com/jkrijthe/Rtsne.
[23]	J. A. Lee; M. Verleysen, Quality assessment of dimensionality reduction: Rank-based criteria, Neurocomputing, 72, 1431-1443 (2009) · doi:10.1016/j.neucom.2008.12.017
[24]	W. Lueks, B. Mokbel, M. Biehl and B. Hammer, How to evaluate dimensionality reduction?-Improving the co-ranking matrix, preprint, arXiv: 1110.3917.
[25]	H. Luo and D. Li, Spherical rotation dimension reduction with geometric loss functions, work in progress.
[26]	H. Luo, S. N. MacEachern and M. Peruggia, Asymptotics of lower dimensional zero-density regions, preprint, arXiv: arXiv: 2006.02568.
[27]	L. McInnes, J. Healy and J. Melville, UMAP: Uniform manifold approximation and projection for dimension reduction, preprint, arXiv: 1802.03426.
[28]	B. Michel, A Statistical Approach to Topological Data Analysis, Ph.D thesis, UPMC Université Paris Ⅵ, 2015.
[29]	N. Milosavljević, D. Morozov and P. Škraba, Zigzag persistent homology in matrix multiplication time, Computational Geometry (SCG’11), ACM, New York, (2011), 216-225. · Zbl 1283.68373
[30]	D. Morozov, Dionysus2: A library for computing persistent homology., Available from: https://github.com/mrzv/dionysus.
[31]	P. Niyogi; S. Smale; S. Weinberger, Finding the homology of submanifolds with high confidence from random samples, Discrete Comput. Geom., 39, 419-441 (2008) · Zbl 1148.68048 · doi:10.1007/s00454-008-9053-2
[32]	N. Otter; M. A. Porter; U. Tillmann; P. Grindrod; H. A. Harrington, A roadmap for the computation of persistent homology, EPJ Data Science, 6, 1-38 (2017) · doi:10.1140/epjds/s13688-017-0109-5
[33]	E. S. Page, Continuous inspection schemes, Biometrika, 41, 100-115 (1954) · Zbl 0056.38002 · doi:10.1093/biomet/41.1-2.100
[34]	L. Polanco and J. A. Perea, Coordinatizing data with lens spaces and persistent cohomology, preprint, arXiv: 1905.00350.
[35]	M. Robinson, Multipath-dominant, pulsed doppler analysis of rotating blades, preprint, arXiv: 1204.4366.
[36]	M. Robinson, Personal communication, 2019.
[37]	E. Ronchetti, The main contributions of robust statistics to statistical science and a new challenge, METRON, 79, 127-135 (2021) · Zbl 1483.62067 · doi:10.1007/s40300-020-00185-3
[38]	A. Tausz and G. Carlsson, Applications of zigzag persistence to topological data analysis, preprint, arXiv: 1108.3545.
[39]	L. van der Maaten and G. Hinton, Visualizing data using t-SNE, J. Mach. Learning Res., 9(2008), 2579-2605. Available from: http://jmlr.org/papers/v9/vandermaaten08a.html. · Zbl 1225.68219
[40]	M. Vejdemo-Johansson, G. Carlsson, P. Y. Lum, A. Lehman, G. Singh and T. Ishkhanov, The topology of politics: Voting connectivity in the US House of Representatives, in NIPS 2012 Workshop on Algebraic Topology and Machine Learning, 2012.
[41]	L. Vendramin; R. J. G. B. Campello; E. R. Hruschka, Relative clustering validity criteria: A comparative overview, Stat. Anal. Data Min., 3, 209-235 (2010) · Zbl 07260245 · doi:10.1002/sam.10080
[42]	R. Vidal; Y. Ma; S. S. Sastry, Generalized principal component analysis (GPCA), IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 1945-1959 (2005) · doi:10.1109/TPAMI.2005.244
[43]	B. Wang; B. Summa; V. Pascucci; M. Vejdemo-Johansson, Branching and circular features in high dimensional data, IEEE Transactions on Visualization and Computer Graphics, 17, 1902-1911 (2011) · doi:10.1109/TVCG.2011.177
[44]	E. Zemel, An \(O(n)\) algorithm for the linear multiple choice knapsack problem and related problems, Inform. Process. Lett., 18, 123-128 (1984) · Zbl 0555.90069 · doi:10.1016/0020-0190(84)90014-0
[45]	X. Zhu, Persistent homology: An introduction and a new text representation for natural language processing, in Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2013, 1953-1959. Available from: https://www.ijcai.org/Proceedings/13/Papers/288.pdf.

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.