
Terminal embeddings in sublinear time. (English) Zbl 07875510

Summary: Elkin, Filtser, and Neiman (2017) recently introduced the concept of a terminal embedding from one metric space \((X,d_X)\) to another \((Y,d_Y)\) with a set of designated terminals \(T\subset X\). Such an embedding \(f\) is said to have distortion \(\rho\geqslant 1\) if \(\rho\) is the smallest value such that there exists a constant \(C>0\) satisfying \[ \forall x\in T\ \forall q\in X,\quad Cd_X(x,q)\leqslant d_Y(f(x),f(q))\leqslant C\rho d_X(x, q). \] When \(X,Y\) are both Euclidean metrics with \(Y\) being \(m\)-dimensional, Narayanan and Nelson (2019), following work of Mahabadi, Makarychev, Makarychev, and Razenshteyn (2018), showed that distortion \(1+\varepsilon\) is achievable via such a terminal embedding with \(m=O(\varepsilon^{-2}\log n)\) for \(n:=|T|\). This generalizes the Johnson-Lindenstrauss lemma, which only preserves distances within \(T\) and not distances from the rest of the space to \(T\). The downside of prior work is that evaluating the embedding on some \(q\in\mathbb{R}^d\) required solving a semidefinite program with \(\Theta(n)\) constraints in \(m\) variables, and thus required some superlinear \(\mathrm{poly}(n)\) runtime. Our main contribution in this work is a new data structure for computing terminal embeddings. We show how to pre-process \(T\) to obtain an almost linear-space data structure that supports computing the terminal embedding image of any \(q\in\mathbb{R}^d\) in sublinear time \(O^* (n^{1-\Theta(\varepsilon^2)}+d)\). To accomplish this, we leverage tools developed in the context of approximate nearest neighbor search.
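As a concrete illustration of the distortion definition (not of the paper's construction, whose query algorithm solves a convex program per point), the following minimal Python/NumPy sketch measures the empirical distortion of a plain scaled Gaussian projection with \(m\approx\varepsilon^{-2}\log n\) over terminal-query pairs, for a fixed sample of queries; all variable names are illustrative. Note that such a linear map only preserves distances to a fixed query sample with high probability, whereas the point of a terminal embedding is that the (necessarily nonlinear) guarantee holds for every \(q\in\mathbb{R}^d\) simultaneously.

```python
import numpy as np

def terminal_distortion(T, Q, fT, fQ):
    """Empirical distortion of a map f over terminal-query pairs.

    Per the definition above, with C chosen as the smallest observed
    ratio d_Y(f(x), f(q)) / d_X(x, q), the distortion rho is the spread
    max(ratio) / min(ratio) over all x in T, q in Q.
    """
    ratios = []
    for x, fx in zip(T, fT):
        for q, fq in zip(Q, fQ):
            d = np.linalg.norm(x - q)
            if d > 0:
                ratios.append(np.linalg.norm(fx - fq) / d)
    ratios = np.array(ratios)
    return ratios.max() / ratios.min()

rng = np.random.default_rng(0)
n, d, eps = 200, 1000, 0.5
m = int(np.ceil(np.log(n) / eps**2))          # target dimension m = O(eps^-2 log n)
T = rng.standard_normal((n, d))               # terminal set T with |T| = n
Q = rng.standard_normal((50, d))              # a fixed sample of query points
G = rng.standard_normal((m, d)) / np.sqrt(m)  # scaled Gaussian matrix (a JL-style map)
print("empirical rho:", terminal_distortion(T, Q, T @ G.T, Q @ G.T))
```

The printed \(\rho\) should be close to \(1+\varepsilon\) for this sampled query set; adversarially chosen queries would defeat any fixed linear map, which is why the terminal embedding of the paper is nonlinear in \(q\).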

MSC:

68-XX Computer science

References:

[1] Alexandr Andoni and Piotr Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM, 51(1):117-122, 2008. doi:10.1145/1327452.1327494
[2] Alexandr Andoni, Thijs Laarhoven, Ilya P. Razenshteyn, and Erik Waingarten. Optimal hashing-based time-space trade-offs for approximate near neighbors. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, January 16-19, 2017, pages 47-66. SIAM, 2017. Zbl 1410.68091 · doi:10.1137/1.9781611974782.4
[3] Dimitri P. Bertsekas. Nonlinear programming. Athena Scientific Optimization and Computation Series. Athena Scientific, Belmont, MA, third edition, 2016, xviii+861 pages. Zbl 1360.90236
[4] Yeshwanth Cherapanamjeri and Jelani Nelson. On adaptive distance estimation. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.
[5] Michael Elkin, Arnold Filtser, and Ofer Neiman. Terminal embeddings. Theor. Comput. Sci., 697:1-36, 2017. Zbl 1378.68128 · doi:10.1016/j.tcs.2017.06.021
[6] Sariel Har-Peled. A replacement for Voronoi diagrams of near linear size. In 42nd Annual Symposium on Foundations of Computer Science, FOCS 2001, Las Vegas, Nevada, USA, October 14-17, 2001, pages 94-103. IEEE Computer Society, 2001. doi:10.1109/SFCS.2001.959884
[7] Sariel Har-Peled, Piotr Indyk, and Rajeev Motwani. Approximate nearest neighbor: towards removing the curse of dimensionality. Theory of Computing, 8(1):321-350, 2012. doi:10.4086/toc.2012.v008a014
[8] William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. In Conference in Modern Analysis and Probability (New Haven, Conn., 1982), volume 26 of Contemp. Math., pages 189-206. Amer. Math. Soc., Providence, RI, 1984. Zbl 0539.46017 · doi:10.1090/conm/026/737400
[9] Sepideh Mahabadi, Konstantin Makarychev, Yury Makarychev, and Ilya P. Razenshteyn. Nonlinear dimension reduction via outer bi-Lipschitz extensions. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, Los Angeles, CA, USA, June 25-29, 2018, pages 1088-1101. ACM, 2018. Zbl 1428.68326 · doi:10.1145/3188745.3188828
[10] Shyam Narayanan and Jelani Nelson. Optimal terminal dimensionality reduction in Euclidean space. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, Phoenix, AZ, USA, June 23-26, 2019, pages 1064-1069. ACM, 2019. Zbl 1433.68371 · doi:10.1145/3313276.3316307
[11] A. S. Nemirovsky and D. B. Yudin. Problem complexity and method efficiency in optimization. Wiley-Interscience Series in Discrete Mathematics. John Wiley & Sons, New York, 1983. Zbl 0501.90061
[12] Roman Vershynin. High-dimensional probability: an introduction with applications in data science, volume 47 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2018, xiv+284 pages. With a foreword by Sara van de Geer. Zbl 1430.60005 · doi:10.1017/9781108231596