
Stability and Minimax Optimality of Tangential Delaunay Complexes for Manifold Reconstruction

Published in: Discrete & Computational Geometry

Abstract

We consider the problem of optimality in manifold reconstruction. A random sample \(\mathbb {X}_n = \{X_1,\ldots ,X_n\}\subset \mathbb {R}^D\) composed of points close to a d-dimensional submanifold M, with or without outliers drawn in the ambient space, is observed. Based on the tangential Delaunay complex (Boissonnat and Ghosh, Discrete Comput. Geom. 51(1):221–267, 2014), we construct an estimator \(\widehat{M}\) that is ambient isotopic and Hausdorff-close to M with high probability. The estimator \(\widehat{M}\) is built from existing algorithms. In a model with additive noise of small amplitude, we show that this estimator is asymptotically minimax optimal for the Hausdorff distance over a class of submanifolds satisfying a reach constraint. Therefore, even with no a priori information on the tangent spaces of M, our estimator based on tangential Delaunay complexes is optimal. This shows that the optimal rate of convergence can be achieved through existing algorithms. A similar result is derived in a model with outliers. A geometric interpolation result is established, showing that the tangential Delaunay complex is stable with respect to noise and perturbations of the tangent spaces. In the process, a decluttering procedure and a tangent space estimator, both based on local principal component analysis (PCA), are studied.


References

  1. Alexander, S.B., Bishop, R.L.: Gauss equation and injectivity radii for subspaces in spaces of curvature bounded above. Geom. Dedicata 117, 65–84 (2006)

  2. Arias-Castro, E., Lerman, G., Zhang, T.: Spectral clustering based on local PCA. J. Mach. Learn. Res. 18(9), 1–57 (2017)

  3. Arias-Castro, E., Verzelen, N.: Community detection in dense random networks. Ann. Stat. 42(3), 940–969 (2014)

  4. Boissonnat, J.-D., Ghosh, A.: Manifold reconstruction using tangential Delaunay complexes. Discrete Comput. Geom. 51(1), 221–267 (2014)

  5. Boissonnat, J.-D., Guibas, L.J., Oudot, S.Y.: Manifold reconstruction in arbitrary dimensions using witness complexes. Discrete Comput. Geom. 42(1), 37–70 (2009)

  6. Boucheron, S., Bousquet, O., Lugosi, G.: Theory of classification: a survey of some recent advances. ESAIM Probab. Stat. 9, 323–375 (2005)

  7. Boucheron, S., Lugosi, G., Massart, P.: Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, Oxford (2013)

  8. Bousquet, O.: A Bennett concentration inequality and its application to suprema of empirical processes. C. R. Math. Acad. Sci. Paris 334(6), 495–500 (2002)

  9. Buchet, M., Dey, T.K., Wang, J., Wang, Y.: Declutter and resample: towards parameter-free denoising. In: Aronov, B., Katz, M.J. (eds.) 33rd International Symposium on Computational Geometry (SoCG’17). LIPIcs. Leibniz Int. Proc. Inform., vol. 77, Art. No. 23. Schloss Dagstuhl Leibniz-Zentrum für Informatik, Wadern (2017)

  10. do Carmo, M.P.: Riemannian Geometry. Mathematics: Theory & Applications. Birkhäuser, Boston, MA (1992)

  11. Chazal, F., Cohen-Steiner, D., Lieutier, A.: A sampling theory for compact sets in Euclidean space. In: Proceedings of the 22nd Annual Symposium on Computational Geometry (SCG’06), pp. 319–326. ACM, New York (2006)

  12. Chazal, F., Glisse, M., Labruère, C., Michel, B.: Convergence rates for persistence diagram estimation in topological data analysis. J. Mach. Learn. Res. 16, 3603–3635 (2015)

  13. Cheng, S.-W., Dey, T.K., Ramos, E.A.: Manifold reconstruction from point samples. In: Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’05), pp. 1018–1027. ACM, New York (2005)

  14. Clarkson, K.L.: Building triangulations using \(\varepsilon \)-nets. In: Proceedings of the 38th Annual ACM Symposium on Theory of Computing (STOC’06), pp. 326–335. ACM, New York (2006)

  15. Cuevas, A., Rodríguez-Casal, A.: On boundary estimation. Adv. Appl. Probab. 36(2), 340–354 (2004)

  16. Davis, C., Kahan, W.M.: The rotation of eigenvectors by a perturbation. III. SIAM J. Numer. Anal. 7, 1–46 (1970)

  17. De Marco, G., Gorni, G., Zampieri, G.: Global inversion of functions: an introduction. NoDEA Nonlinear Differ. Equ. Appl. 1(3), 229–248 (1994)

  18. Dey, T.K.: Curve and Surface Reconstruction: Algorithms with Mathematical Analysis. Cambridge Monographs on Applied and Computational Mathematics, vol. 23. Cambridge University Press, Cambridge (2007)

  19. Donoho, D.L.: De-noising by soft-thresholding. IEEE Trans. Inform. Theory 41(3), 613–627 (1995)

  20. Dümbgen, L., Walther, G.: Rates of convergence for random approximations of convex sets. Adv. Appl. Probab. 28(2), 384–393 (1996)

  21. Dyer, R., Vegter, G., Wintraecken, M.: Riemannian simplices and triangulations. In: Arge, L., Pach, J. (eds.) 31st International Symposium on Computational Geometry (SoCG’15). LIPIcs. Leibniz Int. Proc. Inform., vol. 34, pp. 255–269. Schloss Dagstuhl Leibniz-Zentrum für Informatik, Wadern (2015)

  22. Federer, H.: Curvature measures. Trans. Am. Math. Soc. 93(3), 418–491 (1959)

  23. Federer, H.: Geometric Measure Theory. Die Grundlehren der mathematischen Wissenschaften, vol. 153. Springer, New York (1969)

  24. Genovese, C.R., Perone-Pacifico, M., Verdinelli, I., Wasserman, L.: Manifold estimation and singular deconvolution under Hausdorff loss. Ann. Stat. 40(2), 941–963 (2012)

  25. Genovese, C.R., Perone-Pacifico, M., Verdinelli, I., Wasserman, L.: Minimax manifold estimation. J. Mach. Learn. Res. 13, 1263–1291 (2012)

  26. Kim, A.K.H., Zhou, H.H.: Tight minimax rates for manifold estimation under Hausdorff loss. Electron. J. Stat. 9(1), 1562–1582 (2015)

  27. Le Cam, L.: Convergence of estimates under dimensionality restrictions. Ann. Stat. 1(1), 38–53 (1973)

  28. Maggioni, M., Minsker, S., Strawn, N.: Multiscale dictionary learning: non-asymptotic bounds and robustness. J. Mach. Learn. Res. 17, 1–51 (2016)

  29. Mammen, E., Tsybakov, A.B.: Asymptotical minimax recovery of sets with smooth boundaries. Ann. Stat. 23(2), 502–524 (1995)

  30. Niyogi, P., Smale, S., Weinberger, S.: Finding the homology of submanifolds with high confidence from random samples. Discrete Comput. Geom. 39(1–3), 419–441 (2008)

  31. Sharma, A., Paliwal, K.K.: Fast principal component analysis using fixed-point algorithm. Pattern Recognit. Lett. 28(10), 1151–1155 (2007)

Acknowledgements

We would like to thank Jean-Daniel Boissonnat, Frédéric Chazal, Pascal Massart, and Steve Oudot for their insight and the interest they brought to this work. We are also grateful to the reviewers, whose comments helped substantially improve this paper. This work was supported by ANR project TopData ANR-13-BS01-0008 and by the Advanced Grant of the European Research Council GUDHI (Geometric Understanding in Higher Dimensions). E. Aamari was supported by the Conseil régional d’Île-de-France under a doctoral allowance of its program Réseau de Recherche Doctoral en Mathématiques de l’Île-de-France (RDM-IdF).

Author information

Correspondence to Eddie Aamari.

Additional information

Editor in Charge: Kenneth Clarkson

Appendices


A Interpolation Theorem

This section is devoted to proving the interpolation results of Sect. 4.1. For the sake of completeness, let us state a stability result for the reach with respect to \(\mathscr {C}^2\)-diffeomorphisms.

Lemma 7.1

([22, Thm. 4.19]) Let \(A \subset \mathbb {R}^D\) with \(\mathrm {reach}(A) \ge \rho > 0\), and let \( \Phi :\mathbb {R}^D \rightarrow \mathbb {R}^D\) be a \(\mathscr {C}^1\)-diffeomorphism such that \( \Phi \), \( \Phi ^{-1}\), and \({d}\Phi \) are Lipschitz with respective Lipschitz constants K, N, and R. Then

$$\begin{aligned} \mathrm {reach}(\Phi (A)) \ge \dfrac{1}{(K\rho ^{-1} + R)N^2}. \end{aligned}$$

Writing \(\phi _\ell (\cdot ) = \phi (\cdot /\ell )\), we recall that \({\psi _j(a)} = (R_j-I_D)(a - \pi (p_j)) + (p_j - \pi (p_j)) \) and

$$\begin{aligned} \Phi (a)&= a + \sum _{j=1}^q \phi _\ell (a-\pi (p_j) ) \psi _j(a). \end{aligned}$$
(1)

Let us denote \(b_1 = \sup _x \left\| {d}_x \phi \right\| \), \(b_2 = \sup _x \Vert {{d}^2_x \phi }\Vert _\mathrm {op}\), and write \(C_1 = 1+b_1\), \(C_2 = b_2+2b_1\). A straightforward computation yields \(C_1 \le 7/2\) and \(C_2 \le 28\).

Proof of Lemma 4.2

First notice that the sum appearing in (1) consists of at most one term. Indeed, since \(\phi \equiv 0\) outside \(\mathscr {B}(0,1)\), if \(\phi _\ell (a-\pi (p_j) ) \ne 0\) for some \(j \in \{1,\ldots ,q\}\), then \(\Vert {a - \pi (p_j)}\Vert \le \ell \). Consequently, for all \(i \ne j\),

$$\begin{aligned} \Vert {a - \pi (p_i)}\Vert&\ge \Vert {p_j - p_i}\Vert - \Vert {p_j - \pi (p_j)}\Vert - \Vert {\pi (p_j) - a}\Vert - \left\| \pi (p_i) - p_i\right\| \\&\ge \delta - \eta - \ell - \eta \\&\ge \delta - 2\ell \ge \ell , \end{aligned}$$

where we used that \( 6 \eta \le \ell \le \delta / 3\). Therefore, \(\phi _\ell (a-\pi (p_i) ) = 0\) for all \(i\ne j\). In other words, at most one index j contributes to the sum defining \(\Phi (a)\).

Global diffeomorphism: As the sum in (1) comprises at most one term, the chain rule yields

$$\begin{aligned} \left\| {d}_a \Phi - {{I_D}}\right\| _\mathrm {op}&= \underset{ 1 \le j \le q }{\max } \left\| {d}_a [ \phi _\ell (a-\pi (p_j) ) \psi _j(a) ]\right\| _\mathrm {op} \\&= \underset{ 1 \le j \le q }{\max } \Big \Vert {\psi _j(a)\, \frac{{d}_b \phi }{\ell }\Big |_{b = \frac{a-\pi (p_j)}{\ell }} + \phi _\ell (a-\pi (p_j) )(R_j-I_D)}\Big \Vert _\mathrm {op} \\&\le (b_1+1)\theta + b_1\frac{\eta }{\ell } < 1, \end{aligned}$$

where the last line follows from \(b_1 \le 5/2\), \(6 \eta \le \ell \), and \(\theta \le \pi /64\). Therefore, \({d}_a \Phi \) is invertible for all \(a\in \mathbb {R}^D\), and \(({d}_a \Phi )^{-1} = \sum _{i=0}^\infty (I_D- {d}_a \Phi )^i\). Hence \( \Phi \) is a local diffeomorphism by the local inverse function theorem. Moreover, \(\left\| \Phi (a)\right\| \rightarrow \infty \) as \(\left\| a\right\| \rightarrow \infty \), so that \( \Phi \) is a global \(\mathscr {C}^\infty \)-diffeomorphism by the Hadamard–Caccioppoli theorem [17].
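
As an aside, the Neumann-series inversion used here is easy to check numerically. The sketch below is an illustration only, with a synthetic matrix J standing in for \({d}_a \Phi \): it verifies that the truncated series \(\sum _{i=0}^{k}(I_D - J)^i\) approximates \(J^{-1}\) whenever \(\Vert I_D - J\Vert _\mathrm {op} < 1\).

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4
# Build E = I_D - J with operator norm exactly 0.5 < 1, mimicking the bound
# ||d_a Phi - I_D||_op <= (b1 + 1) * theta + b1 * eta / ell < 1 from the proof.
E = rng.standard_normal((D, D))
E *= 0.5 / np.linalg.norm(E, 2)
J = np.eye(D) - E                    # synthetic stand-in for d_a Phi

# Truncated Neumann series sum_{i=0}^{k} E^i converges to J^{-1}.
S, term = np.eye(D), np.eye(D)
for _ in range(100):
    term = term @ E
    S += term

print(np.linalg.norm(S - np.linalg.inv(J), 2))   # ~ 1e-16
```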

Differential estimates: (i) First order: From the estimates above,

$$\begin{aligned} \left\| {d}_a \Phi \right\| _\mathrm {op} \le \left\| I_D\right\| _\mathrm{op} + \left\| {d}_a \Phi - I_D\right\| _\mathrm{op} \le 1+ (b_1+1)\theta + b_1\frac{\eta }{\ell }. \end{aligned}$$

(ii) Inverse: Write for all \(a \in \mathbb {R}^D\):

$$\begin{aligned} \Vert {{d}_{\Phi (a)} \Phi ^{-1}}\Vert _\mathrm {op}&= \Vert {({d}_a \Phi )^{-1}}\Vert _\mathrm {op} =\Big \Vert { \sum _{i=0}^\infty (I_D- {d}_a \Phi )^i}\Big \Vert _\mathrm {op} \\&\le \frac{1}{1- \left\| {{I_D}} - {d}_a \Phi \right\| _\mathrm {op} } \le \frac{1}{1- (b_1+1)\theta - b_1{\eta }/{\ell } }, \end{aligned}$$

where the first inequality holds since \(\left\| {d}_a \Phi - {{I_D}}\right\| _\mathrm {op} < 1\), and \(\left\| \cdot \right\| _\mathrm {op}\) is sub-multiplicative.

(iii) Second order: Again, since the sum (1) includes at most one term,

$$\begin{aligned} \Vert {{d}^2_a \Phi }\Vert _\mathrm {op}&= \underset{ 1 \le j \le q }{\max } \Vert {{d}^2_a [ \phi _\ell (a-\pi (p_j) ) \psi _j(a) ]}\Vert _\mathrm {op} \\&\le \underset{ 1 \le j \le q }{\max } \biggl \{ \frac{\Vert {{d}^2 \phi }\Vert _\mathrm {op}}{\ell ^2} \,\Vert {\psi _j(a)}\Vert + 2 \,\frac{\Vert {{d}\phi }\Vert _\mathrm {op}}{\ell }\, \Vert {R_j-I_D}\Vert _\mathrm {op} \biggr \} \\&\le b_2 \,\frac{\eta }{\ell ^2} + (b_2 + 2b_1 )\, \frac{\theta }{\ell }. \end{aligned}$$

\(\square \)

Proof of Theorem 4.1

Set \(\ell = \delta /3\) and \(M' = \Phi (M)\).

  • Interpolation: For all j, \(p_j = \Phi (\pi (p_j)) \in M'\) by construction since \(\phi _\ell (0)=1\).

  • Tangent spaces: Since \({d}_x \phi _\ell |_{x=0} = 0\), for all \(j \in \{1,\ldots ,q \}\), \({d}_a \Phi |_{a=\pi (p_j)} = R_j\). Thus,

    $$\begin{aligned} T_{p_j} {M}'&= T_{ \Phi (\pi (p_j))} \Phi (M) = {d}_a \Phi |_{a=\pi (p_j)} (T_{\pi (p_j)} M )= R_j (T_{\pi (p_j)} M ) = T_j, \end{aligned}$$

    by definition of \(R_j\).

  • Proximity to M: The bound on \(d_H(M,M') = d_H\bigl (M, \Phi (M) \bigr )\) follows from the pointwise bound, valid for all \(a \in \mathbb {R}^D\),

    $$\begin{aligned} \left\| \Phi (a) - a\right\| \le \max _{1\le j\le q} \phi _\ell (a-\pi (p_j) ) \left\| \psi _j(a)\right\| \le \ell \theta + \eta \le \delta \theta + \eta . \end{aligned}$$
  • Isotopy: Consider the continuous family of maps

    $$\begin{aligned} \Phi _{(t)}(a) = a + t \sum _{j = 1}^q \phi _\ell (a-\pi (p_j) ) {\psi _j(a)} , \end{aligned}$$

    for \(0 \le t \le 1\). Since \( \Phi _{(t)} - {{I_D}} = t ( \Phi - {{I_D}})\), the arguments above show that \( \Phi _{(t)}\) is a global diffeomorphism of \(\mathbb {R}^D\) for all \(t \in [0,1]\). Moreover, \( \Phi _{(0)} = {{I_D}}\) and \( \Phi _{(1)} = \Phi \). Thus, \(M = \Phi _{(0)}(M)\) and \(M' = \Phi _{(1)}(M)\) are ambient isotopic.

  • Reach lower bound: The differential estimates of orders one and two for \( \Phi \) translate into estimates on the Lipschitz constants of \( \Phi \), \(\Phi ^{-1}\), and \({d}\Phi \). Applying Lemma 7.1 leads to

    $$\begin{aligned} \mathrm {reach}(M')&\ge \dfrac{( 1-C_1 ({\eta }/{\ell } + \theta ) )^2}{({1+ C_1 ({\eta }/{\ell } + \theta )})/{\rho }+ C_2 ({\eta }/{\ell ^2} +{\theta }/{\ell })}\\&= \rho \cdot \dfrac{( 1-C_1 ({\eta }/{\ell } + \theta ) )^2}{1+ C_1 ({\eta }/{\ell } + \theta )+ C_2 ({\eta }/{\ell ^2} +{\theta }/{\ell }) \rho }. \end{aligned}$$

    Now, replace \(\ell \) by its value \(\delta /3\), and write \(c_1 = 3 C_1 \le 21/2 \le 11\) and \(c_2 = 3^2 C_2 \le 252\). We derive

    $$\begin{aligned} \mathrm {reach}(M')&\ge \biggl (1-2c_1 \biggl (\frac{\eta }{\delta } + \theta \biggr ) \biggr ) \biggl ( 1 - c_1 \biggl (\frac{\eta }{\delta } + \theta \biggr ) - c_2 \biggl (\frac{\eta }{\delta ^2} + \frac{\theta }{\delta } \biggr ) \rho \biggr ) \rho \\&\ge \biggl ( 1 - 3 c_1 \biggl (\frac{\eta }{\delta } + \theta \biggr ) - c_2 \biggl (\frac{\eta }{\delta ^2} + \frac{\theta }{\delta } \biggr ) \rho \biggr ) \rho \\&\ge \biggl ( 1 - (3 c_1 +c_2) \biggl (\frac{\eta }{\delta ^2} + \frac{\theta }{\delta } \biggr ) \rho \biggr ) \rho , \end{aligned}$$

    where for the last line we used that \(\delta / \rho \le 1\). The desired lower bound follows taking \(c_0 = 3c_1+ c_2 \le 285\).

\(\square \)

B Some Geometric Properties under Reach Regularity Condition

B.1 Reach and Projection on the Submanifold

In this section we state intermediate results that connect the reach condition to orthogonal projections onto the tangent spaces. They are based on the following fundamental result.

Proposition 8.1

([22, Thm. 4.18]) For all x and y in M,

$$\begin{aligned} \Vert (y-x)_{\perp }\Vert \le \frac{\Vert y-x\Vert ^2}{2 \rho }, \end{aligned}$$

where \((y-x)_{\perp }\) denotes the projection of \(y-x\) onto \(T_x M^{\perp }\).

From Proposition 8.1 we may deduce the following property on the intersection of Euclidean balls with M.

Proposition 8.2

Let \(x \in \mathbb {R}^D\) be such that \({d}(x,M) = \Delta \le h \le {\rho }/{8}\), and let y denote \(\pi (x)\). Then,

$$\begin{aligned} \mathscr {B} (y, r_h^- )\cap M \subset \mathscr {B}(x,h) \cap M \subset \mathscr {B} ( y, r_h^+ ) \cap M, \end{aligned}$$

where \(r_h^2 + \Delta ^2 = h^2\), \((r_h^-)^2 = \bigl (1-\frac{ \Delta }{\rho }\bigr )r_h^2\), and \((r_h^+)^2 = \bigl (1+\frac{ 2 \Delta }{\rho }\bigr )r_h^2\).

Proof

Let z be in \(M \cap \mathscr {B}(x,h)\), and denote by \(\delta \) the quantity \(\Vert z-y\Vert \). We may write

$$\begin{aligned} \Vert z-x\Vert ^2 = \delta ^2 + \Delta ^2 +2 \langle z-y,y-x \rangle , \end{aligned}$$
(2)

hence \(\delta ^2 \le h^2 - \Delta ^2 - 2 \langle z-y,y-x \rangle \). Denote, for u in \(\mathbb {R}^D\), by \(u_\perp \) its projection onto \(T_y M^{\perp }\). Since \(\langle z-y,y-x \rangle = \langle (z-y)_{\perp },y-x \rangle \), Proposition 8.1 ensures that

$$\begin{aligned} \delta ^2\biggl ( 1 - \frac{\Delta }{\rho } \biggr ) \le r_h^2. \end{aligned}$$

Since \(\Delta \le h \le \rho /8\), it follows that \(\delta ^2 \le (1+2 {\Delta }/{\rho })r_h^2\). On the other hand, (2) and Proposition 8.1 also yield

$$\begin{aligned} \Vert z-x\Vert ^2 \le \delta ^2 \biggl ( 1 + \frac{\Delta }{\rho } \biggr ) + \Delta ^2. \end{aligned}$$

Hence, if \(\delta ^2 \le \big (1- \frac{\Delta }{\rho } \big )r_h^2\), we have

$$\begin{aligned} \Vert z-x\Vert ^2 \le r_h^2 + \Delta ^2 = h^2. \end{aligned}$$

\(\square \)

Also, the following consequence of Proposition 8.1 will be of particular use in the decluttering procedure.

Proposition 8.3

Let h and \(h_k\) be bandwidths satisfying \(h_k^2/\rho \le h \le h_k\). Let x be such that \(d(x,M) \le h/\sqrt{2}\) and \(\pi _{M} (x) = 0\), and let z be such that \(\Vert z-x\Vert \le h\) and \(d(z,M) \le h_k^2/\rho \). Then

$$\begin{aligned} \Vert z_{\perp } \Vert \le \frac{6 h_k^2}{\rho }, \end{aligned}$$

where \(z_{\perp }\) denotes the projection of z onto \(T_0 M^{\perp }\).

Proof

Let y denote \(\pi _M (z)\). The triangle inequality yields \(\Vert y\Vert \le \Vert y-z \Vert + \Vert z-x\Vert + \Vert x\Vert \le h_k^{2}/\rho + (1 + 1/\sqrt{2}) h \le 3 h_k\). Proposition 8.1 ensures that \(\Vert y_{\perp } \Vert \le \Vert y \Vert ^2/(2\rho ) \le (9 h_k^2)/(2\rho )\). Since \(\Vert z_{\perp } \Vert \le \Vert y_{\perp }\Vert + h_k^2/\rho \), we have \(\Vert z_{\perp } \Vert \le 6 h_k^2/\rho \) . \(\square \)

Finally, let us prove Lemma 5.4, which gives properties of intersections of ambient slabs with M.

Proof of Lemma 5.4

Set \(k_1=\frac{1}{16(K \vee 1)}\), \(k_2 = \frac{1}{16(K \vee \rho \vee 1)}\), and \(k_3 = k_1 \wedge \frac{\rho k_2}{1+2K} \wedge 1\). For all \(h> 0\), and \(z \in S(x,T,h)\), the triangle inequality yields \(\Vert z-x\Vert \le \Vert \pi _T(z-x)\Vert + \Vert \pi _{T^{\perp }}(z-x)\Vert \le (k_1 + k_2h)h\). Since \(h \le 1\) and \(k_1 + k_2 \le 1/2\), we get \(z \in \mathscr {B}(x,h/2)\).

Now, suppose that \( h/\sqrt{2} \ge {d}(x,M) \ge h^2 / \rho \) and \(\angle (T_{\pi (x)} M, T ) \le K h / \rho \). For short we write \(T_0 = T_{\pi (x)} M\). Let \(z \in S(x,T,h)\). Since \(h \le 1\), it follows that

$$\begin{aligned} \Vert \pi _{T_0} (z-x) \Vert \le \Vert z-x \Vert \le (k_1 + k_2)h = k'_1 h, \end{aligned}$$

with \(k'_1 = k_1 + k_2\). On the other hand

$$\begin{aligned} \Vert \pi _{T_0^{\perp }} (z-x) \Vert&\le \Vert \pi _{T_0^{\perp }} \pi _{T} (z-x) \Vert + \Vert \pi _{T_0^{\perp }} \pi _{T^{\perp }} (z-x) \Vert \\&\le (Kh/\rho )(k_1 h) + k_2 h^2 = k'_2 h^2, \end{aligned}$$

with \(k'_2 = k_1 K/\rho + k_2\). Hence \(S(x,T,h) \subset S'(x,T_0,h)\), for the constants \(k'_1\) and \(k'_2\) defined above. It remains to prove that \(S'(x,T_0,h) \cap M = \emptyset \). To see this, let \(z \in S'(x,T_0,h)\), and \(y = \pi (x)\). Since \(k'_1 + k'_2 \le 1/4\), we have \(\Vert y-z \Vert \le \Vert y-x \Vert + \Vert x-z \Vert \le h(1/\sqrt{2} + 1/4)\). For the normal part, we may write

$$\begin{aligned} \Vert \pi _{T_0^{\perp }}(z-y) \Vert \ge \Vert \pi _{T_0^{\perp }}(y-x)\Vert - \Vert \pi _{T_0^{\perp }}(x-z) \Vert \ge h^2(1/\rho - k'_2). \end{aligned}$$

Since \(k'_2 \le 1/(8\rho )\), we have \(\Vert \pi _{T_0^{\perp }}(z-y) \Vert > \Vert y-z\Vert ^2 /(2\rho )\), hence Proposition 8.1 ensures that \(z \notin M\).

Finally, suppose that \(x \in M\) and \(y \in \mathscr {B}(x,k_3h)\cap M\). Since \(k_3 \le k_1\), we have \(\Vert \pi _{T}(y-x)\Vert \le k_1h\). Next, we may write

$$\begin{aligned} \Vert \pi _{T^{\perp }}(y-x) \Vert \le \Vert \pi _{T^{\perp }} \pi _{T_0}(y-x) \Vert + \Vert \pi _{T^{\perp }}\pi _{T_0^{\perp }}(y-x) \Vert . \end{aligned}$$

Since \(y \in M\), Proposition 8.1 entails \(\Vert \pi _{T_0^{\perp }}(y-x) \Vert \le \Vert y-x\Vert ^2/(2\rho ) \le k_3^2 h^2/(2\rho )\). It follows that

$$\begin{aligned} \Vert \pi _{T^{\perp }}(y-x) \Vert \le \frac{h^2}{\rho } \,\biggl ( k_3 K + \frac{k_3^2}{2} \biggr ) \le k_2 h^2. \end{aligned}$$

Hence \(y \in S(x,T,h)\). \(\square \)

B.2 Reach and Exponential Map

In this section we state results that connect Euclidean and geodesic quantities under the reach regularity condition. We start with a result linking the reach to principal curvatures.

Proposition 8.4

([30, Prop. 6.1]) For all \(x\in M\), writing \(\textit{II}_x\) for the second fundamental form of M at x, we have \(\left\| \textit{II}_x(w,w)\right\| \le 1/\rho \) for all unit vectors \(w\in T_x {M}\).

For all \(x \in {M}\) and \(v\in T_x {M}\), let us denote by \(\exp _x(v)\) the exponential map at x of direction v. According to the following proposition, this exponential map turns out to be a diffeomorphism on balls of radius at most \(\pi \rho \).

Proposition 8.5

([1, Cor. 1.4]) The injectivity radius of M is at least \(\pi \rho \).

Denoting by \({d}_M(\cdot ,\cdot )\) the geodesic distance on M, we are in position to connect geodesic and Euclidean distance. In what follows, we fix the constant \(\alpha = 1 + \frac{1}{4 \sqrt{2}}\).

Proposition 8.6

For all \(x,y \in {M}\) such that \(\left\| x-y\right\| \le \rho /4\),

$$\begin{aligned} \left\| x-y\right\| \le {d}_{M}(x,y) \le \alpha \left\| x-y\right\| . \end{aligned}$$

Moreover, writing \(y = \exp _x(r v)\) for \(v\in T_x {M}\) with \(\left\| v\right\| =1\) and \(r \le \rho /4\),

$$\begin{aligned} y = x +rv + R(r,v) \end{aligned}$$

with \(\left\| R(r,v)\right\| \le {r^2}/({2\rho })\).

Proof

The first statement is a direct consequence of Proposition 6.3 in [30]. Let us define \(u(t) =\exp _x(tv) - \exp _x(0) -tv\) and \(w(t) = \exp _x(tv)\) for all \(0\le t \le r\). It is clear that \(u(0) = 0\) and \(u'(0) = 0\). Moreover, \(\Vert {u''(t)}\Vert = \Vert {\textit{II}_{w(t)}(w'(t),w'(t))}\Vert \le 1/\rho \). Therefore, a Taylor expansion at order two gives \(\left\| R(r,v)\right\| = \left\| u(r)\right\| \le r^2/(2\rho )\). Applying the first statement of the proposition gives \(r\le \alpha \left\| x-y\right\| \). \(\square \)
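
As a sanity check of the constant \(\alpha = 1 + \frac{1}{4\sqrt{2}} \approx 1.18\), consider the circle of radius \(\rho \) in \(\mathbb {R}^2\), whose reach equals \(\rho \): for \(\left\| x-y\right\| \le \rho /4\),

$$\begin{aligned} \frac{{d}_M(x,y)}{\left\| x-y\right\| } = \frac{\arcsin t}{t}\,\bigg |_{t = {\left\| x-y\right\| }/{(2\rho )}} \le \frac{\arcsin (1/8)}{1/8} \approx 1.003 \le \alpha , \end{aligned}$$

since \({d}_M(x,y) = 2\rho \arcsin ({\left\| x-y\right\| }/{(2\rho )})\) and \(t \mapsto \arcsin (t)/t\) is nondecreasing on (0, 1].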

The next proposition gives bounds on the volume form expressed in polar coordinates in a neighborhood of points of M.

Proposition 8.7

Let \(x \in {M}\) be fixed. Denote by J(r, v) the Jacobian of the volume form expressed in polar coordinates around x, for \(r \le {\rho }/{4}\) and v a unit vector in \(T_x {M}\). In other words, if \(y = \exp _x(rv)\), then \({d}_y V = J(r,v)\, dr\, dv\). Then

$$\begin{aligned} c_d r^{d-1} \le J(r,v) \le C_d r^{d-1}, \end{aligned}$$

where \(c_d = 2^{-d}\) and \(C_d=2^d\). As a consequence, if \(\mathscr {B}_M(x,r)\) denotes the geodesic ball of radius \(r \le {\rho }/{4}\) centered at x, then

$$\begin{aligned} c'_d r^d \le \mathrm{Vol}(\mathscr {B}_M(x,r)) \le C'_d r^d, \end{aligned}$$

with \(c'_d = c_d V_d\) and \(C'_d = C_d V_d\), where \(V_d\) denotes the volume of the unit d-dimensional Euclidean ball.

Proof

Denoting \(A_{r,v} = {d}_{rv} \exp _x\), the Area Formula [23, Sect. 3.2.5] asserts that \(J(r,v) = r^{d-1} \sqrt{\det ( A_{r,v}^t A_{r,v} )}\). Note that from Proposition 6.1 in [30] together with the Gauss equation [10, p. 130], the sectional curvatures in M are bounded by \(|\kappa | \le 2/\rho ^2\). Therefore, the Rauch theorem [21, Lemma 5] states that

$$\begin{aligned} \biggl ( 1 - \frac{r^2}{3\rho ^2} \biggr ) \left\| w\right\| \le \left\| A_{r,v} w\right\| \le \biggl (1+ \frac{r^2}{\rho ^2} \biggr )\left\| w\right\| , \end{aligned}$$

for all \(w \in T_x {M}\). As a consequence,

$$\begin{aligned} 2^{-d} \le \biggl ( 1 - \frac{r^2}{3\rho ^2} \biggr )^d \le \sqrt{\det ( A_{r,v}^t A_{r,v} )} \le \biggl (1+ \frac{r^2}{\rho ^2} \biggr )^d \le 2^d. \end{aligned}$$

Since \(\mathrm{Vol}(\mathscr {B}_M(x,r))= \int _{s=0}^{r} \int _{v \in {\mathscr {S}}_{d-1}} J(s,v)\, ds dv\), where \({\mathscr {S}}_{d-1}\) denotes the unit \((d-1)\)-dimensional sphere, the bounds on the volume easily follow. \(\square \)
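
As a quick sanity check of these constants (not needed in the sequel), take M to be the unit circle, so that \(d = 1\) and the exponential map is an arclength parametrization. Then

$$\begin{aligned} J(r,v) \equiv 1 \qquad \text {and}\qquad \mathrm{Vol}(\mathscr {B}_M(x,r)) = 2r, \end{aligned}$$

which is consistent with \(c'_1 r \le 2r \le C'_1 r\): indeed \(V_1 = 2\), so \(c'_1 = 2^{-1}\cdot 2 = 1\) and \(C'_1 = 2\cdot 2 = 4\).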

C Some Technical Properties of the Statistical Model

C.1 Covering and Mass

Lemma 9.1

Let \(Q_0 \in \mathscr {U}_M(f_\mathrm{min},f_\mathrm{max})\). Then for all \(p \in {M}\) and \( r \le \rho /4\),

$$\begin{aligned} Q_0\bigl (\mathscr {B}(p,r)\bigr ) \ge a_{d}f_\mathrm{min} r^d, \end{aligned}$$

where \(a_d > 0\). As a consequence, for n large enough and for all \(Q \in \mathscr {G}_{D,d,f_\mathrm{min},f_\mathrm{max},\rho ,\sigma }\), with probability larger than \(1-({1}/{n})^{2/d}\),

$$\begin{aligned} {d}_{H}(M,\mathbb {X}_n) \le C_{d,f_\mathrm{min}} \biggl ( \frac{\log n}{n} \biggr )^{1/d}+ \sigma . \end{aligned}$$

Similarly, for n large enough and for all \(P \in \mathscr {O}_{D,d,f_\mathrm{min},f_\mathrm{max},\rho ,\beta }\), with probability larger than \(1-({1}/{n})^{2/d}\),

$$\begin{aligned} {d}_{H}(M,\mathbb {X}_n\cap M) \le C_{d,f_\mathrm{min}} \biggl ( \frac{\log n}{\beta n} \biggr )^{1/d}. \end{aligned}$$

Proof

The first statement is a direct corollary of Proposition 8.7, since for all \(r \le \rho /4\),

$$\begin{aligned} Q_0\bigl (\mathscr {B}(p,r)\bigr ) = \int _{\mathscr {B}(p,r)} f \,d \mathscr {H}^d \ge f_\mathrm{min} \mathrm{Vol}(\mathscr {B}(p,r)\cap M) \ge a_d f_\mathrm{min} r^d , \end{aligned}$$

where \(a_d\) can be taken to be equal to \(c'_d\) of Proposition 8.7. Let us now prove the second statement. By definition, a sample point \(X_i \in \mathbb {X}_n\), which has distribution \(Q \in \mathscr {G}_{D,d,f_\mathrm{min},f_\mathrm{max},\rho ,\sigma }\), can be written as \(X_i = Y_i + Z_i\), with \(Y_i\) having distribution \(Q_0 \in \mathscr {G}_{D,d,f_\mathrm{min},f_\mathrm{max},\rho }\) and \(\left\| Z_i\right\| \le \sigma \). From the previous point, letting \(a = a_d f_\mathrm{min}\), \(Q_0\) fulfils the so-called \((a,d)\)-standard assumption of [12] for \(r\le \rho /4\). Looking carefully at the proof of Lemma 10 in [12] shows that its conclusion still holds for measures satisfying the \((a,d)\)-standard assumption for small radii only. Therefore, writing \(\mathbb {Y}_n = \{Y_1,\ldots ,Y_n\}\), for \(r \le \rho /8\) we obtain

$$\begin{aligned} \mathbb {P}_{Q_0}( {d}_{H}({M},\mathbb {Y}_n) > r ) \le \frac{4^d}{a r^d} \exp \biggl ( -n \,\frac{a}{2^d}\, r^d \biggr ). \end{aligned}$$

The claim follows using that \(d_H(\mathbb {X}_n,\mathbb {Y}_n) \le \sigma \), and setting \(r = C_{d,f_\mathrm{min}} ( {\log n}/{n})^{1/d}\) with \(C_{d,f_\mathrm{min}}^d \frac{a}{2^{d+1}} \ge 1+ 2/d\).

To prove the last point, notice that for all \(k=0,\ldots ,n\), conditionally on the event \(\lbrace |\mathbb {X}_n \cap {M}|=k\rbrace \), \(\mathbb {X}_n \cap {M}\) has the distribution of a k-sample of \(Q_0\). Therefore,

$$\begin{aligned} \mathbb {P}_P\bigl ( {d}_{H}({M},\mathbb {X}_n \cap {M})> r \,\big |\, |\mathbb {X}_n \cap {M}|=k \bigr )&= \mathbb {P}_{Q_0}( {d}_{H}({M},\mathbb {X}_k \cap {M}) > r )\\&\le \frac{4^d}{a r^d} \exp \biggl ( -k \,\frac{a}{2^d} \,r^d \biggr ). \end{aligned}$$

Hence,

$$\begin{aligned} \mathbb {P}_P( {d}_{H}({M},\mathbb {X}_n \cap {M})> r )&= \sum _{k = 0}^{n} \mathbb {P}_P\bigl ( {d}_{H}({M},\mathbb {X}_n \cap {M})> r \,\big | \, |\mathbb {X}_n \cap {M}|=k \bigr )\mathbb {P}_P(|\mathbb {X}_n \cap {M}|=k ) \\&\le \sum _{k = 0}^{n} \frac{4^d}{a r^d} \exp \biggl ( -k \,\frac{a}{2^d}\, r^d \biggr ) \left( {\begin{array}{c}n\\ k\end{array}}\right) \beta ^k(1-\beta )^{n-k}\\&= \frac{4^d}{a r^d}\, \biggl [ 1 - \beta \biggl ( 1 - \exp \biggl ({-\frac{a}{2^d}\, r^d}\biggr ) \biggr ) \biggr ]^n \\&\le \frac{4^d}{a r^d} \exp \biggl [ - n \beta \biggl ( 1 - \exp \biggl ({-\frac{a}{2^d} \,r^d}\biggr ) \biggr ) \biggr ] \\&\le \frac{4^d}{a r^d} \exp \biggl [ - n \beta \, \frac{a}{2^{d+1}}\,r^d \biggr ], \end{aligned}$$

whenever \(r\le \rho /8\) and \(a r^d \le 2^d\). Taking \(r = C_{d,f_\mathrm{min}}' \bigl ( \frac{\log n}{\beta n}\bigr )^{1/d}\) with \(C_{d,f_\mathrm{min}}'^d \frac{\beta a}{2^{d+1}} \ge 1+ 2/d\) yields the result. \(\square \)

We now focus on proving Lemma 2.2. For its proof, we need the following piece of notation. For all bounded subsets \(K \subset \mathbb {R}^D\) and \(\varepsilon >0\), we let \(\mathrm {cv}_K(\varepsilon )\) denote the Euclidean covering number of K; that is, \(\mathrm {cv}_K(\varepsilon )\) is the minimal number of Euclidean open balls of radii \(\varepsilon \), centered at elements of K, that are needed to cover K.

Lemma 9.2

Let \(K \subset \mathbb {R}^D\) be a bounded subset. If K is path connected, then for all \(\varepsilon >0\), \(\mathrm {diam}(K) \le 2\varepsilon \mathrm {cv}_K(\varepsilon )\).

Proof

Let \(p,q \in K\) and \(\gamma :[0,1] \rightarrow K\) be a continuous path joining \(\gamma (0) = p\) and \(\gamma (1) = q\). Writing \(N = \mathrm {cv}_K(\varepsilon )\), let \(x_1,\ldots ,x_N \in \mathbb {R}^D\) be the centers of a covering of K by open balls of radii \(\varepsilon \). We let \(U_i\) denote \(\{ t , \left\| \gamma (t) - x_{i}\right\| < \varepsilon \} \subset [0,1]\). By construction of the covering, there exists \(x_{(1)} \in \{x_1,\ldots ,x_N\}\) such that \(\Vert {p - x_{(1)}}\Vert < \varepsilon \). Then \(U_{(1)} \ni \gamma (0) = p\) is a non-empty open subset of [0, 1], so that \(t_{(1)} = \sup U_{(1)}\) is positive. If \(t_{(1)} =1\), then \(\Vert {q-x_{(1)}}\Vert \le \varepsilon \), and in particular \(\left\| q-p\right\| \le 2 \varepsilon \). If \(t_{(1)} < 1\), since \(U_{(1)}\) is an open subset of [0, 1], we see that \(\gamma (t_{(1)}) \notin U_{(1)}\). But \(\bigcup _{i = 1}^N U_i\) is an open cover of [0, 1], which yields the existence of \(U_{(2)}\) such that \(\gamma (t_{(1)}) \in U_{(2)}\), and for all \(t < t_{(1)}\), \(\gamma (t) \notin U_{(2)}\). Then consider \(t_{(2)} = \sup U_{(2)}\), and so on. Doing so, we build by induction a sequence of numbers \(0< t_{(1)}< \cdots < t_{(k)} \le 1\) and distinct centers \(x_{(1)},\ldots ,x_{(k)} \in \{x_1,\ldots ,x_N\}\) (\(k \le N\)) such that \(\Vert {p - x_{(1)}}\Vert < \varepsilon \), \(\Vert {q - x_{(k)}}\Vert \le \varepsilon \), with \(\Vert {\gamma (t_{(i)}) - x_{(i)}}\Vert \le \varepsilon \) for \(1 \le i \le k\) and \(\Vert {\gamma (t_{(i)}) - x_{(i+1)}}\Vert < \varepsilon \) for \(1 \le i \le k-1\). In particular, \(\Vert {x_{(i)} - x_{(i+1)}}\Vert \le 2\varepsilon \) for all \(1\le i \le k-1\). To conclude, write

$$\begin{aligned} \left\| p-q\right\|&\le \Vert {p-x_{(1)}}\Vert + \Vert {x_{(1)} - x_{(k)}}\Vert + \Vert {q - x_{(k)}}\Vert \\&\le \varepsilon + \sum _{i=1}^{k-1} \Vert {x_{(i)} - x_{(i+1)}}\Vert + \varepsilon \le 2k\varepsilon \le 2\varepsilon \mathrm {cv}_K(\varepsilon ). \end{aligned}$$

Since this bound holds for all \(p,q \in K\), we get the announced bound on the diameter of K. \(\square \)
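
Lemma 9.2 can also be checked numerically. A greedy \(\varepsilon \)-net is in particular an \(\varepsilon \)-cover, so its cardinality N satisfies \(\mathrm {cv}_K(\varepsilon ) \le N\), and the lemma then gives \(\mathrm {diam}(K) \le 2\varepsilon \,\mathrm {cv}_K(\varepsilon ) \le 2\varepsilon N\). The following numpy sketch (an illustration on a dense sample of the unit circle, not part of the proof) confirms the inequality:

```python
import numpy as np

def greedy_net(points, eps):
    """Greedy eps-net of a finite set: the returned centers form an eps-cover,
    hence len(centers) >= cv_K(eps), the minimal covering number."""
    centers = []
    remaining = points
    while len(remaining):
        c = remaining[0]
        centers.append(c)
        # discard every point already covered by the open ball B(c, eps)
        remaining = remaining[np.linalg.norm(remaining - c, axis=1) >= eps]
    return centers

t = np.linspace(0.0, 2 * np.pi, 5000, endpoint=False)
K = np.c_[np.cos(t), np.sin(t)]        # dense sample of the unit circle, diam(K) = 2
eps = 0.1
N = len(greedy_net(K, eps))
print(2.0 <= 2 * eps * N)              # True, as predicted by Lemma 9.2
```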

We are now in position to prove Lemma 2.2.

Proof of Lemma 2.2

Let \(\varepsilon \le {\rho }/{4}\), and let \(x_1,\ldots ,x_{\mathrm {cv}_M(\varepsilon )}\) be the centers of a minimal \(\varepsilon \)-covering of M. According to Lemma 9.1, for all k,

$$\begin{aligned} Q(\mathscr {B}_M(x_k,\varepsilon )) \ge a_d f_\mathrm{min} \varepsilon ^d \end{aligned}$$

for some \(a_d > 0\). A straightforward packing argument [12, Sect.  B.1] yields that the covering number of the support M of Q satisfies

$$\begin{aligned} \mathrm {cv}_M(\varepsilon ) \le \frac{c_d}{f_\mathrm{min} \varepsilon ^{d}} \end{aligned}$$

for all \(\varepsilon \le {\rho }/{4}\), where \(c_d = 2^d/a_d\). Applying this bound with \(\varepsilon = \rho /4\), together with Lemma 9.2 yields

$$\begin{aligned} \mathrm {diam}(M)&\le 2 \,\frac{\rho }{4}\, \mathrm {cv}_M\biggl ( \frac{\rho }{4} \biggr ) \le \frac{\rho }{2}\, \frac{c_d}{f_\mathrm{min} ({\rho }/{4})^d} = \frac{C_d}{f_\mathrm{min} \rho ^{d-1}}, \end{aligned}$$

where \(C_d = 2^{3d-1}/a_d\). \(\square \)
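
As an illustration (a sanity check only), consider the unit circle with the uniform density: \(d = 1\), \(\rho = 1\), \(f_\mathrm{min} = 1/(2\pi )\), and \(a_1 = 1\), so that the bound reads

$$\begin{aligned} \mathrm {diam}(M) \le \frac{C_1}{f_\mathrm{min}\, \rho ^{0}} = 4 \cdot 2\pi = 8\pi \approx 25.1, \end{aligned}$$

which indeed dominates the true diameter 2. The bound is loose, but captures the right phenomenon: a smaller \(f_\mathrm{min}\) allows a larger diameter.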

Now we allow for some outliers. We consider a random variable X with distribution P, which can be written as \(X = V(Y+Z) + (1-V)X''\), with \(\Vert Z\Vert \le s h\) and \(s \le 1/4\), where \(\mathbb {P}(V=1) = \beta \), V is independent of \((Y,Z,X'')\), Y has law Q in \(\mathscr {G}_{D,d,f_\mathrm{min},f_\mathrm{max},\rho }\), and \(X''\) has uniform distribution on \(\mathscr {B}(0,K_0)\) (recall that \(K_0\) is defined below Lemma 2.2). Note that \(s=0\) corresponds to the clutter noise case, whereas \(\beta =1\) corresponds to the additive noise case.

For a fixed point x, let \(p(x,h)\) denote \(P(\mathscr {B}(x,h))\). We have \(\mathbb {P}(VY \in \mathscr {B}(x,(1-s)h) ) \le \mathbb {P}(VX \in \mathscr {B}(x,h))\le \mathbb {P}(VY \in \mathscr {B}(x,2h))\). Hence we may write

$$\begin{aligned} \beta q(x,3h/4) + (1-\beta )q'(x,h) \le p(x,h) \le \beta q(x,2h) + (1-\beta )q'(x,h), \end{aligned}$$

where \(q(x,h) = Q(\mathscr {B}(x,h))\) and \(q'(x,h) = ({h}/{K_0})^D\). Bounds on the quantities above are to be found in the following lemma.

Lemma 9.3

There exists \(h_+(\rho ,\beta ,f_\mathrm{min},f_\mathrm{max},d) \le \rho / \sqrt{12d}\) such that, if \(h \le h_+\), for every x such that \({d}(x,M) \le h\), we have

  • \(\mathscr {B}(x,2h) \cap M \subset \mathscr {B}(\pi _{{M}}(x),4h) \cap M,\)

  • \( q(x,2h) \le C_{d}{ f_\mathrm{max}} h^{d}\).

Moreover, if \({d}(x,M) \le h/\sqrt{2}\), we have

  • \(\mathscr {B}(\pi _{{M}}(x),h/8) \cap M \subset \mathscr {B}(x,3h/4)\),

  • \(c_{d}{ f_\mathrm{min}} h^d \le q(x,3h/4)\),

  • \( p(x,h) \le 2 \beta q(x,2h)\).

Proof

Set \(h_1(\rho ) = \rho /(16 \alpha )\), and let x be such that \({d}(x,{M}) \le h\), with \(h \le h_1\). According to Proposition 8.2, \(\mathscr {B}(x,2h) \cap {M} \subset \mathscr {B}(\pi _{{M}}(x), r_{2h}^+) \cap M\), with \(r_{2h}^+ = \sqrt{(1 + 2 \Delta / \rho )}\, r_{2h} \le 2 r_{2h} \le 4h\). According to Proposition 8.6, if \(y \in \mathscr {B}(\pi _M(x),4h) \cap M\), then \(d_M(\pi _M(x),y) \le 4\alpha h \le \rho /4\). Proposition 8.7 then yields \(q(x,2h) \le C_d f_\mathrm{max}h^d\).

Now if \({d}(x,M) \le h/\sqrt{2}\), \(\mathscr {B}(\pi _{{M}}(x),r_{3h/4}^-) \cap M \subset \mathscr {B}(x,3h/4) \cap M\) from Proposition 8.2, with \(r_{3h/4}^- = \sqrt{(1 - \Delta / \rho )}r_{3h/4} \ge r_{3h/4}/2 \ge h/8\). Since we have \(\mathscr {B}_M(\pi _M(x),h/8) \subset \mathscr {B}(\pi _M(x),h/8)\cap M\), a direct application of Proposition 8.7 entails \(c_{d}{ f_\mathrm{min}} h^d \le q(x,3h/4)\).

Applying Proposition 8.7 again, there exists \(h_2(f_\mathrm{min},d,D,\beta ,\rho )\) such that if \(h \le h_1 \wedge h_2\), then for any x such that \({d}(x,{M}) \le h/\sqrt{2}\) we have \((1-\beta ) q'(x,h) \le \beta c_{d,f_\mathrm{min}} h^{d} \), along with \(q(x,2h) \ge q(x,3h/4) \ge c_{d,f_\mathrm{min}} h^{d}\). We deduce that \( p(x,h) \le 2 \beta q(x,2h)\). Taking \(h_+ = h_1 \wedge h_2\wedge \rho / \sqrt{12d}\) leads to the result. \(\square \)

C.2 Local Covariance Matrices

In this section we describe the shape of the local covariance matrices involved in tangent space estimation. Without loss of generality, the analysis will be conducted for \(\widehat{\Sigma }_1\) (at sample point \(X_1\)), abbreviated as \(\widehat{\Sigma }\). We further assume that \(d(X_1,M) \le h/\sqrt{2}\), \(\pi _{{M}}(X_1) = 0\), and that \(T_{0}{M}\) is spanned by the first d vectors of the canonical basis of \(\mathbb {R}^D\).

The two models (additive noise and clutter noise) will be treated jointly, by considering a random variable X of the form

$$\begin{aligned} X = V(Y+Z) + (1-V)X'', \end{aligned}$$

where \(\mathbb {P}(V=1) = \beta \) and V is independent from \((Y,Z,X'')\), Y has distribution in \(\mathscr {G}_{D,d,f_\mathrm{min},f_\mathrm{max},\rho ,\sigma }\), \(\Vert Z\Vert \le \sigma \), and \(X''\) has uniform law on \(\mathscr {B}(0,K_0)\) (recall that \(K_0\) is defined above Definition 2.4). For short we denote by s the quantity \(\sigma /h\), and recall that we take \(s \le 1/4\), along with \(h \le h_+\) (defined in Lemma 9.3).

Let \(U(X_i,h)\), \(i=2, \ldots , n\), denote \( \mathbb {1}_{\mathscr {B} (X_1,h )}(X_i)\), let \(Y_i \in M\) and \(Z_i\) be such that \(X_i = Y_i + Z_i\), with \(\Vert Z_i\Vert \le s h\), and let \(V_2, \ldots , V_n\) denote random variables such that \(V_i= 1\) if \(X_i\) is drawn from the signal distribution (see Sect. 2.1). It is immediate that the \((U(X_i,h),V_i)\)’s are independent and identically distributed, with the same distribution as \((U(X,h), V)\).

With a slight abuse of notation, we will denote by \(\mathbb {P}\) and \(\mathbb {E}\) conditional probability and expectation with respect to \(X_1\). The following expectations will be of particular interest.

$$\begin{aligned} m(h)&= \mathbb {E}\bigl (XU(X,h)V\bigr )/\mathbb {E}\bigl (U(X,h)V\bigr ), \\ \Sigma (h)&= \mathbb {E}\bigl ( (X-m(h))_{\top }(X-m(h))_{\top }^{t}\,U(X,h)V \bigr ), \end{aligned}$$

where, for any \(x \in \mathbb {R}^D\), \(x_{\top }\) and \(x_{\perp }\) denote the projections of x onto \(T_0 M\) and \(T_0M^{\perp }\) respectively.

The following lemma gives useful results on both m(h) and \(\Sigma (h)\), provided that \(X_1\) is close enough to M.

Lemma 9.4

If \(d(X_1,M) \le h/\sqrt{2}\), for \(h \le h_+\), then

$$\begin{aligned} \Sigma (h) = \left( \begin{array}{c|c} A(h) & 0 \\ \hline 0 & 0 \end{array} \right) , \end{aligned}$$

with

$$\begin{aligned} \mu _\mathrm{min}(A(h)) \ge \beta {c_{d,f_\mathrm{min},f_\mathrm{max} }}h^{d+2}, \end{aligned}$$

where \(\mu _\mathrm{min}(A(h))\) denotes the smallest eigenvalue of A(h). Furthermore,

$$\begin{aligned} \Vert m_{\top }(h)\Vert&\le 2h, \\ \Vert m_{\perp }(h)\Vert&\le \frac{2h^2}{\rho } + s h. \end{aligned}$$

Proof

Let \(x=y+z\) be in \(\mathscr {B}(X_1,h)\), with \(y \in M\) and \(\Vert z\Vert \le s h\). Since \(s \le 1/4\), \(\Vert y\Vert \le 2h\). According to Proposition 8.2 combined with Proposition 8.6, we may write, for \(h \le h_+\) and \(y \in \mathscr {B}(X_1 ,2h) \cap {M}\),

$$\begin{aligned} y = r v + R(r,v), \end{aligned}$$

in local polar coordinates. Moreover, if \(y \in \mathscr {B}(X_1,(1-s)h)\), then \(x \in \mathscr {B}(X_1,h)\). Then, according to Proposition 8.2, we have \(\mathscr {B}(\pi _{{M}}(X_1),r_{3h/4}^-) \cap M \subset \mathscr {B}(X_1,(1-s)h) \cap M\). Let u be a unit vector in \(T_0 {M}\). Then, according to Proposition 8.6,

$$\begin{aligned} \langle u,x-m_{\top }(h) \rangle ^2&= \langle u,rv + R(r,v)+z-m_{\top }(h) \rangle ^2 \\&\ge \frac{\langle u,rv -m_{\top }(h) \rangle ^2}{2} - 3\left\| R(r,v)+z\right\| ^2 \\&\ge \frac{\langle u,rv -m_{\top }(h) \rangle ^2}{2} - \frac{6 r^4}{4\rho ^2} - 6 s^2 h^2. \end{aligned}$$

Hence we may write

$$\begin{aligned} \langle Au,u \rangle&= \beta \int _{\mathscr {B}(X_1,h) \cap {M}} \langle u,rv + R(r,v)-m_{\top }(h) \rangle ^2 J(r,v){f(r,v)}\,drdv \\&\ge \beta f_\mathrm{min} c_d \int _{r=0}^{r_{3h/4}^-} \int _{{\mathscr {S}}_{d-1}}r^{d-1}\big [\langle u,rv-m_\top (h)\rangle ^2 /2 - 3 r^4/(2\rho ^2) -6 s^2 h^2 \big ]\,drdv, \end{aligned}$$

according to Proposition 8.7 (bound on J(rv)) and Proposition 8.2 (the geodesic ball \(\mathscr {B}_M(\pi _M(X_1),r_{3h/4}^-)\) is included in the Euclidean ball \(\mathscr {B}(\pi _M(X_1),r_{3h/4}^-)\) \(\subset \) \(\mathscr {B}(X_1,(1-s)h) \cap {M}\)). Then

$$\begin{aligned}&\int _{r=0}^{r_{3h/4}^-}\int _{{\mathscr {S}}_{d-1}}{\frac{r^{d-1}\langle u,rv-m_\top (h)\rangle ^2}{2} \,drdv}\\&\quad \ge \int _{r=0}^{r_{3h/4}^-}\int _{{\mathscr {S}}_{d-1}}{\frac{r^{d-1}\langle u,rv\rangle ^2}{2}\, drdv} \\&\quad = \frac{\sigma _{d-1}}{2d} \int _{r=0}^{r_{3h/4}^-} r^{d+1} \,dr = \frac{\sigma _{d-1}(r_{3h/4}^-)^{d+2}}{2d(d+2)}, \end{aligned}$$

where \(\sigma _{d-1}\) denotes the surface of the \((d-1)\)-dimensional unit sphere. On the other hand,

$$\begin{aligned} \int _{r=0}^{r_{3h/4}^-} \int _{{\mathscr {S}}_{d-1}}{\frac{3 r^{d+3}}{2\rho ^2} + 6 s^2 h^2r^{d-1}\,drdv} = \sigma _{d-1}(r_{3h/4}^-)^{d+2} \biggl ( \frac{3 (r_{3h/4}^-)^{2}}{ 2(d+4)\rho ^2} + \frac{6 s^2h^2}{d}\biggr ). \end{aligned}$$

Since \(r_{3h/4}^- \le h \le h_+ \le \rho /\sqrt{12d}\), we conclude that

$$\begin{aligned} \langle Au,u \rangle \ge \beta c_d f_\mathrm{min} (r_{3h/4}^-)^{d+2} \ge \beta c_d f_\mathrm{min} h^{d+2}, \end{aligned}$$

since, for \(d(X_1,M) \le h/\sqrt{2}\) and \(h \le h_+\), \(r_{3h/4}^- \ge r_{3h/4}/2 \ge h/8\), according to Proposition 8.2.

Now, since any \(x=y+z \in \mathscr {B}(X_1,h)\) satisfies \(y \in M \cap \mathscr {B}(0,2h)\) and \(\Vert z\Vert \le s h\), we have \(\Vert y_{\perp }\Vert \le 2h^2/\rho \), according to Proposition 8.1. Jensen’s inequality yields that \(\Vert m(h)_{\perp }\Vert \le 2h^2/\rho + s h\) and \(\Vert m(h)_{\top }\Vert \le \Vert m(h)\Vert \le 2h\). \(\square \)

The following Lemma 9.5 is devoted to quantifying the deviations of empirical quantities such as local covariance matrices, means, and numbers of points within balls from their deterministic counterparts. To this aim we define \(N_0(h)\) and \(N_1(h)\) as the number of points of \(\mathscr {B}(X_1,h)\) drawn from the noise and from the signal respectively, namely

$$\begin{aligned} N_0(h)&= \sum _{i \ge 2} U(X_i,h) (1-V_i), \\ N_1(h)&= \sum _{i \ge 2} U(X_i,h) V_i. \end{aligned}$$

Lemma 9.5

Recall that \(h_0 = \bigl ( \kappa \frac{\log n}{\beta (n-1)} \bigr )^{1/(d+1)}\) (as defined in Sect. 5.2), and \(h_{\infty } = h_0^{(d+1)/d}\), for \(\kappa \) to be fixed later.

If \(h_0 \le h_+\) and \(d(X_1, M) \le h_+ /\sqrt{2}\), then, with probability larger than \(1- 4 ({1}/{n})^{2/d +1}\), the following inequalities hold, for all \(h \le h_0\).

$$\begin{aligned} \frac{N_0(h)}{n-1}&\le 2 (1- \beta ) q'(h) + \frac{10(2+2/d)\log n}{n-1}, \\ \frac{N_1(h)}{n-1}&\le 2 \beta q(2h) + \frac{10(2+2/d)\log n}{n-1}. \end{aligned}$$

Moreover, for all \((h_{\infty } \vee \sqrt{2} {d}(X_1,M)) \le h \le h_0\), and n large enough,

$$\begin{aligned} \biggl \Vert \frac{1}{n-1}\sum _{i \ge 2} (X_i-m(h))_{\top }(X_i-m(h))_{\top }^{t}U(X_i,h)V_i - \Sigma (h) \biggr \Vert _\mathscr {F}&\le C_d \,\frac{f_\mathrm{max}}{f_\mathrm{min}\sqrt{\kappa }}\, \beta q(2h) h^2 , \\ \frac{1}{n-1}\,\biggl \Vert \sum _{i \ge 2}{(X_i - m(h))_{\top }U(X_i,h)V_i} \biggr \Vert _{\mathscr {F}}&\le C_d \,\frac{f_\mathrm{max}}{f_\mathrm{min}\sqrt{\kappa }}\, \beta q(2h) h. \end{aligned}$$

Proof

The first two inequalities are straightforward applications of [6, Thm. 5.1]. The proofs of the last two results are detailed below. They are based on Talagrand–Bousquet’s inequality (see, e.g., [8, Thm. 2.3]) combined with the so-called peeling device.

Define \(h_-=(h_{\infty } \vee \sqrt{2} {d}(X_1,M))\), where we recall that in this analysis \(X_1\) is fixed, and let \(f_{T,h}\) denote the function

$$\begin{aligned} f_{T,h}(x,v) = \bigl \langle T,(x-m(h))_{\top }(x-m(h))_{\top }^{t} U(x,h)v \bigr \rangle , \end{aligned}$$

for \(h_- \le h \le h_0\), T a \(d \times d\) matrix such that \(\Vert T\Vert _{\mathscr {F}}=1\), x in \(\mathbb {R}^D\), and v in \(\{0,1\}\), where \(\langle T, A \rangle = \mathrm {trace}(T^t A)\) for any square matrices T and A of the same size. Now we define the weighted empirical process

$$\begin{aligned} Z = \sup _{T,h} {\sum _{i \ge 2}{\frac{f_{T,h}(X_i,V_i) - \mathbb {E}f_{T,h}(X,V)}{r(h)}}}, \end{aligned}$$

with \(r(h) = \beta q(2h) h^2\), along with the constrained empirical processes

$$\begin{aligned} Z(u) = \sup _{T,h \le u} {\sum _{i \ge 2}{{f_{T,h}(X_i,V_i) - \mathbb {E}f_{T,h}(X,V)}}}, \end{aligned}$$

for \(h_- \le u \le h_0\). Since \(\Vert f_{T,h} \Vert _{\infty } \le \sup _{x \in M} \Vert x-m(h)\Vert ^2 U(x,h) \le 4h^2\), and

$$\begin{aligned} \mathrm{Var}(f_{T,h}(X,V))&\le \mathbb {E} \bigl ( \Vert X-m(h)\Vert ^4 U(X,h) V \bigr )\\&\le 16 \beta h^4\, \mathbb {P}(VX \in \mathscr {B}(X_1,h)) \\&\le 16 \beta h^4 \,\mathbb {P}(VY \in \mathscr {B}(X_1,2h)), \end{aligned}$$

for \(s \le 1/4\), a direct application of Theorem 2.3 in [8] yields, with probability larger than \(1-e^{-x}\),

$$\begin{aligned} Z(u) \le 3 \mathbb {E} Z(u) + \sqrt{\frac{32 \beta q(2u) u^4x}{n-1}} + \frac{20u^2x}{3(n-1)}. \end{aligned}$$

To get a bound on \(\mathbb {E} Z(u)\), we introduce some independent Rademacher random variables \(\sigma _2, \ldots , \sigma _n\), i.e., \(\mathbb {P}(\sigma _j =1) = \mathbb {P}(\sigma _j=-1)=1/2\). With a slight abuse of notation, expectations with respect to the \((X_i,V_i)\)’s and \(\sigma _i\)’s, \(i=2, \ldots , n\), will be denoted by \(\mathbb {E}_{(X,V)}\) and \(\mathbb {E}_{\sigma }\) in what follows. According to the symmetrization principle (see, e.g., [7, Lem. 11.4]), we have

$$\begin{aligned} (n-1)\mathbb {E}Z(u)&\le 2 \mathbb {E}_{(X,V)} \mathbb {E}_{\sigma _i} \sup _{h \le u,T} \sum _{i \ge 2} \bigl \langle T,\sigma _i V_i U (X_i,h) ((X_i - m(h))_{\top }(X_i - m(h))_{\top }^t) \bigr \rangle \\&\le 2 \mathbb {E}_{(X,V)} \mathbb {E}_{\sigma } \sup _{h \le u,T} \sum _{i \ge 2} \sigma _i \bigl \langle V_i U(X_i,h) X_i X_i^t,T \bigr \rangle \\&\quad + 2 \mathbb {E}_{(X,V)} \mathbb {E}_{\sigma } \sup _{h \le u,T} \sum _{i \ge 2} \sigma _i \bigl \langle V_i U(X_i,h) X_i m(h)^t,T \bigr \rangle \\&\quad + 2 \mathbb {E}_{(X,V)} \mathbb {E}_{\sigma } \sup _{h \le u,T} \sum _{i \ge 2} \sigma _i \bigl \langle V_i U(X_i,h) m(h) X_i^t,T\bigr \rangle \\&\quad + 2 \mathbb {E}_{(X,V)} \mathbb {E}_{\sigma } \sup _{h \le u,T} \sum _{i \ge 2} \sigma _i \bigl \langle V_i U(X_i,h) m(h) m(h)^t,T \bigr \rangle \\&:= 2\mathbb {E}_{(X,V)}(E_1 + E_2 + E_3 + E_4). \end{aligned}$$

For a fixed sequence \((X_i,V_i)\), \(i=2, \ldots , n\), we may write

$$\begin{aligned} E_1 \le \mathbb {E}_\sigma \sup _{h \le u} \biggl (&\biggl \Vert \sum _{i \ge 2} \sigma _i V_i U(X_i,h) X_i X_i^t \biggr \Vert _{\mathscr {F}} - \mathbb {E}_{\sigma } \biggl \Vert \sum _{i \ge 2} \sigma _i V_i U(X_i,h) X_i X_i^t \biggr \Vert _{\mathscr {F}} \biggr )\\&+ \sup _{h \le u} \mathbb {E}_{\sigma } \biggl \Vert \sum _{i \ge 2} \sigma _i V_i U(X_i,h) X_i X_i^t \biggr \Vert _{\mathscr {F}} := E_{11} + E_{12}. \end{aligned}$$

Jensen’s inequality ensures that

$$\begin{aligned} E_{12}&\le \sup _{h \le u}\sqrt{\mathbb {E}_\sigma \biggl \Vert \sum _{i \ge 2} \sigma _i V_i U(X_i,h) X_i X_i^t \biggr \Vert _{\mathscr {F}}^2} \le 4u^2\sqrt{N_1(u)}, \end{aligned}$$

hence

$$\begin{aligned} \mathbb {E}_{(X,V)} E_{12} \le 4 u^2 \sqrt{\beta (n-1)q(2u)}. \end{aligned}$$

For the term \(E_{11}\), note that, when \((X_i,V_i)_{i=2, \ldots ,n}\) is fixed, \(\sup _{h \le u} \bigl ( \bigl \Vert \sum _{i \ge 2} \sigma _i V_i U(X_i,h) X_i X_i^t \bigr \Vert _{\mathscr {F}} - \mathbb {E}_{\sigma } \bigl \Vert \sum _{i \ge 2} \sigma _i V_i U(X_i,h) X_i X_i^t \bigr \Vert _{\mathscr {F}} \bigr )\) is in fact a supremum of at most \(N_1(u)\) processes. According to the bounded difference inequality (see, e.g., [7, Thm. 6.2]), each of these processes is sub-Gaussian with variance bounded by \(16u^4N_1(u)\) (see Theorem 2.1 of [7]). Hence a maximal inequality for sub-Gaussian random variables (see Sect. 2.5, p. 31, of [7]) ensures that

$$\begin{aligned} E_{11} \le 4 u^2 \sqrt{2 N_1(u) \log (N_1(u))} \le 4 u^2 \sqrt{2 N_1(u) \log (n-1)}. \end{aligned}$$

Hence \(\mathbb {E}_{(X,V)} E_{11} \le 4u^2\sqrt{2 \beta (n-1) q(2u) \log (n-1)}\). \(E_2\) may also be decomposed as

$$\begin{aligned} E_2&= \mathbb {E}_{\sigma } \sup _{h \le u} {\biggl \Vert \biggl ( \sum _{i\ge 2} \sigma _i V_i U(X_i,h) X_i \biggr )m(h)^t \biggr \Vert _{\mathscr {F}}} \\&\le 2u \mathbb {E}_{\sigma } \sup _{h \le u} {\biggl \Vert \sum _{i\ge 2} \sigma _i V_i U(X_i,h) X_i \biggr \Vert } \\&\le 2u \biggl (\mathbb {E}_{\sigma } \sup _{h \le u} \biggl ( \biggl \Vert \sum _{i\ge 2} \sigma _i V_i U(X_i,h) X_i \biggr \Vert - \mathbb {E}_{\sigma } \biggl \Vert \sum _{i\ge 2} \sigma _i V_i U(X_i,h) X_i \biggr \Vert \biggr ) \\&\qquad \qquad + \sup _{h \le u} \mathbb {E}_{\sigma } \biggl \Vert \sum _{i\ge 2} \sigma _i V_i U(X_i,h) X_i \biggr \Vert \biggr ) := 2u( E_{21} + E_{22} ). \end{aligned}$$

Jensen’s inequality yields that \(E_{22} \le 2u \sqrt{N_1(u)}\), and the same argument as for \(E_{11}\) (expectation of a supremum of \(n-1\) sub-Gaussian processes with variance bounded by \(4u^2N_1(u)\)) gives \(E_{21} \le 2u \sqrt{2 N_1(u) \log (n-1)}\). Hence

$$\begin{aligned} \mathbb {E}_{(X,V)} E_2 \le 4u^2 \sqrt{\beta (n-1) q(2u)} \bigl ( \sqrt{2 \log (n-1)} +1 \bigr ). \end{aligned}$$

Similarly, we may write

$$\begin{aligned} \mathbb {E}_{(X,V)} E_3 \le 4u^2 \sqrt{\beta (n-1) q(u)} \bigl ( \sqrt{2 \log (n-1)} +1 \bigr ). \end{aligned}$$

Finally, we may decompose \(E_4\) as

$$\begin{aligned} E_4&\le \mathbb {E}_{\sigma }\, 4u^2 \sup _{h \le u} {\biggl | \sum _{i \ge 2} \sigma _i V_i U(X_i,h) \biggr |} \\&\le 4u^2\biggl [\mathbb {E}_{\sigma } \sup _{h \le u}{ \biggl (\biggl | \sum _{i \ge 2} \sigma _i V_i U(X_i,h) \biggr | - \mathbb {E}_{\sigma }\biggl | \sum _{i \ge 2} \sigma _i V_i U(X_i,h) \biggr | \biggr )} + \sup _{h \le u} {\mathbb {E}_{\sigma }\biggl | \sum _{i \ge 2} \sigma _i V_i U(X_i,h) \biggr |} \biggr ] \\&\le 4u^2\sqrt{N_1(u)} \bigl ( \sqrt{2 \log (n-1)} +1 \bigr ), \end{aligned}$$

using the same argument. Combining all these terms leads to

$$\begin{aligned} \mathbb {E} Z(u) \le \frac{32\, u^2 \sqrt{\beta q(2u)}}{\sqrt{n-1}} \bigl ( \sqrt{2 \log (n-1)} +1 \bigr ), \end{aligned}$$

hence we get

$$\begin{aligned} \mathbb {P} \biggl ( Z(u) \ge \frac{192 \sqrt{2}u^2 \sqrt{ \beta q(2u)\log (n-1)}}{\sqrt{n-1}} \biggl ( 1 + \frac{1}{48}\sqrt{\frac{x}{\log (n-1)}} \biggr ) + \frac{20 u^2 x}{n-1} \biggr ) \le e^{-x}. \end{aligned}$$

To derive a bound on the weighted process Z, we make use of the so-called peeling device (see, e.g., [7, Sect. 13.7, p. 387]). Set \(p = \lceil \log (h_0/h_\infty ) \rceil \le 1 + \log (h_0/h_\infty )\), so that \(e^{-{p}}h_0 \le h_-\). Write \(h_j = e^{-j}h_0\), and let \(I_j\) denote the slice \([h_j, h_{j-1}] \cap [h_-,h_0]\). According to Lemma 9.3, if \(h_0 \le h_+\), then for every h in \(I_j\) we have

$$\begin{aligned} r(h) \ge r(h_{j-1})\, c_d\, \frac{f_\mathrm{min}}{f_\mathrm{max}}, \end{aligned}$$

where \(c_d\) depends only on the dimension. Applying the deviation bound above to each constrained process \(Z(h_{j-1})\) with \(x + \log p\) in place of x, and taking a union bound over the p slices, yields a deviation bound for Z.

Since \(q(2h_{j-1}) \ge q(2h_-)\), we deduce that

$$\begin{aligned} \mathbb {P} \biggl ( Z \ge \frac{192f_\mathrm{max}\sqrt{2}}{f_\mathrm{min}c_d \sqrt{\beta q(2h_-)(n-1)}} \biggl ( 1 + \frac{1}{48}\sqrt{\frac{x + \log (p)}{n-1}} \biggr ) + \frac{20 f_\mathrm{max} (x + \log (p)) }{(n-1) c_{d} f_\mathrm{min} \beta q(2h_-)} \biggr )&\le p e^{ -(x+ \log (p))} \\&= e^{-x}. \end{aligned}$$

Now, according to Lemma 9.3, \(\beta q(2h_-) \ge c_d \kappa \log n /(n-1)\). On the other hand, \(p \le 1 + \log (h_0/h_\infty ) \le \log (\beta (n-1)/\kappa )/d \le \log n /d\), for \(\kappa \ge 1\). For n large enough, taking \(x = ( 1 + 2/d ) \log n\) in the previous inequality, we get

$$\begin{aligned} \mathbb {P} \biggl (Z \ge C_d \frac{f_\mathrm{max}}{f_\mathrm{min} \sqrt{\kappa }} \biggr ) \le \biggl ( \frac{1}{n} \biggr )^{1 + 2/d}. \end{aligned}$$

The last concentration inequality of Lemma 9.5 may be derived the same way, considering the functions

$$\begin{aligned} g_{T,h}(x,v) = \langle (x-m(h))U(x,h)v,T\rangle , \end{aligned}$$

where T is an element of \(\mathbb {R}^d\) satisfying \(\Vert T\Vert \le 1\). \(\square \)

C.3 Decluttering Rate

In this section we prove that, if the angle between tangent spaces is of order h, then we can distinguish between outliers and signal at order \(h^2\). We recall that the slab \(S(x,T,h)\) is the set of points y such that \( \Vert \pi _{T}(y-x) \Vert \le k_1 h\) and \(\Vert \pi _{T^{\perp }}(y-x)\Vert \le k_2h^2\), where \(k_1\) and \(k_2\) are defined in Lemma 5.4, and \(\pi _{T}\) denotes the orthogonal projection onto T.

Lemma 9.6

Recall that \(h_0 = \bigl ( \kappa \frac{\log n}{\beta (n-1)} \bigr )^{1/(d+1)}\), and \(h_{\infty } = h_0^{(d+1)/d}\). Let K be fixed, and let \(k_1\), \(k_2\) be defined accordingly as in Lemma 5.4. If \(h_0 \le h_+\), then for \(\kappa \) large enough (depending on d, \(\rho \), and \(f_\mathrm{min}\)) and n large enough, there exists a threshold t such that, for all \(h_{\infty } \le h \le h_0\), the following holds with probability larger than \(1-3({1}/{n})^{2/d +1}\).

$$\begin{aligned} X_1 \in M \quad \hbox {and} \quad \angle (T,T_{X_1}M ) \le K h/\rho&\Longrightarrow \quad |S(X_1,T,h) \cap \{ X_2, \ldots , X_n \}| \ge t(n-1)h^d, \\ d(X_1,M) \ge h^2/\rho \quad \hbox {and} \quad \angle (T,T_{\pi (X_1)}M) \le K h/\rho&\Longrightarrow \quad |S(X_1,T,h) \cap \{ X_2, \ldots , X_n \}|< t(n-1)h^d, \\ d(X_1,M) \ge h/\sqrt{2}&\Longrightarrow \quad |S(X_1,T,h) \cap \{ X_2, \ldots , X_n \}| < t(n-1)h^d. \end{aligned}$$

Proof

Suppose that \(d(X_1,M) \ge h/\sqrt{2} \). Then, according to Lemma 5.4, \(S(X_1,T,h) \subset \mathscr {B}(X_1,h/2)\), with \(\mathscr {B}(X_1, h/2) \cap M = \emptyset \), hence \(P_n(S(X_1,T,h)) \le P_n(\mathscr {B}(X_1,h/2))\). Theorem 5.1 in [6] yields that, for all \(h_{\infty } \le h \le h_0\), with probability larger than \(1-({1}/{n})^{2/d +1}\),

$$\begin{aligned} P_n(\mathscr {B}(X_1, h/2)) \le 2 P(\mathscr {B}(X_1, h/2)) + \frac{4 (2/d +1 ) \log (8n)}{n-1}. \end{aligned}$$

Since \(\log (n)/(n-1) \le \beta h^d/\kappa \), we may write

$$\begin{aligned} P_n(S(X_1,T,h))&\le 2 Q'(\mathscr {B}(X_1, h/2)) + \frac{4 (2/d +1 ) \log (8n)}{n-1}\\&\le 2(1-\beta )\, \frac{h^D}{(2K_0)^D} + \frac{4 (2/d +1 ) \log (8n)}{n-1} \\&\le (1-\beta )C_{d,D,\rho ,f_\mathrm{min}} h^{d+1} + \frac{4 (2/d +1 ) \log (8n)}{n-1}\\&\le h^d \biggl ( (1-\beta )C_{d,D,\rho ,f_\mathrm{min}} h + \frac{C_d \beta }{\kappa } \biggr ), \end{aligned}$$

for n large enough so that \(h \le 1\).

If \(h/ \sqrt{2} \ge d(X_1,M) \ge h^2/\rho \) and \(\angle (T_{\pi (X_1)} M, T ) \le K h / \rho \), then Lemma 5.4 provides a larger slab \(S'(x,T_{\pi (x)}M,h)\) such that \(S(x,T,h) \subset S'(x,T_{\pi (x)}M,h)\) and \(S'(x,T_{\pi (x)}M,h) \cap M = \emptyset \) (here \(x = X_1\)). Thus, \(P_n(S(x,T,h)) \le P_n(S'(x,T_{\pi (x)}M,h))\). Another application of Theorem 5.1 in [6] yields that, for all \(h_{\infty } \le h \le h_0\), with probability larger than \(1-({1}/{n})^{2/d +1}\),

$$\begin{aligned} P_n(S'(x,T_{\pi (x)}M,h)) \le 2 P(S'(x,T_{\pi (x)}M,h)) + \frac{4 (2/d +1 ) \log (8n)}{n-1}, \end{aligned}$$

hence, denoting by \(\omega _{r}\) the volume of the r-dimensional unit ball, we get

$$\begin{aligned} P_n(S(X_1,T,h))&\le 2 Q'(S'(x,T_{\pi (x)}M,h)) + \frac{4 (2/d +1 ) \log (8n)}{n-1} \\&\le \frac{2(1-\beta )\omega _d \omega _{D-d}}{K_0^D \omega _D}(k'_1h)^{d}(k'_2h^2)^{D-d} + \frac{4 (2/d +1 ) \log (8n)}{n-1} \\&\le (1-\beta )C_{d,D,f_\mathrm{min},\rho }h^{d+1} + \frac{4 (2/d +1 ) \log (8n)}{n-1}\\&\le h^d \biggl ( (1-\beta )C_{d,D,\rho ,f_\mathrm{min}} h + \frac{C_d \beta }{\kappa } \biggr ), \end{aligned}$$

when n is large enough.

Now, if \(X_1 \in M\) and \(\angle (T_{\pi (X_1)} M, T ) \le K h / \rho \), Lemma 5.4 entails that \(\mathscr {B}(X_1,k_3h) \cap M \subset S(X_1,T,h)\), hence \(P_n(S(X_1,T,h)) \ge P_n(\mathscr {B}(X_1,k_3h) \cap M)\). A last application of Theorem 5.1 in [6] yields that, for all \(h_{\infty } \le h \le h_0\), with probability larger than \(1-({1}/{n})^{2/d +1}\),

$$\begin{aligned} P_n(\mathscr {B}(X_1,k_3h) \cap M) \ge \frac{1}{2} \,P(\mathscr {B}(X_1,k_3h)) - \frac{2 (2/d +1 ) \log (8n)}{n-1}. \end{aligned}$$

Thus we deduce that

$$\begin{aligned} P_n(S(X_1,T,h))&\ge \frac{\beta }{2}\,Q(\mathscr {B}(X_1,k_3h))- \frac{2 (2/d +1 ) \log (8n)}{n-1}\\&\ge \frac{\beta }{2}\,q(k_3 h)- C_d \,\frac{\beta h^d}{\kappa } \ge h^d \biggl ( \beta c_{d,f_\mathrm{min},\rho } - C_d \frac{\beta }{\kappa } \biggr ), \end{aligned}$$

according to Lemma 9.3 (since \(k_3 \le 1\)). Choosing \(\kappa \) large enough (depending on d, \(\rho \) and \(f_\mathrm{min}\)) and then n large enough leads to the result. \(\square \)
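
In algorithmic form, the test underlying Lemma 9.6 reduces to counting sample points in a slab and comparing the count with the threshold \(t(n-1)h^d\). The numpy sketch below is a hedged illustration (not the paper's pseudocode); the orthonormal basis T of the candidate tangent space and the constants \(k_1\), \(k_2\) of Lemma 5.4 are inputs.

```python
import numpy as np

def in_slab(y, x, T, h, k1, k2):
    """Membership test for the slab S(x, T, h): tangential extent k1 * h,
    normal extent k2 * h**2.  T is a (D, d) orthonormal basis of the
    candidate tangent space at x."""
    diff = y - x
    tang = T.T @ diff                  # coordinates of pi_T(y - x)
    perp = diff - T @ tang             # pi_{T^perp}(y - x)
    return np.linalg.norm(tang) <= k1 * h and np.linalg.norm(perp) <= k2 * h ** 2

def slab_count(X, i, T, h, k1, k2):
    """|S(X_i, T, h) cap {X_j : j != i}|: X_i is kept as signal when this
    count is at least t * (n - 1) * h**d, and flagged as an outlier otherwise."""
    return sum(in_slab(X[j], X[i], T, h, k1, k2) for j in range(len(X)) if j != i)
```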

D Matrix Decomposition and Principal Angles

In this section we state a standard matrix perturbation result, adapted to our framework. For real symmetric matrices, we let \(\mu _i(\cdot )\) denote the i-th largest eigenvalue and \(\mu _\mathrm{min}(\cdot )\) the smallest one.

Theorem 10.1

(Sin \(\theta \) theorem [16], this version from Lemma 19 in [2]) Let \(O \in \mathbb {R}^{D\times D}\), \(B\in \mathbb {R}^{d\times d}\) be positive semi-definite symmetric matrices such that

$$\begin{aligned} O = \left( \begin{array}{c|c} B &{} 0 \\ \hline 0 &{} 0 \end{array} \right) + E. \end{aligned}$$

Let \(T_0\) (resp. T) be the vector space spanned by the first d vectors of the canonical basis (resp. by the first d eigenvectors of O). Then

$$\begin{aligned} \angle ( T_0, T ) \le \frac{\sqrt{2}\Vert E\Vert _\mathrm{op}}{\mu _\mathrm{min}(B)}. \end{aligned}$$
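
The bound is easy to probe numerically. Below is a small sketch, assuming \(\angle (T_0,T)\) is measured as the operator norm of the difference of the orthogonal projectors (a standard convention for the sine of the largest principal angle):

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 6, 2

G = rng.standard_normal((d, d))
B = G @ G.T + d * np.eye(d)             # positive definite, mu_min(B) >= d
O = np.zeros((D, D))
O[:d, :d] = B                           # B sits in the top-left d x d block
E = 1e-2 * rng.standard_normal((D, D))
E = (E + E.T) / 2                       # symmetric perturbation

vals, vecs = np.linalg.eigh(O + E)      # eigenvalues in ascending order
V = vecs[:, ::-1][:, :d]                # first d eigenvectors of O + E
V0 = np.eye(D)[:, :d]                   # canonical basis spanning T_0

angle = np.linalg.norm(V0 @ V0.T - V @ V.T, 2)   # sin of largest principal angle
bound = np.sqrt(2) * np.linalg.norm(E, 2) / np.linalg.eigvalsh(B).min()
print(angle <= bound)                   # True: the sin(theta) bound holds
```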

E Local PCA for Tangent Space Estimation and Decluttering

This section is dedicated to the proofs of Sect. 5. We begin with the case of additive noise (and no outliers), that is Proposition 5.1.

E.1 Proof of Proposition 5.1

Without loss of generality, the local PCA analysis will be conducted at base point \(X_1\); the results on the whole sample then follow from a standard union bound. For convenience, we assume that \(\pi _M(X_1) = 0\) and that \(T_{0}{M}\) is spanned by the first d vectors of the canonical basis of \(\mathbb {R}^D\). We recall that \(X_i = Y_i + Z_i\), with \(Y_i \in M\) and \(\Vert Z_i\Vert \le s h\), for \(s \le 1/4\). In particular, \(\Vert X_1\Vert \le \Vert Z_1\Vert \le sh \le h/4\).

We adopt the following notation for the local covariance matrix based on the whole sample \(\mathbb {X}_n\).

$$\begin{aligned} \widehat{\Sigma }(h)&= \frac{1}{n-1} \sum _{j \ge 2} (X_j - \overline{X}(h))(X_j - \overline{X}(h))^t U (X_j,h), \\ \overline{X}(h)&= \frac{1}{N(h)} \sum _{i \ge 2}{X_i U(X_i,h)}, \\ N(h)&= \sum _{i \ge 2}{U(X_i,h)}. \end{aligned}$$

Note that the tangent space estimator \(\mathtt {TSE}(\mathbb {X}_n,h)_1\) is the space spanned by the first d eigenvectors of \(\widehat{\Sigma }(h)\). From now on we suppose that all the inequalities of Lemma 9.5 are satisfied, which defines a global event of probability larger than \(1-4 ( {1}/{n} )^{2/d + 1}\).
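
In code, \(\mathtt {TSE}\) is a plain local principal component analysis. The following Python sketch mirrors the definitions above (the function name and interface are ours, and the ball \(\mathscr {B}(X_1,h)\) is assumed to contain at least one other sample point):

```python
import numpy as np

def tse(X, base, h, d):
    """Local-PCA tangent space estimator at X[base] with bandwidth h.

    Restrict to the sample points in the ball B(X[base], h) (the indicator
    U(., h)), centre them at their local mean, and return the first d
    eigenvectors of the local covariance matrix as a (D, d) orthonormal basis.
    """
    n = X.shape[0]
    others = np.delete(X, base, axis=0)            # the points X_j, j != base
    local = others[np.linalg.norm(others - X[base], axis=1) <= h]
    diffs = local - local.mean(axis=0)             # centre at the local mean
    Sigma = diffs.T @ diffs / (n - 1)              # local covariance matrix
    vals, vecs = np.linalg.eigh(Sigma)             # ascending eigenvalues
    return vecs[:, ::-1][:, :d]                    # span of the first d
```

For instance, `T_hat = tse(X, 0, h, d)` plays the role of \(\mathtt {TSE}(\mathbb {X}_n,h)_1\) in what follows.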

We consider \(h=h_0 \le h_+\), so that Lemmas 9.3 and 9.4 hold. We may then decompose the local covariance matrix as follows.

$$\begin{aligned} \widehat{\Sigma }(h)&= \frac{1}{n-1}\sum _{i\ge 2} (X_i - m(h))(X_i-m(h))^t U(X_i,h) \nonumber \\&\quad - \frac{N(h)}{n-1}\, ( \overline{X}(h) -m(h))(\overline{X}(h) - m(h))^t := \widehat{\Sigma }_1+\widehat{\Sigma }_2. \end{aligned}$$
(3)

The first term may be written as

$$\begin{aligned} \widehat{\Sigma }_1&= \frac{1}{n-1}\sum _{i\ge 2} (X_i - m(h))(X_i-m(h))^t U(X_i,h) \\&= \frac{1}{n-1}\sum _{i\ge 2} (X_i - m(h))_{\top }(X_i-m(h))_{\top }^t U(X_i,h)+ R_1 \\&= \Sigma (h) + R_1 + R_2, \end{aligned}$$

where

$$\begin{aligned} \Sigma (h) = \left( \begin{matrix} A(h) &{}\quad 0 \\ 0 &{}\quad 0 \end{matrix} \right) . \end{aligned}$$

According to Lemma 9.4 (with \(\beta =1\)), \(\mu _\mathrm{min}(A(h)) \ge c_d f_\mathrm{min} h^{d+2}\). On the other hand, using Proposition 8.1 and Lemma 9.4 we may write

$$\begin{aligned} (n-1)\Vert R_1\Vert _{\mathscr {F}}/N(h)&\le 2 \sup _{y+z \in \mathscr {B}(X_1,h)}{\Vert (y+z-m(h))_{\top }\Vert \Vert (y+z-m(h))_{\perp }\Vert }\\&\quad + \sup _{y+z \in \mathscr {B}(X_1,h)}{\Vert (y+z-m(h))_{\perp }\Vert ^2} \\&\le 2 \sup _{y+z \in \mathscr {B}(X_1,h)}{\Vert (y+z-m(h))\Vert (\Vert (y-m(h))_{\perp }\Vert + s h )} \\&\quad + \sup _{y \in \mathscr {B}(0,2h) \cap M}{ ( \Vert (y-m(h))_{\perp }\Vert + s h )^2} \\&\le 8h \biggl ( \frac{4h^2}{\rho } + 2 s h \biggr ) + \biggl ( \frac{4h^2}{\rho } + 2 s h \biggr )^2 \\&\le \frac{34 h^3}{\rho } + 20 s h^2, \end{aligned}$$

since \(h \le h_+\) and \(s \le 1/4\). In addition, we can write

$$\begin{aligned} R_2 = \left( \begin{matrix} R_2 &{}\quad 0 \\ 0 &{}\quad 0 \end{matrix} \right) , \end{aligned}$$

with \(\Vert R_2\Vert _{\mathscr {F}} \le C_d \frac{f_\mathrm{max}}{f_\mathrm{min} \sqrt{\kappa }} q(2h) h^2\) according to Lemma 9.5 (with \(\beta =1\)). In turn, the term \(\widehat{\Sigma }_2\) may be decomposed as

$$\begin{aligned} \widehat{\Sigma }_2&= \left( \begin{matrix} R_4 &{}\quad 0 \\ 0 &{}\quad 0 \end{matrix} \right) + R_3, \end{aligned}$$

with

$$\begin{aligned} \Vert R_4\Vert _{\mathscr {F}}&\le \frac{N(h)}{n-1}\, \Vert ( \overline{X}(h) -m(h))_{\top } \Vert \Vert (\overline{X}(h) - m(h))\Vert \\&\le \frac{2h}{n-1}\,\biggl \Vert \sum _{i \ge 2} (X_i -m(h))_{\top } U(X_i,h) \biggr \Vert \le \frac{2C_d q(2h) h^2 f_\mathrm{max} }{f_\mathrm{min} \sqrt{\kappa }}, \end{aligned}$$

according to Lemma 9.5. A similar bound on \(R_3\) may be derived,

$$\begin{aligned} \Vert R_3\Vert _{\mathscr {F}}&\le \frac{N(h)}{n-1}\, \Vert ( \overline{X}(h) -m(h))_{\perp }\Vert \Vert (\overline{X}(h) - m(h))\Vert \\&\le \frac{4h}{n-1} \,\biggl \Vert \sum _{i \ge 2} (Y_i + Z_i -m(h))_{\perp } U(X_i,h) \biggr \Vert \\&\le \frac{8hN(h) ( 2h^2/\rho + s h )}{n-1} \le \frac{N(h) h^2}{n-1} \,\biggl ( \frac{16h}{\rho } + 8 s \biggr ), \end{aligned}$$

according to Proposition 8.1 and Lemma 9.4. If we choose \(h = \bigl ( \kappa \frac{\log n}{n-1} \bigr )^{1/d}\), for \(\kappa \) large enough (depending on d, \(f_\mathrm{min}\) and \(f_\mathrm{max}\)), we have

$$\begin{aligned} \frac{\Vert R_2 + R_4\Vert _{\mathscr {F}}}{\mu _\mathrm{min}(A(h))} \le 1/4. \end{aligned}$$

Now, provided that \(\kappa \ge 1\), according to Lemma 9.5, we may write

$$\begin{aligned} \frac{\Vert R_1 + R_3\Vert _{\mathscr {F}}}{\mu _\mathrm{min}(A(h))} \le K_{f_\mathrm{max},f_\mathrm{min},d} ( h/ \rho + s ), \end{aligned}$$

which, for n large enough, leads to

$$\begin{aligned} \angle (T_0 M, \widehat{T}_{X_1} M) \le \sqrt{2} K_{f_\mathrm{max},f_\mathrm{min},d} ( h/ \rho + s ), \end{aligned}$$

according to Theorem 10.1.

E.2 Proof of Proposition 5.5

The proof of Proposition 5.5 follows the same path as the derivation of Proposition 5.1, with additional technical difficulties caused by the outliers (\(\beta < 1\)). We emphasize that in this framework, there is no additive noise (\(\sigma =0\)). As in the previous section, the analysis will be conducted for \(X_1 \in \mathbb {X}^{(k)}\), for some fixed \(k \ge -1\), with \(k=-1\) referring to the initialization step. Results on the whole sample then follow from a standard union bound. As before, we assume that \(\pi _{{M}}(X_1) = 0\) and that \(T_{0}{M}\) is spanned by the first d vectors of the canonical basis of \(\mathbb {R}^D\). In what follows, we denote by \(\widehat{t}\) the map from \(\mathbb {R}^D\) to \(\{0,1\}\) such that \(\widehat{t}(X_i)=1\) if and only if \(X_i\) belongs to \(\mathbb {X}^{(k)}\).

We adopt the following notation for the local covariance matrix based on \(\mathbb {X}^{(k)}\) (after \(k+1\) iterations of the outlier filtering procedure).

$$\begin{aligned} \widehat{\Sigma }^{(k)}(h)&= \frac{1}{n-1} \sum _{j \ge 2} (X_j - \overline{X}^{(k)}(h))(X_j - \overline{X}^{(k)}(h))^t U (X_j,h)\, \widehat{t}(X_j), \\ \overline{X}^{(k)}(h)&= \frac{1}{N^{(k)}(h)} \sum _{i \ge 2}{X_i U(X_i,h) \widehat{t}(X_i)}, \\ N^{(k)}(h)&= \sum _{i \ge 2}{U(X_i,h) \widehat{t}(X_i)}. \end{aligned}$$

Also recall that \(N_0(h)\) and \(N_1(h)\) denote the number of points of the whole sample \(\mathbb {X}_n\) lying in \(\mathscr {B}(X_1,h)\) that are drawn from the clutter and from the signal, respectively. Finally, we suppose that all the inequalities of Lemmas 9.5 and 9.6 are satisfied, which defines a global event of probability larger than \(1-7 ( {1}/{n} )^{2/d + 1}\).

We recall that we consider \(h_\infty \le h \le h_{k}\), \(k \ge -1\) (with \(h_{-1} = h_0\)), and \(X_1\) in \(\mathbb {X}^{(k)}\) such that \({d}(X_1,M) \le h/\sqrt{2}\). We may then decompose the local covariance matrix as

$$\begin{aligned} \widehat{\Sigma }^{(k)}(h) = \frac{1}{n-1}&\sum _{i\ge 2} (X_i - m(h))(X_i-m(h))^t U(X_i,h)\, \widehat{t}(X_i) \nonumber \\&\, - \frac{N^{(k)}(h)}{n-1}\, ( \overline{X}^{(k)}(h) -m(h))(\overline{X}^{(k)}(h) - m(h))^t \nonumber \\ = \frac{1}{n-1}&\sum _{i\ge 2} (X_i - m(h))(X_i-m(h))^t U(X_i,h)\, \widehat{t}(X_i) V_i \nonumber \\&\, + \frac{1}{n-1}\sum _{i\ge 2} (X_i - m(h))(X_i-m(h))^t U(X_i,h) (1-V_i)\, \widehat{t}(X_i) \nonumber \\&\quad \, - \frac{N^{(k)}(h)}{n-1}\, ( \overline{X}^{(k)}(h) -m(h))(\overline{X}^{(k)}(h) - m(h))^t \nonumber \\ := \widehat{\Sigma }^{(k)}_1+&\,\widehat{\Sigma }^{(k)}_2+\widehat{\Sigma }^{(k)}_3. \end{aligned}$$
(4)

The proof of Proposition 5.5 proceeds by induction on k.

Initialization step (\(k=-1\)): In this case \(\mathbb {X}^{(k)} = \mathbb {X}_n\), \(h=h_0\), \({d}(X_1,M) \le h_0/\sqrt{2}\), and \(\widehat{t}\) is always equal to 1. Then the first term \(\widehat{\Sigma }_1^{(k)}\) of (4) may be written as

$$\begin{aligned}&\frac{1}{n-1}\sum _{i\ge 2} (X_i - m(h))(X_i-m(h))^t U(X_i,h) V_i\\&\quad = \frac{1}{n-1}\sum _{i\ge 2} (X_i - m(h))_{\top }(X_i-m(h))_{\top }^t U(X_i,h) V_i + R_1 \\&\quad = \Sigma (h) + R_1 + R_2, \end{aligned}$$

where

$$\begin{aligned} \Sigma (h) = \left( \begin{matrix} A(h) &{}\quad 0 \\ 0 &{}\quad 0 \end{matrix} \right) . \end{aligned}$$

According to Lemma 9.4, \(\mu _\mathrm{min}(A(h)) \ge c_d f_\mathrm{min} \beta h^{d+2}\), while Proposition 8.1 gives \(\Vert R_1\Vert _{\mathscr {F}} \le 34 \frac{N_1(h) h^3 }{\rho (n-1)}\). Moreover, we can write

$$\begin{aligned} R_2 = \left( \begin{matrix} R_2 &{}\quad 0 \\ 0 &{}\quad 0 \end{matrix} \right) , \end{aligned}$$

with \(\Vert R_2\Vert _{\mathscr {F}} \le C_d \frac{f_\mathrm{max}}{f_\mathrm{min} \sqrt{\kappa }} \beta q(2h) h^2\) according to Lemma 9.5. Term \(\widehat{\Sigma }^{(k)}_2\) in decomposition (4) may be bounded by

$$\begin{aligned} \Vert \widehat{\Sigma }^{(k)}_2\Vert _{\mathscr {F}} \le \frac{16 h^2 N_0(h)}{n-1}. \end{aligned}$$

In turn, term \(\widehat{\Sigma }^{(k)}_3\) may be decomposed as

$$\begin{aligned} \frac{N^{(k)}(h)}{n-1} \,( \overline{X}^{(k)}(h) -m(h))(\overline{X}^{(k)}(h) - m(h))^t&= \left( \begin{matrix} R_6 &{}\quad 0 \\ 0 &{}\quad 0 \end{matrix} \right) + R_5, \end{aligned}$$

with

$$\begin{aligned} \Vert R_6\Vert _{\mathscr {F}}&\le \frac{{N}^{(k)}(h)}{n-1}\, \Vert ( \overline{X}^{(k)}(h) -m(h))_{\top } \Vert \Vert (\overline{X}^{(k)}(h) - m(h))\Vert \\&\le \frac{4h}{n-1}\,\biggl (\biggl \Vert \sum _{i \ge 2} (X_i -m(h))_{\top } U(X_i,h) V_i\biggr \Vert + \biggl \Vert \sum _{i \ge 2} (X_i -m(h))_{\top } U(X_i,h) (1-V_i) \biggr \Vert \biggr ) \\&\le \frac{4C_d \beta q(2h) h^2 f_\mathrm{max} }{f_\mathrm{min} \sqrt{\kappa }} + \frac{16 h^2 N_0(h)}{n-1}, \end{aligned}$$

according to Lemma 9.5. We may also write

$$\begin{aligned} \Vert R_5\Vert _{\mathscr {F}}&\le \frac{{N}^{(k)}(h)}{n-1}\, \Vert ( \overline{X}^{(k)}(h) -m(h))_{\perp } \Vert \Vert (\overline{X}^{(k)}(h) - m(h))\Vert \\&\le \frac{4h}{n-1}\, \biggl (\biggl \Vert \sum _{i \ge 2} (X_i -m(h))_{\perp } U(X_i,h) V_i \biggr \Vert + \biggl \Vert \sum _{i \ge 2} (X_i -m(h))_{\perp } U(X_i,h)(1-V_i) \biggr \Vert \biggr ) \\&\le \frac{16N_1(h) h^3}{(n-1) \rho } + \frac{16 N_0(h) h^2 }{(n-1)}, \end{aligned}$$

according to Proposition 8.1 and Lemma 9.4. As in the additive noise case (see proof of Proposition 5.1), provided that \(\kappa \) is large enough (depending on d, \(f_\mathrm{min}\), and \(f_\mathrm{max}\)), we have

$$\begin{aligned} \frac{\Vert R_2 + R_6\Vert _{\mathscr {F}}}{\mu _\mathrm{min}(A(h))} \le 1/4. \end{aligned}$$

Since \((n-1)h_0^{d} = \frac{\kappa \log n}{\beta h_0}\), if we require \(\kappa \ge \rho \), then for n large enough we get

$$\begin{aligned} \frac{\Vert \widehat{\Sigma }^{(k)}_2 + R_1 + R_5\Vert _{\mathscr {F}}}{\mu _\mathrm{min}(A(h))} \le K_{d,f_\mathrm{min},f_\mathrm{max},\beta }\, \frac{h_0}{\rho }, \end{aligned}$$

according to Lemma 9.5. Then, Theorem 10.1 can be applied to obtain

$$\begin{aligned} \angle ( \mathtt {TSE}(\mathbb {X}^{(-1)},h_{0})_1, T_{\pi (X_1)} {M} )\le \sqrt{2} K^{(0)}_{d,f_\mathrm{min},f_\mathrm{max},\beta } h_0/\rho . \end{aligned}$$

According to Lemma 9.6, we may choose \(\kappa \) large enough (with respect to \(K=\sqrt{2} K^{(0)}\), d, \(f_\mathrm{min}\) and \(\rho \)) and then a threshold t so that, if \(X_1 \in M\), then \(X_1 \in \mathbb {X}^{(0)}\), and if \(d(X_1,M) \ge h_0^2/ \rho \), then \(X_1 \notin \mathbb {X}^{(0)}\).
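
Gathering the two sketches given earlier (`slab_count_test` and `tse`), one pass of the filtering procedure can be summarized by the following hypothetical glue code; it is an illustration of the scheme analysed here, not the exact procedure.

```python
import numpy as np

def declutter_pass(X, keep, h, d, rho, t):
    """One filtering step X^(k) -> X^(k+1): estimate a tangent space at each
    retained point from the retained points only, then keep the points that
    pass the slab-count test at bandwidth h."""
    kept = X[keep]
    new_keep = []
    for pos, i in enumerate(keep):
        T = tse(kept, pos, h, d)                   # tangent estimate at X_i
        others = np.delete(X, i, axis=0)           # the n-1 remaining points
        if slab_count_test(X[i], T, others, h, rho, t):
            new_keep.append(i)                     # X_i survives to X^(k+1)
    return new_keep
```

Points on M survive the pass while points farther than \(h^2/\rho \) from M are discarded, which is precisely the induction hypothesis propagated below.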

Iteration step: Now we assume that \(k \ge 0\), and that \({d}(X_i,M) \ge h_k^2/\rho \) implies \(\widehat{t}(X_i)=0\), with \(h_k = \bigl ( \kappa \frac{\log n}{\beta (n-1)} \bigr )^{\gamma _k}\), where \(\gamma _k\) lies between \(1/(d+1)\) and \(1/d\). Let \(h_{\infty } \le h \le h_k\), and suppose that \({d}(X_1,M) \le h_k/\sqrt{2}\). As in the initialization step, \(\widehat{\Sigma }_1^{(k)}\) may be written as

$$\begin{aligned} \left( \begin{matrix} A(h) &{}\quad 0 \\ 0 &{}\quad 0 \end{matrix} \right) + R_1 + R_2, \end{aligned}$$

with \(\mu _\mathrm{min}(A(h)) \ge c_d f_\mathrm{min} \beta h^{d+2}\), \(\Vert R_1\Vert _{\mathscr {F}} \le 34 \frac{N_1(h) h^3 }{\rho (n-1)}\), and \(\Vert R_2\Vert _{\mathscr {F}} \le C_d \frac{f_\mathrm{max}}{f_\mathrm{min} \sqrt{\kappa }} \beta q(2h) h^2\).

We can decompose \(\widehat{\Sigma }_2^{(k)}\) as

$$\begin{aligned}&\frac{1}{n-1}\sum _{i\ge 2} (X_i - m(h))(X_i-m(h))^t U(X_i,h)(1-V_i) \widehat{t}(X_i)\\&\quad = \frac{1}{n-1}\sum _{i\ge 2} (X_i - m(h))_{\top }(X_i-m(h))_{\top }^t U(X_i,h) (1-V_i) \widehat{t}(X_i) + R_3 \\&\quad = \left( \begin{matrix} R_4 &{}\quad 0 \\ 0 &{}\quad 0 \end{matrix} \right) + R_3, \end{aligned}$$

with \(\Vert R_4\Vert _{\mathscr {F}} \le \frac{16N_0(h) h^2}{n-1}\) and \(\Vert R_3\Vert _{\mathscr {F}} \le \frac{128 N_0(h) h h_k^2}{(n-1)\rho }\), according to Proposition 8.3, for n large enough so that \(h_0^2/\rho \le h_{\infty }\). Term \(\widehat{\Sigma }^{(k)}_3\) may also be written as

$$\begin{aligned} \frac{{N}^{(k)}(h)}{n-1}\, ( \overline{X}^{(k)}(h) -m(h))(\overline{X}^{(k)}(h) - m(h))^t&= \left( \begin{matrix} R_6 &{}\quad 0 \\ 0 &{}\quad 0 \end{matrix} \right) + R_5, \end{aligned}$$

with

$$\begin{aligned} \Vert R_6\Vert _{\mathscr {F}}&\le \frac{{N}^{(k)}(h)}{n-1} \,\Vert ( \overline{X}^{(k)}(h) -m(h))_{\top }\Vert \Vert (\overline{X}^{(k)}(h) - m(h))\Vert \\&\le \frac{4h}{n-1}\,\biggl (\biggl \Vert \sum _{i \ge 2} (X_i -m(h))_{\top } U(X_i,h) V_i \biggr \Vert + \biggl \Vert \sum _{i \ge 2} (X_i -m(h))_{\top }U(X_i,h) (1-V_i)\widehat{t}(X_i) \biggr \Vert \biggr ) \\&\le \frac{4C_d \beta q(2h) h^2 f_\mathrm{max} }{f_\mathrm{min} \sqrt{\kappa }} + \frac{16h^2 N_0(h)}{(n-1)}, \end{aligned}$$

according to Lemma 9.5. We may also write

$$\begin{aligned} \Vert R_5\Vert _{\mathscr {F}}&\le \frac{{N}^{(k)}(h)}{n-1} \,\Vert ( \overline{X}^{(k)}(h) -m(h))_{\perp } \Vert \Vert (\overline{X}^{(k)}(h) - m(h))\Vert \\&\le \frac{4h}{n-1} \,\biggl (\biggl \Vert \sum _{i \ge 2} (X_i -m(h))_{\perp } U(X_i,h) V_i \biggr \Vert + \biggl \Vert \sum _{i \ge 2} (X_i -m(h))_{\perp }U(X_i,h) (1-V_i)\widehat{t}(X_i) \biggr \Vert \biggr ) \\&\le \frac{16 N_1(h) h^3}{(n-1) \rho } + \frac{32 N_0(h) h h_k^2 }{\rho (n-1)}, \end{aligned}$$

according to Propositions 8.1, 8.3 and Lemma 9.4. As done before, we may choose \(\kappa \) large enough (depending on d, \(f_\mathrm{min}\) and \(f_\mathrm{max}\), but not on k) such that

$$\begin{aligned} \frac{\Vert R_2 + R_4 + R_6\Vert _{\mathscr {F}}}{\mu _\mathrm{min}(A(h))} \le 1/4. \end{aligned}$$

Now choose \(h = h_{k+1} = \bigl ( \kappa \frac{\log n}{\beta (n-1)} \bigr )^{(2\gamma _k +1)/(d+2)}\), with \(\kappa \ge 1\). This choice balances the residual terms of the form \(h/\rho + h_k^2 N_0(h)/h\) arising from \({\Vert R_1 + R_3 + R_5 \Vert _{\mathscr {F}}}/{\mu _\mathrm{min}(A(h_{k+1}))}\). Then we get, according to Lemma 9.5,

$$\begin{aligned} \frac{\Vert R_1 + R_3 + R_5 \Vert _{\mathscr {F}}}{\mu _\mathrm{min}(A(h_{k+1}))}&\le C_d \,\frac{f_\mathrm{max}h_{k+1}}{\rho f_\mathrm{min}} + \frac{C'_d}{\beta \rho f_\mathrm{min}} \biggl ( \kappa \frac{\log n}{\beta (n-1)} \biggr )^{ \gamma _{k+1} + 2 \gamma _k - (2\gamma _k +1) + 1} \nonumber \\&\le K_{d,f_\mathrm{min},f_\mathrm{max},\beta } \frac{h_{k+1}}{\rho }, \end{aligned}$$
(5)

where again, \(K_{d,f_\mathrm{min},f_\mathrm{max},\beta }\) does not depend on k. Finally, we may apply Theorem 10.1 to get

$$\begin{aligned} \angle ( \mathtt {TSE}(\mathbb {X}^{(k)},h_{k+1})_1, T_{\pi (X_1)} {M} )&\le \sqrt{2} K_{d,f_\mathrm{min},f_\mathrm{max},\beta } h_{k+1}/\rho \\&\le \sqrt{2} \bigl ( K_{d,f_\mathrm{min},f_\mathrm{max},\beta } \vee K^{(0)}_{d,f_\mathrm{min},f_\mathrm{max},\beta }\bigr ) h_{k+1}/\rho \\&:= C_{d,\beta ,f_\mathrm{max},f_\mathrm{min}} h_{k+1}/\rho . \end{aligned}$$

Then, according to Lemma 9.6, we may choose \(\kappa \) large enough (not depending on k) and t (not depending on k either) so that if \(X_1 \in M\), then \(X_1 \in \mathbb {X}^{(k+1)}\), and if \(d(X_1,M) \ge h_{k}^2/\rho \), then \(X_1 \notin \mathbb {X}^{(k+1)}\). Proposition 5.5 then follows from a straightforward union bound on the sample \(\{X_1, \ldots , X_n\}\).
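
The exponent recursion \(\gamma _{k+1} = (2\gamma _k+1)/(d+2)\) driving the bandwidths deserves a remark: its fixed point is \(1/d\), and the gap to \(1/d\) contracts by a factor \(2/(d+2)\) at each pass, so the working bandwidth \(h_k\) decreases geometrically fast from the initialization scale (exponent \(1/(d+1)\)) toward the optimal scale (exponent \(1/d\)). A few lines of Python illustrate the recursion (an illustration only, not part of the proof):

```python
d = 3
gamma = 1.0 / (d + 1)                   # exponent of h_0 at initialization
for k in range(8):
    gamma = (2 * gamma + 1) / (d + 2)   # exponent of h_{k+1}
    print(k, gamma, 1.0 / d - gamma)    # gap to 1/d shrinks by 2/(d+2)
```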

E.3 Proof of Proposition 5.8

In this case, we have \({d}(X_j,M) \le h_\infty ^2/\rho \), for every \(X_j\) in \(\mathbb {X}^{(\widehat{k})}\). The proof of Proposition 5.8 follows from the same calculation as in the proof of Proposition 5.5, replacing \(h_k^2/\rho \) by its upper bound \(h_{\infty }^2/\rho \) and taking \(h_{k+1} = h_{\infty }\) in the iteration step.

F Proof of the Main Reconstruction Results

We now prove the main results: Theorem 2.7 (additive noise model) and Theorems 2.8 and 2.9 (clutter noise model).

F.1 Additive Noise Model

Proof of Corollary 5.2

Let \( Q \in \mathscr {G}_{D,d,f_\mathrm{min},f_\mathrm{max},\rho ,\sigma }\). Write \(\varepsilon = c_{d,f_\mathrm{min},f_\mathrm{max}} (h \vee \rho \sigma /h)\) for \(c_{d,f_\mathrm{min},f_\mathrm{max}}\) large enough, and consider the event A defined by

$$\begin{aligned} A= & {} \biggl \{ \max _{X_j \in \mathbb {X}_n} \angle ( T_{\pi _M(X_j)} {M} , \widehat{T}_j(h)) \le C_{d,f_\mathrm{min},f_\mathrm{max}} \biggl ( \frac{h}{\rho } + \frac{\sigma }{h} \biggr ) \biggr \} \\&\quad \cap \Big \{ \sup _{x \in M} d(x,\mathbb {X}_n) \le \sigma \Big \} \cap \biggl \{ \sup _{X_j \in \mathbb {X}_n} d(X_j,M) \le C_{d,f_\mathrm{min}} \biggl ( \frac{\log n}{n} \biggr )^{1/d} \biggr \} . \end{aligned}$$

Then from Proposition 5.1 and Lemma 9.1, \(\mathbb {P}_Q(A) \ge 1 - 5 \bigl ( \frac{1}{n} \bigr )^{2/d}\), and from the definition of \(\varepsilon \) and the construction of \(\mathbb {Y}_n\), for n large enough,

$$\begin{aligned} A&\subset \biggl \{ \max _{X_j \in \mathbb {Y}_n} \angle ( T_{\pi _M(X_j)} {M} , \widehat{T}_j(h)) \le \frac{\varepsilon }{2280 \rho } \biggr \} \cap \Big \{ \sup _{x \in M} d(x,\mathbb {Y}_n) \le 2\varepsilon \Big \} \\&\qquad \cap \{ \mathbb {Y}_n \text {~is~} \varepsilon \text {-sparse} \} \cap \biggl \{ \sup _{X_j \in \mathbb {Y}_n} d(X_j,M) \le \frac{\varepsilon ^2}{1140 \rho } \biggr \} , \end{aligned}$$

which yields the result. \(\square \)

Proof of Theorem 2.7

Following the above notation, we observe that on the event A, which holds with high probability, Theorem 4.4 applies with \(\varepsilon = c_{d,f_\mathrm{min},f_\mathrm{max}} (h \vee \rho \sigma /h)\), \(\theta = \varepsilon /(1140 \rho )\) (where we used the fact that \(\theta \le 2 \sin \theta \)), and \(\eta = \varepsilon ^2/(1140 \rho )\), so that the first part of Theorem 2.7 is proven. Furthermore, for n large enough,

$$\begin{aligned} \mathbb {E}_Q\bigl [ {d}_{H}(M,\widehat{M}_{\mathtt {TDC}})\bigr ]&= \mathbb {E}_Q\bigl [ {d}_{H}(M,\widehat{M}_{\mathtt {TDC}})\mathbb {1}_A\bigr ] + \mathbb {E}_Q\bigl [ {d}_{H}(M,\widehat{M}_{\mathtt {TDC}})\mathbb {1}_{A^c}\bigr ] \\&\le C_d \frac{\varepsilon ^2}{\rho } + (1 - \mathbb {P}_Q(A) )( \mathrm {diam}(M) + \sigma ) \\&\le C'_{d,f_\mathrm{min},f_\mathrm{max},\rho } \varepsilon ^2, \end{aligned}$$

where for the last line we used the diameter bound of Lemma 2.2. \(\square \)

F.2 Clutter Noise Model

Proof of Corollary 5.6

Let \( P \in \mathscr {O}_{D,d,f_\mathrm{min},f_\mathrm{max},\rho ,\beta }\). For n large enough, write \(\varepsilon = c_{d,f_\mathrm{min},f_\mathrm{max}} h_{k_\delta }\) for \(c_{d,f_\mathrm{min},f_\mathrm{max}}\) large enough, and consider the event

$$\begin{aligned} A^\delta= & {} \biggl \{ \max _{X_j \in \mathbb {X}^{(k_\delta )}} \angle ( T_{\pi _M(X_j)} {M} , \widehat{T}^\delta _j) \le C_{d,f_\mathrm{min},f_\mathrm{max}}\, \frac{h_{k_\delta }}{\rho } \biggr \} \cap \biggl \{ \sup _{x \in M} d(x,\mathbb {X}^{(k_\delta )}) \le \frac{h_{k_\delta }^2}{\rho } \biggr \} \\&\quad \cap \biggl \{ \sup _{X_j \in \mathbb {X}^{(k_\delta )}} d(X_j,M) \le C_{d,f_\mathrm{min}} \biggl ( \frac{\log n}{n} \biggr )^{1/d} \biggr \} . \end{aligned}$$

From Proposition 5.5 and Lemma 9.1, \(\mathbb {P}_P\bigl (A^\delta \bigr ) \ge 1- 8 ( {1}/{n} )^{2/d}\), and from the definition of \(\varepsilon \) and the construction of \(\mathbb {Y}_n^\delta \), for n large enough,

$$\begin{aligned} A^\delta&\subset \biggl \{ \max _{X_j \in \mathbb {X}^{(k_\delta )}} \angle ( T_{\pi _M(X_j)} {M} , \widehat{T}_j^\delta ) \le \frac{\varepsilon }{2280 \rho } \biggr \} \cap \Big \{ \sup _{x \in M} d(x,\mathbb {X}^{(k_\delta )}) \le \varepsilon \Big \} \\&\qquad \cap \biggl \{ \sup _{X_j \in \mathbb {X}^{(k_\delta )}} d(X_j,M) \le \frac{\varepsilon ^2}{1140 \rho } \biggr \} \\&\subset \biggl \{ \max _{X_j \in \mathbb {Y}_n^\delta } \angle ( T_{\pi _M(X_j)} {M} , \widehat{T}_j^\delta ) \le \frac{\varepsilon }{2280 \rho } \biggr \} \cap \Big \{ \sup _{x \in M} d(x,\mathbb {Y}_n^\delta ) \le 2\varepsilon \Big \} \\&\qquad \cap \{ \mathbb {Y}_n^\delta \text {~is~} \varepsilon \text {-sparse} \} \cap \biggl \{ \sup _{X_j \in \mathbb {Y}_n^\delta } d(X_j,M) \le \frac{\varepsilon ^2}{1140 \rho } \biggr \} , \end{aligned}$$

which yields the result. \(\square \)

Proof of Theorem 2.8

Following the above notation, we observe that on the event \(A^\delta \), Theorem 4.4 holds for \(\varepsilon = c_{d,f_\mathrm{min},f_\mathrm{max}} h_{k_\delta }\), \(\theta = \varepsilon /(1140 \rho )\) and \(\eta = \varepsilon ^2/(1140 \rho )\), so that the first part of Theorem 2.8 is proven. As a consequence, for n large enough,

$$\begin{aligned} \mathbb {E}_P\bigl [ {d}_{H}(M,\widehat{M}_{\mathtt {TDC \delta }})\bigr ]&= \mathbb {E}_P\bigl [ {d}_{H}(M,\widehat{M}_{\mathtt {TDC \delta }})\mathbb {1}_{A^\delta }\bigr ] + \mathbb {E}_P\bigl [ {d}_{H}(M,\widehat{M}_{\mathtt {TDC \delta }})\mathbb {1}_{(A^{\delta })^c}\bigr ] \\&\le C_d\,\frac{\varepsilon ^2}{\rho } + (1- \mathbb {P}_P(A^{\delta })) \times 2K_0 \le C'_{d,f_\mathrm{min},f_\mathrm{max},\rho } \varepsilon ^2, \end{aligned}$$

where for the second line we used the fact that \(M \cup \widehat{M}_\mathtt {TDC \delta } \subset \mathscr {B}_0\), a ball of radius \(K_0 = K_0(d,f_\mathrm{min},\rho )\). \(\square \)

Finally, Theorem 2.9 is obtained similarly using Proposition 5.8.
