
Fundamental Limits of Weak Recovery with Applications to Phase Retrieval

Foundations of Computational Mathematics

Abstract

In phase retrieval, we want to recover an unknown signal \({{\varvec{x}}}\in {{\mathbb {C}}}^d\) from n quadratic measurements of the form \(y_i = |\langle {{\varvec{a}}}_i,{{\varvec{x}}}\rangle |^2+w_i\), where \({{\varvec{a}}}_i\in {{\mathbb {C}}}^d\) are known sensing vectors and \(w_i\) is measurement noise. We ask the following weak recovery question: What is the minimum number of measurements n needed to produce an estimator \({\hat{{{\varvec{x}}}}}({{\varvec{y}}})\) that is positively correlated with the signal \({{\varvec{x}}}\)? We consider the case of Gaussian vectors \({{\varvec{a}}}_i\). We prove that—in the high-dimensional limit—a sharp phase transition takes place, and we locate the threshold in the regime of vanishingly small noise. For \(n\le d-o(d)\), no estimator can do significantly better than random and achieve a strictly positive correlation. For \(n\ge d+o(d)\), a simple spectral estimator achieves a positive correlation. Surprisingly, numerical simulations with the same spectral estimator demonstrate promising performance with realistic sensing matrices. Spectral methods are used to initialize non-convex optimization algorithms in phase retrieval, and our approach can boost the performance in this setting as well. Our impossibility result is based on classical information-theoretic arguments. The spectral algorithm computes the leading eigenvector of a weighted empirical covariance matrix. We obtain a sharp characterization of the spectral properties of this random matrix using tools from free probability and generalizing a recent result by Lu and Li. Both the upper bound and lower bound generalize beyond phase retrieval to measurements \(y_i\) produced according to a generalized linear model. As a by-product of our analysis, we compare the threshold of the proposed spectral method with that of a message passing algorithm.
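For concreteness, here is a minimal Python sketch of the spectral estimator described above (an illustrative toy, not the authors' experimental code): it forms the weighted empirical covariance matrix \(\frac{1}{n}\sum _{i=1}^n {\mathcal {T}}(y_i)\,{{\varvec{a}}}_i{{\varvec{a}}}_i^*\) for noiseless Gaussian measurements and returns its leading eigenvector, using the pre-processing \({\mathcal {T}}(y)=(y-1)/(y+\sqrt{\delta }-1)\), i.e., the \(\sigma \rightarrow 0\) form of the function analyzed in Appendix B. The dimensions, the noiseless setting, and the normalization of the sensing vectors are assumptions of this sketch.

```python
# A minimal, self-contained sketch of the spectral estimator (illustrative only,
# not the authors' code).  Assumptions: noiseless measurements, sensing vectors
# with i.i.d. CN(0, 1/d) entries, ||x|| = sqrt(d), and the pre-processing
# T(y) = (y - 1)/(y + sqrt(delta) - 1), i.e., the sigma -> 0 form of (173).
import numpy as np

rng = np.random.default_rng(0)
d, delta = 400, 2.0
n = int(delta * d)

x = rng.standard_normal(d) + 1j * rng.standard_normal(d)
x *= np.sqrt(d) / np.linalg.norm(x)                     # ||x||_2 = sqrt(d)

A = (rng.standard_normal((n, d)) + 1j * rng.standard_normal((n, d))) / np.sqrt(2 * d)
y = np.abs(A.conj() @ x) ** 2                           # y_i = |<a_i, x>|^2, E{y_i} = 1

T = (y - 1) / (y + np.sqrt(delta) - 1)                  # pre-processing of the measurements
D = (A.T * T) @ A.conj() / n                            # (1/n) sum_i T(y_i) a_i a_i^*
xhat = np.linalg.eigh(D)[1][:, -1]                      # leading eigenvector

corr = np.abs(np.vdot(xhat, x)) ** 2 / (np.linalg.norm(xhat) ** 2 * np.linalg.norm(x) ** 2)
print(f"delta = {delta}: squared correlation with the signal ~ {corr:.2f}")  # clearly positive
```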


Notes

  1. This gap is not due to the looseness of our lower bound. Indeed, by using the result of [6], one can show that the actual information-theoretic threshold is finite.

References

  1. Arora, S., Ge, R., Ma, T., Moitra, A.: Simple, efficient, and neural algorithms for sparse coding. In: Conference on Learning Theory (COLT), pp. 113–149. Paris, France (2015)

  2. Bahmani, S., Romberg, J.: Phase retrieval meets statistical learning theory: A flexible convex relaxation. In: Proc. of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 252–260. Fort Lauderdale, FL (2017)

  3. Bai, Z., Yao, J.: On sample eigenvalues in a generalized spiked population model. Journal of Multivariate Analysis 106, 167–177 (2012)

  4. Balan, R., Casazza, P., Edidin, D.: On signal reconstruction without phase. Applied and Computational Harmonic Analysis 20(3), 345–356 (2006)

  5. Bandeira, A.S., Cahill, J., Mixon, D.G., Nelson, A.A.: Saving phase: Injectivity and stability for phase retrieval. Applied and Computational Harmonic Analysis 37(1), 106–125 (2014)

  6. Barbier, J., Krzakala, F., Macris, N., Miolane, L., Zdeborová, L.: Phase transitions, optimal errors and optimality of message-passing in generalized linear models (2017). arXiv:1708.03395

  7. Barbier, J., Macris, N., Dia, M., Krzakala, F.: Mutual information and optimality of approximate message-passing in random linear estimation (2017). arXiv:1311.2445

  8. Bayati, M., Lelarge, M., Montanari, A.: Universality in polytope phase transitions and message passing algorithms. Annals of Applied Probability 25(2), 753–822 (2015)

  9. Bayati, M., Montanari, A.: The dynamics of message passing on dense graphs, with applications to compressed sensing. IEEE Trans. Inform. Theory 57, 764–785 (2011)

  10. Bayati, M., Montanari, A.: The LASSO risk for Gaussian matrices. IEEE Trans. Inform. Theory 58(4), 1997–2017 (2012)

  11. Belinschi, S.T., Bercovici, H., Capitaine, M., Février, M.: Outliers in the spectrum of large deformed unitarily invariant models (2015). arXiv:1412.4916

  12. Benaych-Georges, F., Nadakuditi, R.R.: The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics 227(1), 494–521 (2011)

  13. Cai, T.T., Li, X., Ma, Z.: Optimal rates of convergence for noisy sparse phase retrieval via thresholded Wirtinger flow. The Annals of Statistics 44(5), 2221–2251 (2016)

  14. Candès, E.J., Eldar, Y.C., Strohmer, T., Voroninski, V.: Phase retrieval via matrix completion. SIAM Review 57(2), 225–251 (2015)

  15. Candès, E.J., Li, X., Soltanolkotabi, M.: Phase retrieval from coded diffraction patterns. Applied and Computational Harmonic Analysis 39(2), 277–299 (2015)

  16. Candès, E.J., Li, X., Soltanolkotabi, M.: Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Trans. Inform. Theory 61(4), 1985–2007 (2015)

  17. Candès, E.J., Strohmer, T., Voroninski, V.: Phaselift: Exact and stable signal recovery from magnitude measurements via convex programming. Communications on Pure and Applied Mathematics 66(8), 1241–1274 (2013)

  18. Chen, Y., Candès, E.J.: Solving random quadratic systems of equations is nearly as easy as solving linear systems. In: Advances in Neural Information Processing Systems, pp. 739–747 (2015)

  19. Chen, Y., Candès, E.J.: The projected power method: An efficient algorithm for joint alignment from pairwise differences (2016). arXiv:1609.05820

  20. Chen, Y., Candès, E.J.: Solving random quadratic systems of equations is nearly as easy as solving linear systems. Communications on Pure and Applied Mathematics 70, 0822–0883 (2017)

  21. Conca, A., Edidin, D., Hering, M., Vinzant, C.: An algebraic characterization of injectivity in phase retrieval. Applied and Computational Harmonic Analysis 38(2), 346–356 (2015)

  22. Corbett, J.V.: The Pauli problem, state reconstruction and quantum-real numbers. Reports on Mathematical Physics 57(1), 53–68 (2006)

  23. Davis, C., Kahan, W.M.: The rotation of eigenvectors by a perturbation. III. SIAM Journal on Numerical Analysis 7(1), 1–46 (1970)

  24. Decelle, A., Krzakala, F., Moore, C., Zdeborová, L.: Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E 84(6), 066,106 (2011)

  25. Demanet, L., Jugnon, V.: Convex recovery from interferometric measurements. IEEE Trans. Computational Imaging 3(2), 282–295 (2017)

  26. Deshpande, Y., Montanari, A.: Finding hidden cliques of size \(\sqrt{N/e}\) in nearly linear time. Foundations of Computational Mathematics pp. 1–60 (2013)

  27. Dhifallah, O., Lu, Y.M.: Fundamental limits of PhaseMax for phase retrieval: A replica analysis (2017). arXiv:1708.03355

  28. Dhifallah, O., Thrampoulidis, C., Lu, Y.M.: Phase retrieval via linear programming: Fundamental limits and algorithmic improvements. In: 55th Annual Allerton Conference on Communication, Control, and Computing (2017). arXiv:1710.05234

  29. Donoho, D.L., Javanmard, A., Montanari, A.: Information-theoretically optimal compressed sensing via spatial coupling and approximate message passing. IEEE Trans. Inform. Theory 59(11), 7434–7464 (2013)

  30. Donoho, D.L., Maleki, A., Montanari, A.: Message Passing Algorithms for Compressed Sensing. Proceedings of the National Academy of Sciences 106, 18,914–18,919 (2009)

  31. Donoho, D.L., Maleki, A., Montanari, A.: The noise-sensitivity phase transition in compressed sensing. IEEE Trans. Inform. Theory 57(10), 6920–6941 (2011)

  32. Donoho, D.L., Montanari, A.: High dimensional robust M-estimation: Asymptotic variance via approximate message passing. Probability Theory and Related Fields 166(3–4), 935–969 (2016)

  33. Duchi, J.C., Ruan, F.: Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval (2017). arXiv:1705.02356

  34. Fienup, J.R.: Phase retrieval algorithms: A comparison. Applied Optics 21(15), 2758–2769 (1982)

  35. Fienup, J.R., Dainty, J.C.: Phase retrieval and image reconstruction for astronomy. Image Recovery: Theory and Application pp. 231–275 (1987)

  36. Gerchberg, R.W.: A practical algorithm for the determination of the phase from image and diffraction plane pictures. Optik 35, 237–246 (1972)

  37. Goldstein, T., Studer, C.: Phasemax: Convex phase retrieval via basis pursuit (2016). arXiv:1610.07531

  38. Harrison, R.W.: Phase problem in crystallography. J. Optical Soc. America A 10(5), 1046–1055 (1993)

  39. Horn, R.A., Johnson, C.R.: Matrix analysis. Cambridge University Press (2012)

  40. Jain, P., Netrapalli, P., Sanghavi, S.: Low-rank matrix completion using alternating minimization. In: Proc. of the 45th Ann. ACM Symp. on Theory of Computing (STOC), pp. 665–674. ACM, Palo Alto, CA (2013)

  41. Javanmard, A., Montanari, A.: State evolution for general approximate message passing algorithms, with applications to spatial coupling. Information and Inference pp. 115–144 (2013)

  42. Kabashima, Y., Krzakala, F., Mézard, M., Sakata, A., Zdeborová, L.: Phase transitions and sample complexity in bayes-optimal matrix factorization. IEEE Trans. Inform. Theory 62(7), 4228–4265 (2016)

  43. Karoui, N.E.: Asymptotic behavior of unregularized and ridge-regularized high-dimensional robust regression estimators: Rigorous results (2013). arXiv:1311.2445

  44. Keshavan, R.H., Montanari, A., Oh, S.: Matrix completion from a few entries. IEEE Trans. Inform. Theory 56(6), 2980–2998 (2010)

  45. Krzakala, F., Mézard, M., Zdeborová, L.: Phase diagram and approximate message passing for blind calibration and dictionary learning. In: Proc. of the IEEE Int. Symposium on Inform. Theory (ISIT), pp. 659–663. Istanbul, Turkey (2013)

  46. Lee, K., Li, Y., Junge, M., Bresler, Y.: Blind recovery of sparse signals from subsampled convolution. IEEE Trans. Inform. Theory 63(2), 802–821 (2017)

  47. Li, G., Gu, Y., Lu, Y.M.: Phase retrieval using iterative projections: Dynamics in the large systems limit. In: Proc. of the 53rd Annual Allerton Conf. on Commun., Control, and Computing (Allerton), pp. 1114–1118. Monticello, IL (2015)

  48. Li, X., Ling, S., Strohmer, T., Wei, K.: Rapid, robust, and reliable blind deconvolution via nonconvex optimization (2016). arXiv:1606.04933

  49. Lu, Y.M., Li, G.: Phase transitions of spectral initialization for high-dimensional nonconvex estimation (2017). arXiv:1702.06435

  50. Marčenko, V.A., Pastur, L.A.: Distribution of eigenvalues in certain sets of random matrices. Mat. Sb. (N.S.) 72, 457–483 (in Russian) (1967)

  51. Miao, J., Ishikawa, T., Shen, Q., Earnest, T.: Extending X-ray crystallography to allow the imaging of noncrystalline materials, cells, and single protein complexes. Annu. Rev. Phys. Chem. 59, 387–410 (2008)

  52. Millane, R.P.: Phase retrieval in crystallography and optics. J. Optical Soc. America A 7(3), 394–411 (1990)

  53. Montanari, A., Richard, E.: Non-negative principal component analysis: Message passing algorithms and sharp asymptotics. IEEE Trans. Inform. Theory 62(3), 1458–1484 (2016)

  54. Montanari, A., Venkataramanan, R.: Estimation of low-rank matrices via approximate message passing (2017). arXiv:1711.01682

  55. Mossel, E., Neeman, J., Sly, A.: Belief propagation, robust reconstruction and optimal recovery of block models. In: Conference on Learning Theory (COLT), pp. 356–370. Barcelona, Spain (2014)

  56. Mossel, E., Xu, J.: Density evolution in the degree-correlated stochastic block model. In: Conference on Learning Theory (COLT), pp. 1319–1356. New York, NY (2016)

  57. Netrapalli, P., Jain, P., Sanghavi, S.: Phase retrieval using alternating minimization. In: Advances in Neural Information Processing Systems, pp. 2796–2804 (2013)

  58. Neykov, M., Wang, Z., Liu, H.: Agnostic estimation for misspecified phase retrieval models. In: Advances in Neural Information Processing Systems, pp. 4089–4097 (2016)

  59. Oymak, S., Thrampoulidis, C., Hassibi, B.: The squared-error of generalized LASSO: A precise analysis. In: Proc. of the 51st Annual Allerton Conf. on Commun., Control, and Computing (Allerton), pp. 1002–1009. Monticello, IL (2013)

  60. Plan, Y., Vershynin, R.: The generalized lasso with non-linear observations. IEEE Transactions on information theory 62(3), 1528–1537 (2016)

  61. Rangan, S.: Generalized Approximate Message Passing for Estimation with Random Linear Mixing. In: Proc. of the IEEE Int. Symposium on Inform. Theory (ISIT), pp. 2168–2172. St. Petersburg (2011)

  62. Rangan, S., Goyal, V.K.: Recursive consistent estimation with bounded noise. IEEE Trans. Inform. Theory 47(1), 457–464 (2001)

  63. Reeves, G., Pfister, H.D.: The replica-symmetric prediction for compressed sensing with Gaussian matrices is exact. In: Proc. of the IEEE Int. Symposium on Inform. Theory (ISIT), pp. 665–669. Barcelona, Spain (2016)

  64. Schniter, P., Rangan, S.: Compressive phase retrieval via generalized approximate message passing. IEEE Transactions on Signal Processing 63(4), 1043–1055 (2015)

  65. Shechtman, Y., Eldar, Y.C., Cohen, O., Chapman, H.N., Miao, J., Segev, M.: Phase retrieval with application to optical imaging: a contemporary overview. IEEE Signal Processing Magazine 32(3), 87–109 (2015)

  66. Silverstein, J.W., Bai, Z.: Spectral Analysis of Large Dimensional Random Matrices, 2nd edn. Springer (2010)

  67. Silverstein, J.W., Choi, S.I.: Analysis of the limiting spectral distribution of large-dimensional random matrices. Journal of Multivariate Analysis 54(2), 295–309 (1995)

  68. Soltanolkotabi, M.: Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization (2017). arXiv:1702.06175

  69. Speicher, R.: Free convolution and the random sum of matrices. Publ. Res. Inst. Math. Sci. 29, 731–744 (1993)

  70. Su, W., Candès, E.J.: Slope is adaptive to unknown sparsity and asymptotically minimax. Annals of Statistics 44(3), 1038–1068 (2016)

  71. Thrampoulidis, C., Abbasi, E., Hassibi, B.: Lasso with non-linear measurements is equivalent to one with linear measurements. In: Advances in Neural Information Processing Systems, pp. 3420–3428 (2015)

  72. Unser, M., Eden, M.: Maximum likelihood estimation of linear signal parameters for Poisson processes. IEEE Trans. Acoust., Speech, and Signal Process. 36(6), 942–945 (1988)

  73. Venkataramanan, R., Johnson, O.: Strong converse bounds for high-dimensional estimation (2017). arXiv:1706.04410

  74. Voiculescu, D.: Limit laws for random matrices and free products. Inventiones Mathematicae 104, 201–220 (1991)

  75. Waldspurger, I., d’Aspremont, A., Mallat, S.: Phase recovery, maxcut and complex semidefinite programming. Mathematical Programming 149(1-2), 47–81 (2015)

  76. Walther, A.: The question of phase retrieval in optics. Journal of Modern Optics 10(1), 41–49 (1963)

  77. Wang, G., Giannakis, G.B.: Solving random systems of quadratic equations via truncated generalized gradient flow. In: Advances in Neural Information Processing Systems, pp. 568–576 (2016)

  78. Wang, G., Giannakis, G.B., Eldar, Y.C.: Solving systems of random quadratic equations via truncated amplitude flow (2016). arXiv:1605.08285

  79. Wang, G., Giannakis, G.B., Saad, Y., Chen, J.: Solving almost all systems of random quadratic equations (2017). arXiv:1705.10407

  80. Wei, K.: Solving systems of phaseless equations via Kaczmarz methods: A proof of concept study. Inverse Problems 31(12) (2015)

  81. Yang, F., Lu, Y.M., Sbaiz, L., Vetterli, M.: Bits from photons: Oversampled image acquisition using binary Poisson statistics. IEEE Trans. Image Process. 21(4), 1421–1436 (2012)

  82. Zdeborová, L., Krzakala, F.: Statistical physics of inference: Thresholds and algorithms. Advances in Physics 65(5), 453–552 (2016)

  83. Zhang, H., Liang, Y.: Reshaped Wirtinger Flow for solving quadratic system of equations. In: Advances in Neural Information Processing Systems, pp. 2622–2630 (2016)

Acknowledgements

We would like to thank the anonymous reviewers for their helpful comments.

Author information

Corresponding author

Correspondence to Marco Mondelli.

Additional information

Communicated by Emmanuel Candès.

M. Mondelli was supported by an Early Postdoc.Mobility fellowship from the Swiss National Science Foundation. A. Montanari was partially supported by Grants NSF DMS-1613091 and NSF CCF-1714305.

Appendices

A Proof of Corollary 1

We start by providing, in Lemma 6, a less compact but more explicit form of expression (10). This more explicit expression is employed to prove Lemma 7, which yields the value of \(\delta _{\ell }\) for the case of phase retrieval. Finally, we provide the proof of Corollary 1.

Lemma 6

(Explicit formula for f(m)—complex case) Consider the function \(f:[0, 1]\rightarrow {\mathbb {R}}\) defined in (10). Then, f(m) is given by the following expression:

(144)

where \(I_0\) denotes the modified Bessel function of the first kind, given by

$$\begin{aligned} I_0(x) = \frac{1}{\pi }\int _0^\pi \exp \left( x\cos \theta \right) \mathrm{d}\theta . \end{aligned}$$
(145)

Proof

Let us rewrite G as

$$\begin{aligned} G = G^{(\mathrm R)}+ j G^{(\mathrm I)}, \quad \text{ with } (G^{(\mathrm R)}, G^{(\mathrm I)})\sim {\textsf {N}}\left( {\varvec{0}}_2, \frac{1}{2}{{\varvec{I}}}_2\right) , \end{aligned}$$

i.e., \(G^{(\mathrm R)}\) and \(G^{(\mathrm I)}\) are i.i.d. Gaussian random variables with mean 0 and variance 1/2. Set

$$\begin{aligned} R = \sqrt{\left( G^{(\mathrm R)}\right) ^2+\left( G^{(\mathrm I)}\right) ^2}. \end{aligned}$$

Then, R follows a Rayleigh distribution with scale parameter \(1/\sqrt{2}\), and hence

$$\begin{aligned} {{\mathbb {E}}}_{G}\left\{ p(y\mid |G|)\right\} ={{\mathbb {E}}}_{R}\left\{ p(y\mid R)\right\} = \int _0^{+\infty }2r \cdot p(y\mid r) \cdot \exp \left( -r^2\right) \mathrm{d}r. \end{aligned}$$
(146)
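As a quick numerical sanity check of (146), the following snippet compares a Monte Carlo estimate of the left-hand side with a quadrature of the right-hand side; it assumes that \(p(y\mid r)\) is the Gaussian density with mean \(r^2\) and variance \(\sigma ^2\) from the noise model (9), and the values of y and \(\sigma \) are arbitrary.

```python
# Numerical sanity check of (146), taking p(y | r) to be the Gaussian density
# N(y; r^2, sigma^2) of the noise model (9) (an assumption of this snippet).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

rng = np.random.default_rng(0)
y, sigma = 0.7, 0.3

def p(y, r):                               # p(y | r): Gaussian with mean r^2 and std sigma
    return norm.pdf(y, loc=r ** 2, scale=sigma)

# Left-hand side: Monte Carlo over G ~ CN(0, 1), so that |G| is Rayleigh(1/sqrt(2)).
G = (rng.standard_normal(10 ** 6) + 1j * rng.standard_normal(10 ** 6)) / np.sqrt(2)
lhs = p(y, np.abs(G)).mean()

# Right-hand side: quadrature of 2 r p(y | r) exp(-r^2) over [0, +infinity).
rhs, _ = quad(lambda r: 2 * r * p(y, r) * np.exp(-r ** 2), 0, np.inf)

print(lhs, rhs)                            # the two values agree up to Monte Carlo error
```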

Let us rewrite \((G_{1}, G_{2})\) as

$$\begin{aligned} (G_{1}, G_{2}) = (G_{1}^{(\mathrm R)}+ j G_{1}^{(\mathrm I)}, G_{2}^{(\mathrm R)}+ j G_{2}^{(\mathrm I)}), \end{aligned}$$

with

$$\begin{aligned} (G_{1}^{(\mathrm R)}, G_{2}^{(\mathrm R)}, G_{1}^{(\mathrm I)}, G_{2}^{(\mathrm I)})\sim {\textsf {N}}\left( {\varvec{0}}_4, \frac{1}{2}\left[ \begin{array}{llll} 1 &{} \mathfrak {R}{(c)} &{} 0 &{} -\mathfrak {I}{(c)} \\ \mathfrak {R}{(c)} &{} 1 &{} \mathfrak {I}{(c)} &{} 0 \\ 0 &{} \mathfrak {I}{(c)} &{} 1 &{} \mathfrak {R}{(c)} \\ -\mathfrak {I}{(c)} &{} 0 &{} \mathfrak {R}{(c)} &{} 1 \\ \end{array}\right] \right) , \end{aligned}$$

and consider the following change in variables:

$$\begin{aligned} \left\{ \begin{array}{l} G_{1}^{(\mathrm R)} = R_1 \cos \theta _1 \\ G_{2}^{(\mathrm R)} = R_2 \cos \theta _2 \\ G_{1}^{(\mathrm I)} = R_1 \sin \theta _1 \\ G_{2}^{(\mathrm I)} = R_2 \sin \theta _2 \\ \end{array} \right. . \end{aligned}$$

Then, after some algebra, we have that

$$\begin{aligned} \begin{aligned}&{{\mathbb {E}}}_{G_{1}, G_{2}}\left\{ p(y\mid |G_{1}|)\cdot p(y\mid |G_{2}|)\right\} \\&\quad = \frac{1}{\pi ^2(1-|c|^2)}\int _{0}^{+\infty }\int _{0}^{+\infty }\int _{0}^{2\pi }\int _{0}^{2\pi }r_1 r_2 \cdot p(y\mid r_1) p(y\mid r_2) \cdot \\&\qquad \exp \left( -\frac{r_1^2+r_2^2-2r_1r_2\left( \mathfrak {R}{(c)}\cos (\theta _2-\theta _1)-\mathfrak {I}{(c)}\sin (\theta _2-\theta _1)\right) }{1-|c|^2}\right) \,\mathrm{d}r_1\,\mathrm{d}r_2\,\mathrm{d}\theta _1\,\mathrm{d}\theta _2. \end{aligned} \end{aligned}$$
(147)

By writing \((\mathfrak {R}{(c)}, \mathfrak {I}{(c)})=(|c|\cos \theta _c, |c|\sin \theta _c)\) and by using definition (145), we can further simplify the RHS of (147) as

$$\begin{aligned}&\frac{1}{1-|c|^2}\int _0^{+\infty }\int _0^{+\infty }4r_1 r_2 \cdot p(y\mid r_1) p(y\mid r_2) \cdot \exp \left( -\frac{r_1^2+r_2^2}{1-|c|^2}\right) \nonumber \\&\quad \cdot I_0\left( \frac{2r_1r_2|c|}{1-|c|^2}\right) \mathrm{d}r_1\mathrm{d}r_2. \end{aligned}$$
(148)

From (146) and (148), the claim easily follows. \(\square \)
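The simplification leading to (148) can also be verified numerically. The snippet below (same Gaussian assumption on \(p(y\mid r)\) as above, with arbitrary values of y, \(\sigma \), and the correlation parameter c) compares a Monte Carlo estimate of \({{\mathbb {E}}}_{G_{1}, G_{2}}\left\{ p(y\mid |G_{1}|)\cdot p(y\mid |G_{2}|)\right\} \) with the double integral (148), evaluated with the exponentially scaled Bessel function for numerical stability.

```python
# Numerical check of the simplification (147) -> (148): Monte Carlo estimate of
# E_{G1,G2}{ p(y | |G1|) p(y | |G2|) } versus the double integral with I_0.
# As before, p(y | r) is taken to be the Gaussian density N(y; r^2, sigma^2) (assumption).
import numpy as np
from scipy.special import i0e
from scipy.stats import norm

rng = np.random.default_rng(1)
y, sigma, c = 0.8, 0.3, 0.6 + 0.2j
rho = abs(c)

def p(y, r):
    return norm.pdf(y, loc=r ** 2, scale=sigma)

# Monte Carlo estimate of the expectation, with E{G1 conj(G2)} of modulus rho.
G1 = (rng.standard_normal(10 ** 6) + 1j * rng.standard_normal(10 ** 6)) / np.sqrt(2)
G3 = (rng.standard_normal(10 ** 6) + 1j * rng.standard_normal(10 ** 6)) / np.sqrt(2)
G2 = c * G1 + np.sqrt(1 - rho ** 2) * G3
lhs = np.mean(p(y, np.abs(G1)) * p(y, np.abs(G2)))

# Double integral (148), written with the scaled Bessel function i0e(z) = exp(-z) I_0(z)
# so that the I_0 factor is absorbed into the exponential and nothing overflows.
r = np.linspace(0, 6, 1201)
dr = r[1] - r[0]
R1, R2 = np.meshgrid(r, r)
z = 2 * rho * R1 * R2 / (1 - rho ** 2)
integrand = (4 * R1 * R2 / (1 - rho ** 2) * p(y, R1) * p(y, R2)
             * np.exp(-(R1 ** 2 + R2 ** 2 - 2 * rho * R1 * R2) / (1 - rho ** 2)) * i0e(z))
rhs = integrand.sum() * dr * dr

print(lhs, rhs)                            # agree up to Monte Carlo / discretization error
```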

Lemma 7

(Computation of \(\delta _{\ell }\) for phase retrieval) Let \(\delta _{\ell }(\sigma ^2)\) be defined as in (13) and assume that the distribution \(p(\cdot \mid |G|)\) appearing in (10) is given by (9). Then,

$$\begin{aligned} \lim _{\sigma \rightarrow 0}\delta _{\ell }(\sigma ^2)=1. \end{aligned}$$
(149)

Proof

For the special case of phase retrieval, it is possible to compute explicitly the function f(m) defined in (10) and simplified in Lemma 6. Indeed,

(150)

where in (a) we do the change in variables \(z= r^2\); in (b) we use definition (9) and we define \(H(x)=1\) if \(x\ge 0\) and \(H(x)=0\) otherwise; and in (c) we define \(Z\sim {\textsf {N}}(y, \sigma ^2)\). In the limit \(\sigma ^2\rightarrow 0\), we have that

$$\begin{aligned} {{\mathbb {E}}}_Z \left\{ \exp (-Z)H(Z)\right\} \rightarrow \exp (-y)\cdot H(y), \end{aligned}$$
(151)

by Lebesgue’s dominated convergence theorem. Similarly,

where in (a) we do the change in variables \(z_1= r_1^2\) and \(z_2= r_2^2\); in (b) we use definition (9) and we define \(H(x)=1\) if \(x\ge 0\) and \(H(x)=0\) otherwise; and in (c) we define \((Z_1, Z_2)\sim _{i.i.d.} {\textsf {N}}(y, \sigma ^2)\). In the limit \(\sigma ^2\rightarrow 0\), we have that

$$\begin{aligned} \begin{aligned}&{{\mathbb {E}}}_{Z_1, Z_2} \left\{ \exp \left( -\frac{Z_1+Z_2}{1-m}\right) \cdot I_0\left( \frac{2\sqrt{Z_1 Z_2}\sqrt{m}}{1-m}\right) \cdot H(Z_1)H(Z_2)\right\} \\&\quad \rightarrow \exp \left( -\frac{2y}{1-m}\right) \cdot I_0\left( \frac{2y\sqrt{m}}{1-m}\right) \cdot H(y), \end{aligned} \end{aligned}$$

by Lebesgue’s dominated convergence theorem. As a result, by using (145), we obtain that

Consequently,

which implies the desired result. \(\square \)

Proof (Proof of Corollary 1)

We follow the proof of Theorem 1 presented in Sect. 4. The first step is exactly the same, i.e., by applying Lemma 1, we show that (68) holds. In contrast, the second step requires some modifications, since the definition of the error metric is different. In particular, we will prove that

$$\begin{aligned} \frac{1}{d^2} {{\mathbb {E}}}_{{{\varvec{Y}}}_{1:n}, {{\varvec{A}}}_{1:n}}\left\{ \left| \left| {{\mathbb {E}}}\left\{ {{\varvec{X}}}{{\varvec{X}}}^*\right\} - {{\mathbb {E}}}\left\{ {{\varvec{X}}}{{\varvec{X}}}^*\mid {{\varvec{Y}}}_{1:n}, {{\varvec{A}}}_{1:n}\right\} \right| \right| _{F}^2\right\} =o_n(1). \end{aligned}$$
(152)

Furthermore, we have that

(153)

where in (a) we use the triangle inequality; in (b) we use that \({\mathbb {E}} \left\{ {{\varvec{X}}}{{\varvec{X}}}^*\right\} ={{\varvec{I}}}_d\) by Lemma 12; in (c) we use that, for any matrix \({{\varvec{A}}}\), \(\left| \left| {{\varvec{A}}}\right| \right| _F=\sqrt{\mathrm{trace}({{\varvec{A}}}{{\varvec{A}}}^*)}\); and in (d) we use that \({{\mathbb {E}}}\left\{ \mathrm{trace}\left( {{\varvec{X}}}{{\varvec{X}}}^*{{\varvec{X}}}{{\varvec{X}}}^*\right) \right\} ={{\mathbb {E}}}\left\{ \left| \left| {{\varvec{X}}}\right| \right| ^4\right\} =d^2\). By applying (152) and (153), the proof of Corollary 1 is complete.

Let us now give the proof of (152). Similarly to (71), we have that

$$\begin{aligned} \begin{aligned}&I(Y_{n+1}; {{\varvec{Y}}}_{1:n}, {{\varvec{A}}}_{1:n}| {{\varvec{A}}}_{n+1})\\&\quad \ge \frac{1}{2K^2}\cdot {{\mathbb {E}}}_{{{\varvec{Y}}}_{1:n}, {{\varvec{A}}}_{1:n+1}}\Biggl \{ \biggl |\int _{{\mathbb {C}}^d}p({{\varvec{x}}}\mid {{\varvec{Y}}}_{1:n}, {{\varvec{A}}}_{1:n})\int _{{\mathbb {R}}}p_{\mathrm{PR}}(y_{n+1}\mid {{\varvec{x}}}, \\&\qquad {{\varvec{Y}}}_{1:n}, {{\varvec{A}}}_{1:n+1})\varphi _{\mathrm{PR}}(y_{n+1})\mathrm{d}y_{n+1}\mathrm{d}{{\varvec{x}}}\\&\qquad -\int _{{\mathbb {C}}^d}p({{\varvec{x}}})\int _{{\mathbb {R}}}p_{\mathrm{PR}}(y_{n+1}\mid {{\varvec{x}}}, {{\varvec{A}}}_{n+1})\varphi _{\mathrm{PR}}(y_{n+1})\,\mathrm{d}y_{n+1}\,\mathrm{d}{{\varvec{x}}}\biggr |^2\Biggr \},\\ \end{aligned} \end{aligned}$$
(154)

where we define \(\varphi _{\mathrm{PR}}(x)=x\) for \(|x|\le M\), and \(\varphi _{\mathrm{PR}}(x)=M\cdot \mathrm{sign}(x)\) otherwise. Then,

(155)

where in (a) we set

$$\begin{aligned} E_1= & {} \int _{{\mathbb {C}}^d}p({{\varvec{x}}}\mid {{\varvec{Y}}}_{1:n}, {{\varvec{A}}}_{1:n})\int _{{\mathbb {R}}}p_{\mathrm{PR}}(y_{n+1}\mid {{\varvec{x}}}, {{\varvec{Y}}}_{1:n}, {{\varvec{A}}}_{1:n+1})\\&\cdot (\varphi _{\mathrm{PR}}(y_{n+1})-y_{n+1})\,\mathrm{d}y_{n+1}\,\mathrm{d}{{\varvec{x}}}, \end{aligned}$$

in (b) we use definition (9), and in (c) we use that \(|\langle {{\varvec{A}}}_{n+1}, {{\varvec{x}}}\rangle |^2=\langle {{\varvec{A}}}_{n+1}, {{\varvec{x}}}{{\varvec{x}}}^* {{\varvec{A}}}_{n+1}\rangle \). Similarly, we have that

$$\begin{aligned}&\int _{{\mathbb {C}}^d}p({{\varvec{x}}})\int _{{\mathbb {R}}}p_{\mathrm{PR}}(y_{n+1}\mid {{\varvec{x}}}, {{\varvec{A}}}_{n+1})\varphi _{\mathrm{PR}}(y_{n+1})\,\mathrm{d}y_{n+1}\,\mathrm{d}{{\varvec{x}}}\nonumber \\&\quad = \langle {{\varvec{A}}}_{n+1}, {{\mathbb {E}}}\left\{ {{\varvec{X}}}{{\varvec{X}}}^*\right\} {{\varvec{A}}}_{n+1}\rangle +E_2, \end{aligned}$$
(156)

with

$$\begin{aligned} E_2 = \int _{{\mathbb {C}}^d}p({{\varvec{x}}})\int _{{\mathbb {R}}}p_{\mathrm{PR}}(y_{n+1}\mid {{\varvec{x}}}, {{\varvec{A}}}_{n+1})\cdot (\varphi _{\mathrm{PR}}(y_{n+1})-y_{n+1})\,\mathrm{d}y_{n+1}\,\mathrm{d}{{\varvec{x}}}. \end{aligned}$$

By applying (155) and (156), we can rewrite the RHS of (154) as

$$\begin{aligned} \begin{aligned}&\frac{1}{2K^2}\cdot {{\mathbb {E}}}_{{{\varvec{Y}}}_{1:n}, {{\varvec{A}}}_{1:n}}{{\mathbb {E}}}_{{{\varvec{A}}}_{n+1}}\left\{ \left|\langle {{\varvec{A}}}_{n+1}, \left( {{\mathbb {E}}}\left\{ {{\varvec{X}}}{{\varvec{X}}}^*\mid {{\varvec{Y}}}_{1:n}, {{\varvec{A}}}_{1:n}\right\} -{{\mathbb {E}}}\left\{ {{\varvec{X}}}{{\varvec{X}}}^*\right\} \right) {{\varvec{A}}}_{n+1}\rangle +E_1-E_2\right|^2\right\} \\&\quad \ge \frac{1}{2K^2}\cdot {{\mathbb {E}}}_{{{\varvec{Y}}}_{1:n}, {{\varvec{A}}}_{1:n}}\left( {{\mathbb {E}}}_{{{\varvec{A}}}_{n+1}}\left\{ \left|\langle {{\varvec{A}}}_{n+1}, {{\varvec{M}}}{{\varvec{A}}}_{n+1}\rangle \right|^2\right\} -{{\mathbb {E}}}_{{{\varvec{A}}}_{n+1}}\left\{ |E_1|^2\right\} -{{\mathbb {E}}}_{{{\varvec{A}}}_{n+1}}\left\{ |E_2|^2\right\} \right) , \end{aligned} \end{aligned}$$
(157)

where we define \({{\varvec{M}}}= {{\mathbb {E}}}\left\{ {{\varvec{X}}}{{\varvec{X}}}^*\mid {{\varvec{Y}}}_{1:n}, {{\varvec{A}}}_{1:n}\right\} -{{\mathbb {E}}}\left\{ {{\varvec{X}}}{{\varvec{X}}}^*\right\} \). As K grows large, \({{\mathbb {E}}}_{{{\varvec{A}}}_{n+1}}\left\{ |E_i|^2\right\} \) tends to 0 for \(i\in \{1, 2\}\). Furthermore, we have that

(158)

where in (a) we use the following definition of the Kronecker delta:

$$\begin{aligned} \delta _{ab} = \left\{ \begin{array}{ll} 1, &{} \text{ if } a=b,\\ 0, &{} \text{ otherwise }, \end{array}\right. \end{aligned}$$
(159)

and in (b) we use that

$$\begin{aligned} \begin{aligned} \mathrm{trace}({{\varvec{M}}})&= \sum _{i=1}^d \left( {{\mathbb {E}}}\left\{ |X_i|^2 \mid {{\varvec{Y}}}_{1:n}, {{\varvec{A}}}_{1:n}\right\} -{{\mathbb {E}}}\left\{ |X_i|^2\right\} \right) \\&= {{\mathbb {E}}}\left\{ \sum _{i=1}^d|X_i|^2 \mid {{\varvec{Y}}}_{1:n}, {{\varvec{A}}}_{1:n}\right\} -{{\mathbb {E}}}\left\{ \sum _{i=1}^d|X_i|^2\right\} =0. \end{aligned} \end{aligned}$$
(160)

As a result, we conclude that (152) holds. \(\square \)

B Proof of Corollary 2

First, we evaluate the RHS of (20), as well as the scaling between \(\delta _{\mathrm{u}}\) and \(\sigma ^2\) when \(\sigma ^2\rightarrow 0\). Then, we give the proof of Corollary 2.

Lemma 8

(Computation of \(\delta _{\mathrm{u}}\) for phase retrieval) Let \(\delta _{\mathrm{u}}(\sigma ^2)\) be defined as in (20) and assume that the distribution \(p(\cdot \mid |g|)\) is given by (9). Then,

$$\begin{aligned} \delta _{\mathrm{u}}(\sigma ^2)=1+\sigma ^2 + o(\sigma ^2). \end{aligned}$$
(161)

Proof

The proof boils down to computing expected values and integrals. By using (146) and (150), we immediately obtain that

$$\begin{aligned} {{\mathbb {E}}}_{G}\left\{ p_{\mathrm{PR}}(y\mid |G|)\right\}= & {} \int _0^{+\infty }2r \cdot p_{\mathrm{PR}}(y\mid r) \cdot \exp \left( -r^2\right) \mathrm{d}r\\= & {} \exp (-y)\,{{\mathbb {E}}}_{X}\left\{ \exp (-\sigma X)H(y+\sigma X)\right\} , \end{aligned}$$

where \(X\sim {\textsf {N}}(0, 1)\). By computing explicitly the expectation, we deduce that

$$\begin{aligned} {{\mathbb {E}}}_{G}\left\{ p_{\mathrm{PR}}(y\mid |G|)\right\} =\frac{1}{2}\exp \left( -y+\frac{\sigma ^2}{2}\right) \mathrm{erfc}\left( \frac{1}{\sqrt{2}}\left( -\frac{y}{\sigma }+\sigma \right) \right) , \end{aligned}$$
(162)

where \(\mathrm{erfc}(\cdot )\) is the complementary error function. Similarly, we have that

$$\begin{aligned} \begin{aligned} {{\mathbb {E}}}_{G}\left\{ p_{\mathrm{PR}}(y\mid |G|)(|G|^2-1)\right\}&= \int _0^{+\infty }2(r^3-r) p_{\mathrm{PR}}(y\mid r) \cdot \exp \left( -r^2\right) \mathrm{d}r\\&=\exp (-y)\,{{\mathbb {E}}}_{X}\left\{ \exp (-\sigma X)H(y+\sigma X)(y-1+\sigma X)\right\} \\&= \frac{\sigma }{\sqrt{2\pi }} \exp \left( -\frac{y^2}{2\sigma ^2}\right) +\frac{1}{2}(y-1\\&\quad -\sigma ^2)\exp \left( -y+\frac{\sigma ^2}{2}\right) \mathrm{erfc}\left( \frac{1}{\sqrt{2}}\left( -\frac{y}{\sigma }+\sigma \right) \right) . \end{aligned} \end{aligned}$$
(163)
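The closed forms (162) and (163) are easy to confirm by simulation; the snippet below does so for one (arbitrary) pair \((y, \sigma )\), again under the Gaussian noise model (9).

```python
# Monte Carlo check of the closed forms (162) and (163), with p_PR(y | r) the
# Gaussian density N(y; r^2, sigma^2) of the noise model (9) (assumption).
import numpy as np
from scipy.special import erfc
from scipy.stats import norm

rng = np.random.default_rng(2)
y, sigma = 1.3, 0.4

absG2 = np.abs((rng.standard_normal(10 ** 6) + 1j * rng.standard_normal(10 ** 6)) / np.sqrt(2)) ** 2
pvals = norm.pdf(y, loc=absG2, scale=sigma)          # p_PR(y | |G|) for each sample of G

lhs162 = pvals.mean()
rhs162 = 0.5 * np.exp(-y + sigma ** 2 / 2) * erfc((-y / sigma + sigma) / np.sqrt(2))

lhs163 = (pvals * (absG2 - 1)).mean()
rhs163 = (sigma / np.sqrt(2 * np.pi) * np.exp(-y ** 2 / (2 * sigma ** 2))
          + 0.5 * (y - 1 - sigma ** 2) * np.exp(-y + sigma ** 2 / 2)
          * erfc((-y / sigma + sigma) / np.sqrt(2)))

print(lhs162, rhs162)                                # agree up to Monte Carlo error
print(lhs163, rhs163)
```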

Thus, by using (162) and (163), after some manipulations, we obtain that

(164)

By performing the change in variables \(t=-y/\sigma +\sigma \), we simplify the first integral in the RHS of (164) as

$$\begin{aligned}&\int _{{\mathbb {R}}} \frac{\sigma ^2}{2\pi } \exp \left( y-\frac{\sigma ^2}{2}-\frac{y^2}{\sigma ^2}\right) \frac{2}{\mathrm{erfc}\left( \frac{1}{\sqrt{2}}\left( -\frac{y}{\sigma }+\sigma \right) \right) }\mathrm{d}y \nonumber \\&\quad = \frac{2\sigma ^3}{2\pi }\int _{{\mathbb {R}}} \frac{\exp \left( -t^2\right) }{\mathrm{erfc}\left( \frac{t}{\sqrt{2}}\right) }\exp \left( \sigma t-\frac{\sigma ^2}{2}\right) \,\mathrm{d}t = o(\sigma ^2), \end{aligned}$$
(165)

where in the last equality we use that the integral

$$\begin{aligned} \int _{{\mathbb {R}}} \frac{\exp \left( -t^2\right) }{\mathrm{erfc}\left( \frac{t}{\sqrt{2}}\right) }\,\mathrm{d}t \end{aligned}$$

is finite. The other two integrals in the RHS of (164) can be expressed in closed form as

$$\begin{aligned}&\int _{{\mathbb {R}}} \frac{2\sigma }{\sqrt{2\pi }} \exp \left( -\frac{y^2}{2\sigma ^2}\right) (y-1-\sigma ^2)\mathrm{d}y = -2\sigma ^2(1+\sigma ^2), \end{aligned}$$
(166)
$$\begin{aligned}&\begin{aligned}&\int _{{\mathbb {R}}} \frac{1}{2}\exp \left( -y+\frac{\sigma ^2}{2}\right) (y-1-\sigma ^2)^2\mathrm{erfc}\left( \frac{1}{\sqrt{2}}\left( -\frac{y}{\sigma }+\sigma \right) \right) \mathrm{d}y\\&\quad =\frac{\sigma }{2}\exp \left( -\frac{\sigma ^2}{2}\right) \int _{{\mathbb {R}}} \exp \left( \sigma t\right) (\sigma t+1)^2\mathrm{erfc}\left( \frac{t}{\sqrt{2}}\right) \mathrm{d}t=1+\sigma ^2+\sigma ^4. \end{aligned} \end{aligned}$$
(167)

By combining (164), (165), (166), and (167), the result follows. \(\square \)
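The values of the integrals (166) and (167) can be confirmed by direct quadrature, for instance as follows (the value of \(\sigma \) is arbitrary).

```python
# Quadrature check of the closed-form integrals (166) and (167).
import numpy as np
from scipy.integrate import quad
from scipy.special import erfc

sigma = 0.3

f166 = lambda y: 2 * sigma / np.sqrt(2 * np.pi) * np.exp(-y ** 2 / (2 * sigma ** 2)) * (y - 1 - sigma ** 2)
val166, _ = quad(f166, -20, 20)
print(val166, -2 * sigma ** 2 * (1 + sigma ** 2))    # both ~ -0.196

f167 = lambda y: (0.5 * np.exp(-y + sigma ** 2 / 2) * (y - 1 - sigma ** 2) ** 2
                  * erfc((-y / sigma + sigma) / np.sqrt(2)))
val167, _ = quad(f167, -20, 80)
print(val167, 1 + sigma ** 2 + sigma ** 4)           # both ~ 1.098
```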

Proof (Proof of Corollary 2)

Pick \(\sigma \) sufficiently small. Let \(G\sim {\textsf {CN}}(0, 1)\), \(Y\sim p_{\mathrm{PR}}(\cdot \mid |G|)\) and \(Z= {\mathcal {T}}(Y)\), where \(p_{\mathrm{PR}}\) is defined in (9) and \({\mathcal {T}}\) is a pre-processing function (possibly dependent on \(\sigma \)) that we will choose later on. Assume that

  (1) \({\mathcal {T}}(y)\) is upper and lower bounded by constants independent of \(\sigma \);

  (2) \({\mathbb {P}} (Z=0)\le c_1 <1\) and \(c_1\) is independent of \(\sigma \);

  (3) condition (82) holds.

Then, by Lemma 2, we have that, as \(n\rightarrow \infty \),

(168)

where \(\lambda _{\delta }^*\) is the unique solution of the equation \(\zeta _\delta (\lambda )=\phi (\lambda )\), and \(\phi \), \(\psi _\delta \), and \(\zeta _\delta \) are defined in (78), (79), and (81), respectively.

Let \(\tau \) be the supremum of the support of Z, and let \(\bar{\lambda }_\delta \) be the point at which \(\psi _{\delta }\) attains its minimum. Assume also that, for \(\bar{\lambda }_\delta < \lambda _{\delta }^*\),

  (4) \(\tau \ge c_2 > 0\) and \(c_2\) is independent of \(\sigma \);

  (5) \(\phi '(\lambda _{\delta }^*)\) is lower bounded by a constant independent of \(\sigma \);

  (6) \(\displaystyle \min _{\lambda \in (\min (\bar{\lambda }_\delta , \lambda _{\delta }^*), \lambda _{\delta }^*)}\psi _{\delta }''(\lambda )\) is lower bounded by a strictly positive constant independent of \(\sigma \).

Then,

(169)

where in (a) we use that \(\zeta _\delta (\lambda _{\delta }^*)=\phi (\lambda _{\delta }^*)\); in (b) we use that \(\zeta _\delta (\bar{\lambda }_\delta )=\psi _\delta (\bar{\lambda }_\delta )\); (c) holds for some \(x_1 \in (\bar{\lambda }_\delta , \lambda _{\delta }^*)\) by the mean value theorem; (d) holds for some \(x_2 \in (\bar{\lambda }_\delta , \lambda _{\delta }^*)\) by the mean value theorem; and in (e) we use that \(\psi _{\delta }'(\bar{\lambda }_\delta )=0\). Note that (f) holds for some constant \(c_3\) independent of \(\sigma \), as \(\zeta _\delta '(x_1)\ge 0\), \(\psi _{\delta }''(x_2)\) is bounded, and \(\phi '(x_2)<0\) since \({\mathbb {P}} (Z=0) <1\).

As \(\phi '(\lambda _{\delta }^*)\) is bounded, from (168) and (169) we deduce that

$$\begin{aligned} \rho \ge c_4 \cdot \bigl (\phi (\bar{\lambda }_\delta ) -\psi _\delta (\bar{\lambda }_\delta )\bigr ), \end{aligned}$$
(170)

for some constant \(c_4\) independent of \(\sigma \). Notice that, if \(\lambda _{\delta }^*\le \bar{\lambda }_{\delta }\), then the right-hand side is non-positive and hence the lower bound still holds.

As \(\tau >0\), we also have that \(\bar{\lambda }_\delta >0\). Consider now the matrix \({{\varvec{D}}}_n'={{\varvec{D}}}_n/\alpha \) for some \(\alpha >0\). Then, the principal eigenvector of \({{\varvec{D}}}_n'\) is equal to the principal eigenvector of \({{\varvec{D}}}_n\). Hence, we can assume without loss of generality that \(\bar{\lambda }_\delta =1\). This condition can be rewritten as

$$\begin{aligned} {{\mathbb {E}}}\left\{ \frac{Z^2}{(1-Z)^2}\right\} =\frac{1}{\delta }, \end{aligned}$$
(171)

and (170) can be rewritten as

$$\begin{aligned} \rho \ge c_4 \cdot \left( {{\mathbb {E}}}\left\{ \frac{Z(|G|^2-1)}{1-Z}\right\} -\frac{1}{\delta }\right) . \end{aligned}$$
(172)

We set

$$\begin{aligned} {\mathcal {T}}(y)= {\mathcal {T}}_\delta ^*(y, \sigma ) \triangleq \frac{y_+-1}{y_++\sqrt{\delta \, c(\sigma )}-1}, \end{aligned}$$
(173)

where \(y_+=\max (y,0)\) and \(c(\sigma )\) is a function of \(\sigma \) to be set as to satisfy Eq. (171). By substituting (173) into (171), we get

$$\begin{aligned} {{\mathbb {E}}}\left\{ \frac{Z^2}{(1-Z)^2}\right\} = \frac{1}{\delta c(\sigma )}{{\mathbb {E}}}\big \{(Y_+-1)^2\big \} . \end{aligned}$$
(174)

Hence, Eq. (171) is satisfied by

$$\begin{aligned} c(\sigma ) = {{\mathbb {E}}}\big \{(Y_+-1)^2\big \} = {{\mathbb {E}}}\Big \{\big ((|G|^2+\sigma W)_+-1\big )^2\Big \} , \end{aligned}$$
(175)

where \(W\sim {\textsf {N}}(0,1)\) is independent of G. Therefore, \(c(\sigma )\) is always well defined and, by dominated convergence, \(c(\sigma )\rightarrow c(0) = 1\), as \(\sigma \rightarrow 0\). Furthermore,

$$\begin{aligned} {{\mathbb {E}}}\left\{ \frac{Z(|G|^2-1)}{1-Z}\right\}&= \frac{1}{\sqrt{\delta c(\sigma )} } {{\mathbb {E}}}\big \{(Y_+-1)(|G|^2-1)\big \}. \end{aligned}$$
(176)

By applying again dominated convergence, we get

$$\begin{aligned} \lim _{\sigma \rightarrow 0}{{\mathbb {E}}}\left\{ \frac{Z(|G|^2-1)}{1-Z}\right\}&= \frac{1}{\sqrt{\delta } } {{\mathbb {E}}}\big \{(|G|^2-1)^2\big \} = \frac{1}{\sqrt{\delta }} . \end{aligned}$$
(177)

Hence, by using (170), we get that, for \(\delta >1\) and \(\sigma \le \sigma _1(\delta )\),

$$\begin{aligned} \liminf _{n\rightarrow \infty }\frac{|\langle \hat{{{\varvec{x}}}}_{\sigma }, {{\varvec{x}}}\rangle |^2}{\left| \left| \hat{{{\varvec{x}}}}_\sigma \right| \right| _2^2 \left| \left| {{\varvec{x}}}\right| \right| _2^2} \ge c_5\Big (\frac{1}{\sqrt{\delta }}-\frac{1}{\delta }\Big )>0 , \end{aligned}$$
(178)

where \(\hat{{{\varvec{x}}}}_{\sigma }\) denotes the spectral estimator corresponding to the pre-processing function (173). Let us now verify that, by setting \({\mathcal {T}}={\mathcal {T}}_\delta ^*\), the requirements stated above are fulfilled. As \(\delta >1\), the function \({\mathcal {T}}\) is bounded by constants independent of \(\sigma \). It is also clear that conditions (2) and (4) hold. Furthermore, conditions (5) and (6) follow by showing that \(\phi (\lambda )\), \(\psi _{\delta }(\lambda )\) have well-defined uniform limits as \(\sigma \rightarrow 0\) that satisfy those conditions: This can be proved by one more application of dominated convergence.

In order to show that condition (3) holds, we follow the argument presented at the end of the proof of Lemma 2. First, we add a point mass with associated probability at most \(\epsilon _1\), which immediately implies that (82) is satisfied. Then, by applying the Davis–Kahan theorem [23], we show that we can take \(\epsilon _1 = 0\).

This proves the claim of the corollary for the pre-processing function \({{\mathcal {T}}}_\delta ^*(y, \sigma )\), defined in (173). Let us now prove that the same conclusion holds for \({{\mathcal {T}}}^*_{\delta }(y)\) defined in (25). Let

$$\begin{aligned} f_a(x) = \frac{x+1}{x+a}. \end{aligned}$$
(179)

Then, for any \(x, a \in {{\mathbb {R}}}_{\ge 0}\),

$$\begin{aligned} |f_a'(x)|= \frac{|a-1|}{(x+a)^2}\le \max \left( 1,\frac{1}{x^2}\right) . \end{aligned}$$
(180)

Therefore, since \({{\mathcal {T}}}_\delta ^*(y,\sigma ) = 1-f_{y_+}(\sqrt{\delta c(\sigma )}-1)\), we have that

$$\begin{aligned} \sup _{y\in {{\mathbb {R}}}}\big |{{\mathcal {T}}}_\delta ^*(y, \sigma )-{{\mathcal {T}}}^*_\delta (y)\big |\le \frac{1}{\bigl (\min (\sqrt{\delta c(\sigma )}-1, \sqrt{\delta }-1, 1)\bigr )^2}\,\sqrt{\delta }\cdot \big |\sqrt{c(\sigma )}-1\big | . \end{aligned}$$
(181)

Denote by \({{\varvec{D}}}_n(\sigma )\) and \({{\varvec{D}}}_n\) the matrices constructed with the pre-processing functions \({{\mathcal {T}}}_\delta ^*(y, \sigma )\) and \({{\mathcal {T}}}^*_\delta (y)\), respectively. It follows that, for any \(\delta >1\), there exists a function \({\varDelta }(\sigma )\) with \({\varDelta }(\sigma )\rightarrow 0\) as \(\sigma \rightarrow 0\) such that

$$\begin{aligned} \left| \left| {{\varvec{D}}}_n(\sigma )-{{\varvec{D}}}_n\right| \right| _{\mathrm{op}} \le {\varDelta }(\sigma ) . \end{aligned}$$
(182)

Hence, by applying again the Davis–Kahan theorem, we conclude that, for all \(\delta >1\) and \(\sigma \le \sigma _2(\delta )\),

$$\begin{aligned} \liminf _{n\rightarrow \infty }\frac{|\langle \hat{{{\varvec{x}}}}, {{\varvec{x}}}\rangle |^2}{\left| \left| \hat{{{\varvec{x}}}}\right| \right| _2^2 \left| \left| {{\varvec{x}}}\right| \right| _2^2} \ge c_5\Big (\frac{1}{\sqrt{\delta }}-\frac{1}{\delta }\Big )>0 , \end{aligned}$$
(183)

where \(\hat{{{\varvec{x}}}}\) is the estimator corresponding to the pre-processing function \({{\mathcal {T}}}^*_{\delta }(y)\). \(\square \)
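The two quantities entering the bound, namely the normalization (171) and the limit (177), are straightforward to check by simulation at \(\sigma =0\), where \(c(0)=1\) and the pre-processing reduces to \({{\mathcal {T}}}^*_{\delta }(y)=(y-1)/(y+\sqrt{\delta }-1)\); the snippet below does so for \(\delta =2\).

```python
# Monte Carlo check, at sigma = 0, of the normalization (171) and of the limit (177)
# for the pre-processing T*_delta(y) = (y - 1)/(y + sqrt(delta) - 1) (here c(0) = 1).
import numpy as np

rng = np.random.default_rng(3)
delta = 2.0

absG2 = np.abs((rng.standard_normal(2 * 10 ** 6) + 1j * rng.standard_normal(2 * 10 ** 6)) / np.sqrt(2)) ** 2
Z = (absG2 - 1) / (absG2 + np.sqrt(delta) - 1)       # Z = T*_delta(Y) with Y = |G|^2

print(np.mean(Z ** 2 / (1 - Z) ** 2), 1 / delta)                  # (171): both ~ 0.5
print(np.mean(Z * (absG2 - 1) / (1 - Z)), 1 / np.sqrt(delta))     # (177): both ~ 0.707
```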

C Auxiliary Lemmas

Lemma 9

(Distribution of scalar product of two unit complex vectors) Let \({{\varvec{x}}}_1, {{\varvec{x}}}_2 \sim _{i.i.d.} {\mathrm{Unif}}({\textsf {S}} _{{\mathbb {C}}}^{d-1})\) and define \(M = |\langle {{\varvec{x}}}_1, {{\varvec{x}}}_2 \rangle |^2\). Then,

$$\begin{aligned} M\sim \mathrm{Beta}(1, d-1). \end{aligned}$$
(184)

Proof

Without loss of generality, we can pick \({{\varvec{x}}}_2\) to be the first element of the canonical basis of \({\mathbb {C}}^d\). Thus, M is equal to the squared modulus of the first component of \({{\varvec{x}}}_1\). Furthermore, we can think of \({{\varvec{x}}}_1\) as being chosen uniformly at random on the unit sphere of \({\mathbb {R}}^{2d}\). Note that, by taking a vector of i.i.d. standard Gaussian random variables and dividing it by its norm, we obtain a vector uniformly random on the sphere of radius 1. Hence,

$$\begin{aligned} M = \frac{U_1^2+U_2^2}{\sum _{i=1}^{2d} U_i^2}, \quad \text{ with } \{U_i\}_{1\le i \le 2d} \sim _{i.i.d.} {\textsf {N}}(0, 1). \end{aligned}$$

Set \(A = U_1^2+U_2^2\) and \(B = \sum _{i=3}^{2d} U_i^2\). Then, A and B are independent, A follows a gamma distribution with shape 1 and scale 2, i.e., \(A\sim {\varGamma }(1, 2)\), and B follows a gamma distribution with shape \(d-1\) and scale 2, i.e., \(B\sim {\varGamma }(d-1, 2)\). Thus, we conclude that

$$\begin{aligned} M = \frac{A}{A+B}\sim \mathrm{Beta}(1, d-1), \end{aligned}$$

which proves the claim. \(\square \)

Lemma 10

(Distribution of scalar product of two unit real vectors) Let \({{\varvec{x}}}_1, {{\varvec{x}}}_2 \sim _{i.i.d.} {\mathrm{Unif}}({\textsf {S}} _{{\mathbb {R}}}^{d-1})\) and define \(M = \langle {{\varvec{x}}}_1, {{\varvec{x}}}_2 \rangle \). Then, the distribution of M is given by

$$\begin{aligned} p(m) = \frac{{\varGamma }(\frac{d}{2})}{\sqrt{\pi }{\varGamma }(\frac{d-1}{2})}(1-m^2)^{\frac{d-3}{2}}, \qquad \qquad m\in [-1, 1]. \end{aligned}$$
(185)

Proof

Without loss of generality, we can pick \({{\varvec{x}}}_2\) to be the first element of the canonical basis of \({\mathbb {R}}^d\). Thus, M is equal to the first component of \({{\varvec{x}}}_1\). Note that, by taking a vector of i.i.d. standard Gaussian random variables and dividing it by its norm, we obtain a vector uniformly random on the sphere of radius 1. Hence,

$$\begin{aligned} M^2 = \frac{U_1^2}{\sum _{i=1}^{d} U_i^2}, \quad \text{ with } \{U_i\}_{1\le i \le d} \sim _{i.i.d.} {\textsf {N}}(0, 1). \end{aligned}$$

Set \(A = U_1^2\) and \(B = \sum _{i=2}^{d} U_i^2\). Then, A and B are independent, A follows a gamma distribution with shape 1/2 and scale 2, i.e., \(A\sim {\varGamma }(1/2, 2)\), and B follows a gamma distribution with shape \((d-1)/2\) and scale 2, i.e., \(B\sim {\varGamma }((d-1)/2, 2)\). Thus, we obtain that

$$\begin{aligned} M^2 = \frac{A}{A+B}\sim \mathrm{Beta}(1/2, (d-1)/2). \end{aligned}$$

A change in variables and the observation that the distribution of M is symmetric around 0 immediately let us conclude that, for \(m\in [-1, 1]\),

$$\begin{aligned} p(m) = c \cdot (1-m^2)^{\frac{d-3}{2}}, \end{aligned}$$
(186)

where the normalization constant c is given by

$$\begin{aligned} c=\left( \int _{-1}^1(1-m^2)^{\frac{d-3}{2}}\mathrm{d}m\right) ^{-1} = \frac{{\varGamma }(\frac{d}{2})}{\sqrt{\pi }{\varGamma }(\frac{d-1}{2})}. \end{aligned}$$

\(\square \)
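Both distributional claims are easy to check by simulation; the snippet below compares samples of \(|\langle {{\varvec{x}}}_1, {{\varvec{x}}}_2 \rangle |^2\) (complex case, Lemma 9) and of \(\langle {{\varvec{x}}}_1, {{\varvec{x}}}_2 \rangle ^2\) (real case, equivalent to the density (185)) with the corresponding Beta distributions via a Kolmogorov–Smirnov test.

```python
# Monte Carlo sanity check of Lemmas 9 and 10: |<x1, x2>|^2 ~ Beta(1, d-1) for uniform
# complex unit vectors, and <x1, x2>^2 ~ Beta(1/2, (d-1)/2) for uniform real unit vectors
# (the latter is equivalent to the density (185)).
import numpy as np
from scipy.stats import beta, kstest

rng = np.random.default_rng(4)
d, N = 8, 10 ** 5

def unit_rows(z):                                # normalize each row to unit Euclidean norm
    return z / np.linalg.norm(z, axis=1, keepdims=True)

# Complex case: take x2 = e_1 without loss of generality, so M = |x1[0]|^2.
xc = unit_rows(rng.standard_normal((N, d)) + 1j * rng.standard_normal((N, d)))
print(kstest(np.abs(xc[:, 0]) ** 2, beta(1, d - 1).cdf))          # p-value typically large

# Real case: M^2 = x1[0]^2.
xr = unit_rows(rng.standard_normal((N, d)))
print(kstest(xr[:, 0] ** 2, beta(0.5, (d - 1) / 2).cdf))          # p-value typically large
```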

Lemma 11

(Laplace’s method) Let \(F:[0, 1]\rightarrow {\mathbb {R}}\) be such that

  • F is continuous;

  • \(F(x)<0\) for \(x\in (0, 1]\);

  • \(F(0)=0\).

Then,

$$\begin{aligned} \lim _{n\rightarrow +\infty }\int _0^1 \exp \left( n\cdot F(x)\right) \,\mathrm{d}x=0. \end{aligned}$$
(187)

Proof

Pick \(\epsilon >0\) and separate the integral into two parts:

$$\begin{aligned} \int _0^1\exp \left( n\cdot F(x)\right) \,\mathrm{d}x= \int _0^\epsilon \exp \left( n\cdot F(x)\right) \,\mathrm{d}x+\int _\epsilon ^1\exp \left( n\cdot F(x)\right) \,\mathrm{d}x. \end{aligned}$$

Now, the first integral is at most \(\epsilon \) since \(F(x)\le 0\) for any \(x\in [0, 1]\), and the second integral tends to 0 as \(n\rightarrow +\infty \) since \(F(x)<0\) for \(x\in (0, 1]\). Thus, the claim immediately follows. \(\square \)
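A minimal numerical illustration of Lemma 11, with the arbitrary choice \(F(x)=-x^2\):

```python
# Minimal numerical illustration of Lemma 11 with F(x) = -x^2: the integral of
# exp(n F(x)) over [0, 1] is ~ sqrt(pi)/(2 sqrt(n)) and vanishes as n grows.
import numpy as np
from scipy.integrate import quad

for n in (10, 100, 1000, 10000):
    val, _ = quad(lambda x: np.exp(-n * x ** 2), 0, 1)
    print(n, val)                                # decreases to 0, roughly like 1/sqrt(n)
```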

Lemma 12

(Second moment of uniform vector on complex sphere) Let \({{\varvec{x}}}\sim {\mathrm{Unif}}(\sqrt{d}{} {\textsf {S}} _{{\mathbb {C}}}^{d-1})\). Then,

$$\begin{aligned} {\mathbb {E}} \left\{ {{\varvec{X}}}{{\varvec{X}}}^*\right\} ={{\varvec{I}}}_d. \end{aligned}$$
(188)

Proof

Let \({{\varvec{z}}}\sim {\textsf {CN}}({\varvec{0}}_d, {{\varvec{I}}}_d)\) and note that, by taking a vector of i.i.d. standard complex normal random variables and dividing it by its norm, we obtain a vector uniformly random on the complex sphere of radius 1. Then, \({{\varvec{x}}}= \sqrt{d}{{\varvec{z}}}/\left| \left| {{\varvec{z}}}\right| \right| \).

For \(i\in [d]\), denote by \(x_i\) and by \(z_i\) the ith component of \({{\varvec{x}}}\) and \({{\varvec{z}}}\), respectively. Then, for \(i\ne j\),

$$\begin{aligned} {\mathbb {E}} \left\{ X_i X_j^*\right\} =d\cdot {\mathbb {E}} \left\{ \frac{Z_i Z_j^*}{\left| \left| Z\right| \right| ^2}\right\} =0, \end{aligned}$$

where the last equality holds by symmetry. Furthermore,

$$\begin{aligned} {\mathbb {E}} \left\{ |X_i|^2\right\} =d\cdot {\mathbb {E}} \left\{ \frac{|Z_i|^2}{\left| \left| Z\right| \right| ^2}\right\} =1, \end{aligned}$$

as \(|Z_i|^2/\left| \left| Z\right| \right| ^2 \sim \mathrm{Beta}(1, d-1)\) by the argument of Lemma 9. As a result, the claim follows. \(\square \)
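A quick Monte Carlo check of Lemma 12:

```python
# Monte Carlo check of Lemma 12: for x ~ Unif(sqrt(d) S_C^{d-1}), E{X X^*} = I_d.
import numpy as np

rng = np.random.default_rng(5)
d, N = 5, 200000

z = rng.standard_normal((N, d)) + 1j * rng.standard_normal((N, d))
x = np.sqrt(d) * z / np.linalg.norm(z, axis=1, keepdims=True)    # rows uniform on sqrt(d) sphere

second_moment = x.conj().T @ x / N                               # empirical second-moment matrix
print(np.abs(second_moment - np.eye(d)).max())                   # small: Monte Carlo error only
```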

D Proof of Lemma 3

Before presenting the proof of the lemma, let us introduce some basic definitions and well-known results. Let H be a probability measure on \([0, +\infty )\). Denote by \({\varGamma }_H\) the support of H and by \(\tau \) the supremum of \({\varGamma }_H\). Let \(s_H(g)\) denote the Stieltjes transform of H, which is defined as

$$\begin{aligned} s_H(g) = \int \frac{1}{t-g}\,\mathrm{d}H(t), \end{aligned}$$
(189)

and let \(g_H(s)\) denote its inverse.

Consider a matrix

$$\begin{aligned} {{\varvec{S}}}_n = \frac{1}{d}{{\varvec{U}}}{{\varvec{M}}}_n {{\varvec{U}}}^*, \end{aligned}$$
(190)

and assume that

  (1) \({{\varvec{M}}}_n\) is PSD for all \(n\in {\mathbb {N}}\);

  (2) \({{\varvec{U}}}\in {\mathbb {C}}^{d\times n}\) is a random matrix whose entries \(\{u_{i, j}\}_{1\le i \le d, 1\le j \le n}\) are i.i.d. such that \({{\mathbb {E}}}\left\{ U_{i, j}\right\} =0\), \({{\mathbb {E}}}\left\{ |U_{i, j}|^2\right\} =1\), and \({{\mathbb {E}}}\left\{ |U_{i, j}|^4\right\} <\infty \) (this includes the cases in which the entries are \(\sim _{i.i.d.}{\textsf {CN}}(0, 1)\) or are \(\sim _{i.i.d.}{\textsf {N}}(0, 1)\));

  (3) The sequence of empirical spectral distributions of \({{\varvec{M}}}_n\in {\mathbb {C}}^{n\times n}\) converges weakly to a probability distribution H, as \(n\rightarrow +\infty \);

  (4) \(n/d\rightarrow \delta \in (0, +\infty )\), as \(n\rightarrow \infty \);

  (5) The sequence of spectral norms of \({{\varvec{M}}}_n\) is bounded.

Note that the normalization of (190) differs from the normalization of (86) by a factor of \(\delta \). However, since the form (190) is more common in the literature, we will stick to it for the rest of this section. In order to obtain the desired result for the matrix (86), it suffices to incorporate a factor \(1/\delta \) in the definition of the function \(\psi _{\delta }\).

Let \(F_{\delta , H}\) be the probability measure on \([0, +\infty )\) such that the inverse \(g_{F_{\delta , H}}\) of its Stieltjes transform \(s_{F_{\delta , H}}\) is given by

$$\begin{aligned} g_{F_{\delta , H}}(s) = -\frac{1}{s}+\delta \int \frac{t}{1+ts}\,\mathrm{d}H(t), \quad s\in \{z\in {\mathbb {C}}:\mathfrak {I}{(z)}>0\}. \end{aligned}$$
(191)

Then, the sequence of empirical spectral distributions of \({{\varvec{S}}}_n\) converges weakly to \(F_{\delta , H}\) [50], [66, Chapter 4].

For \(\alpha \not \in {\varGamma }_H\) and \(\alpha \ne 0\), let us also define

$$\begin{aligned} \psi _{F_{\delta , H}}(\alpha ) = g_{F_{\delta , H}}\left( -\frac{1}{\alpha }\right) . \end{aligned}$$
(192)

The function \(\psi _{F_{\delta , H}}\) links the support of \(F_{\delta , H}\) with the support of the generating measure H (see [67, Section 4] and [3, Lemma 3.1]). In particular, if \(\lambda \not \in {\varGamma }_{F_{\delta , H}}\), then \(s_{F_{\delta , H}}(\lambda )\ne 0\) and \(\alpha = -1/s_{F_{\delta , H}}(\lambda )\) satisfies

  (1) \(\alpha \not \in {\varGamma }_H\) and \(\alpha \ne 0\) (so that \(\psi _{F_{\delta , H}}(\alpha )\) is well defined);

  (2) \(\psi _{F_{\delta , H}}'(\alpha )>0\).

Let \(\lambda _1^{{{\varvec{M}}}_n}\) denote the largest eigenvalue of \({{\varvec{M}}}_n\) and assume that, as \(n\rightarrow \infty \),

(193)

Denote by \(\lambda _1^{{{\varvec{S}}}_n}\) the largest eigenvalue of \({{\varvec{S}}}_n\). Then, the results in [3] prove that

(194)

Informally, the eigenvalue \(\lambda _1^{{{\varvec{M}}}_n}\) is mapped into the point \(\psi _{F_{\delta , H}}(\alpha _*)\), where \(\alpha _* = -1/s_{F_{\delta , H}}(\lambda _*)\). This point emerges from the support of \(F_{\delta , H}\) if and only if \(\psi '_{F_{\delta , H}}(\alpha _*)>0\).
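The mapping just described can be illustrated in the simplest generalized spiked model, in which H is a point mass at 1. The toy simulation below places a single spiked eigenvalue \(\alpha \) in \({{\varvec{M}}}_n\) and compares the largest eigenvalue of \({{\varvec{S}}}_n\) with the prediction \(\psi _{F_{\delta , H}}(\alpha )=\alpha +\delta \alpha /(\alpha -1)\) obtained from (191)–(192) and with the bulk edge \((1+\sqrt{\delta })^2\); real Gaussian entries are used, which assumption (2) above allows.

```python
# Toy simulation of the spiked-model mapping from [3] in the simplest case H = delta_1:
# for S_n = (1/d) U M_n U^T with M_n = diag(alpha, 1, ..., 1) and n/d = delta, (191)-(192)
# give psi(alpha) = alpha + delta*alpha/(alpha - 1), and the largest eigenvalue of S_n
# should land near psi(alpha) whenever psi'(alpha) = 1 - delta/(alpha - 1)^2 > 0,
# i.e., alpha > 1 + sqrt(delta); otherwise it sticks to the bulk edge (1 + sqrt(delta))^2.
import numpy as np

rng = np.random.default_rng(6)
d, delta, alpha = 1000, 2.0, 4.0
n = int(delta * d)

m = np.ones(n)
m[0] = alpha                                     # a single spike in M_n
U = rng.standard_normal((d, n))
S = (U * m) @ U.T / d                            # S_n = (1/d) U M_n U^T

top = np.linalg.eigvalsh(S)[-1]
psi = alpha + delta * alpha / (alpha - 1)        # predicted outlier location, ~ 6.67
edge = (1 + np.sqrt(delta)) ** 2                 # bulk right edge, ~ 5.83
print(top, psi, edge)
```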

In what follows, we relax the first hypothesis, i.e., we consider the case in which the matrix \({{\varvec{M}}}_n\) is not PSD. We will show that (194) still holds, which implies the claim of Lemma 3.

Proof (Proof of Lemma 3)

As \({{\varvec{U}}}\) is drawn from a rotationally invariant distribution, we can assume without loss of generality that \({{\varvec{M}}}_n\) is diagonal. Then, we have that

$$\begin{aligned} \begin{aligned} {{\varvec{S}}}_n&= \frac{1}{d}\left( {{\varvec{U}}}_+, {{\varvec{U}}}_-\right) \left( \begin{array}{cc} {{\varvec{M}}}_n^+ &{} {{\varvec{0}}}_k\\ {{\varvec{0}}}_{n-k} &{} -{{\varvec{M}}}_n^- \\ \end{array}\right) \left( \begin{array}{l} {{\varvec{U}}}_+^*\\ {{\varvec{U}}}_-^*\\ \end{array}\right) \\&= \frac{1}{d}{{\varvec{U}}}_+ {{\varvec{M}}}_n^+ {{\varvec{U}}}_+^* - \frac{1}{d}{{\varvec{U}}}_- {{\varvec{M}}}_n^- {{\varvec{U}}}_-^*,\\ \end{aligned} \end{aligned}$$
(195)

where \({{\varvec{M}}}_n^+\in {\mathbb {R}}^{k \times k}\) is the diagonal matrix containing the positive eigenvalues of \({{\varvec{M}}}_n\), \({{\varvec{M}}}_n^-\in {\mathbb {R}}^{(n-k) \times (n-k)}\) is the diagonal matrix containing the negative eigenvalues of \({{\varvec{M}}}_n\) with the sign changed, \({{\varvec{U}}}_+\) contains the first k columns of \({{\varvec{U}}}\), and \({{\varvec{U}}}_-\) contains the remaining \(n-k\) columns of \({{\varvec{U}}}\).

Note that \({{\varvec{U}}}_+\) and \({{\varvec{U}}}_-\) are independent. Furthermore, if \({{\varvec{H}}}\) is a unitary matrix, then \({{\varvec{U}}}_-\) and \({{\varvec{H}}}{{\varvec{U}}}_-\) have the same distribution. Hence, we can rewrite the matrix \({{\varvec{S}}}_n\) as

$$\begin{aligned} {{\varvec{S}}}_n= \frac{1}{d}{{\varvec{U}}}_1 {{\varvec{M}}}_n^+ {{\varvec{U}}}_1^* - \frac{1}{d}{{\varvec{H}}}{{\varvec{U}}}_2 {{\varvec{M}}}_n^- {{\varvec{U}}}_2^*{{\varvec{H}}}^*, \end{aligned}$$
(196)

where \({{\varvec{U}}}_1\) and \({{\varvec{U}}}_2\) are independent with entries \(\sim _{i.i.d.}{\textsf {CN}}(0, 1)\), and \({{\varvec{H}}}\) is a random unitary matrix distributed according to the Haar measure.

Recall that, by hypothesis, the sequence of empirical spectral distributions of \({{\varvec{M}}}_n\) converges weakly to the probability distribution H, where H is the law of the random variable Z. Then, the sequence of empirical spectral distributions of \({{\varvec{M}}}_n^+\) converges weakly to the probability distribution \(H^+\), where \(H^+\) is the law of \(Z^+ = \max (Z, 0)\). Let \(F_{\delta , H^+}\) be the probability measure on \([0, +\infty )\) such that the inverse \(g_{F_{\delta , H^+}}\) of its Stieltjes transform \(s_{F_{\delta , H^+}}\) is given by

$$\begin{aligned} g_{F_{\delta , H^+}}(s) = -\frac{1}{s}+\delta \int \frac{t}{1+ts}\,\mathrm{d}H^+(t). \end{aligned}$$
(197)

Define \({{\varvec{S}}}_n^+=\frac{1}{d}{{\varvec{U}}}_1 {{\varvec{M}}}_n^+ {{\varvec{U}}}_1^*\). Then, as \({{\varvec{M}}}_n^+\) is PSD, the sequence of empirical spectral distributions of \({{\varvec{S}}}_n^+\) converges weakly to \(F_{\delta , H^+}\) [50], [66, Chapter 4].

Similarly, the sequence of empirical spectral distributions of \({{\varvec{M}}}_n^-\) converges weakly to the probability distribution \(H^-\), where \(H^-\) is the law of \(Z^- = -\min (Z, 0)\). Let \(F_{\delta , H^-}\) be the probability measure on \([0, +\infty )\) such that the inverse \(g_{F_{\delta , H^-}}\) of its Stieltjes transform \(s_{F_{\delta , H^-}}\) is given by

$$\begin{aligned} g_{F_{\delta , H^-}}(s) = -\frac{1}{s}+\delta \int \frac{t}{1+ts}\,\mathrm{d}H^-(t). \end{aligned}$$
(198)

Define \({{\varvec{S}}}_n^-=\frac{1}{d}{{\varvec{U}}}_2 {{\varvec{M}}}_n^- {{\varvec{U}}}_2^*\). Then, as \({{\varvec{M}}}_n^-\) is PSD, the sequence of empirical spectral distributions of \({{\varvec{S}}}_n^-\) converges weakly to \(F_{\delta , H^-}\) [50], [66, Chapter 4]. Furthermore, the sequence of empirical spectral distributions of \(-{{\varvec{S}}}_n^-\) converges weakly to the probability measure \(F_{\delta , H^-_{\mathrm{inv}}}\) such that

$$\begin{aligned} g_{F_{\delta , H^-_{\mathrm{inv}}}}(s)=-g_{F_{\delta , H^-}}(-s), \end{aligned}$$
(199)

where \(g_{F_{\delta , H^-_{\mathrm{inv}}}}\) denotes the inverse of the Stieltjes transform \(s_{F_{\delta , H^-_{\mathrm{inv}}}}\) of \(F_{\delta , H^-_{\mathrm{inv}}}\).

Define

$$\begin{aligned} F_{\delta , H} = F_{\delta , H^+} \boxplus F_{\delta , H^-_{\mathrm{inv}}}, \end{aligned}$$
(200)

where \(\boxplus \) denotes the free additive convolution. Recall the decomposition (196). Then, the sequence of empirical spectral distributions of \({{\varvec{S}}}_n\) converges weakly to \(F_{\delta , H}\) [69, 74]. Consequently, the inverse \(g_{F_{\delta , H}}\) of the Stieltjes transform \(s_{F_{\delta , H}}\) of \(F_{\delta , H}\) can be computed as

(201)

where in (a) we use (200); in (b) we use that the \({\mathcal {R}}\)-transform of the free convolution is the sum of the \({\mathcal {R}}\)-transforms of the addends; in (c) we use (197), (198), and (199); in (d) we perform the change in variable \(t\rightarrow -t\) in the second integral; and in (e) we use the fact that \(H^+(t)\) is the law of \(\max (Z, 0)\), \(H^-(-t)\) is the law of \(\min (Z, 0)\), and that \(t/(1+ts)=0\) for \(t=0\).

By hypothesis, (193) holds. First, we establish under what condition the largest eigenvalue of \({{\varvec{S}}}_n^+\), call it \(\lambda _1^{{{\varvec{S}}}_n^+}\), converges to a point outside the support of \(F_{\delta , H^+}\). To do so, define \(\psi _{F_{\delta , H^+}}(\alpha ) = g_{F_{\delta , H^+}}(-1/\alpha )\). Then, \(\lambda _1^{{{\varvec{S}}}_n^+}\) converges almost surely to \(\psi _{F_{\delta , H^+}}(\alpha _*)\), if \(\psi '_{F_{\delta , H^+}}(\alpha _*)>0\); and \(\lambda _1^{{{\varvec{S}}}_n^+}\) converges almost surely to a point inside the support of \(F_{\delta , H^+}\), otherwise [3].

For the moment, assume that \(\psi '_{F_{\delta , H^+}}(\alpha _*)>0\). We now establish under what condition the largest eigenvalue of \({{\varvec{S}}}_n\), call it \(\lambda _1^{{{\varvec{S}}}_n}\), converges to a point outside the support of \(F_{\delta , H}\). To do so, let \(\omega _1\) and \(\omega _2\) denote the subordination functions corresponding to the free convolution \(F_{\delta , H^+} \boxplus F_{\delta , H^-_{\mathrm{inv}}}\). These functions satisfy the following analytic subordination property:

$$\begin{aligned} s_{F_{\delta , H^+} \boxplus F_{\delta , H^-_{\mathrm{inv}}}}(z) = s_{F_{\delta , H^+}}(\omega _1(z)) =s_{F_{\delta , H^-_{\mathrm{inv}}}}(\omega _2(z)). \end{aligned}$$
(202)

Then, by Theorem 2.1 of [11], we have that the spike \(\psi _{F_{\delta , H^+}}(\alpha _*)\) is mapped into \(\omega _1^{-1}(\psi _{F_{\delta , H^+}}(\alpha _*))\). The Stieltjes transform at this point is given by

$$\begin{aligned} s_{F_{\delta , H}}\big (\omega _1^{-1}(\psi _{F_{\delta , H^+}}(\alpha _*))\big )\overset{\mathrm{(a)}}{=} s_{F_{\delta , H^+}}\big (\psi _{F_{\delta , H^+}}(\alpha _*)\big )\overset{\mathrm{(b)}}{=} s_{F_{\delta , H^+}}\big (g_{F_{\delta , H^+}}(-1/\alpha _*)\big )\overset{\mathrm{(c)}}{=}-\frac{1}{\alpha _*}, \end{aligned}$$

where in (a) we use (202); in (b) we use the definition of \(\psi _{F_{\delta , H^+}}\); and in (c) we use that \(g_{F_{\delta , H^+}}\) is the functional inverse of the Stieltjes transform \(s_{F_{\delta , H^+}}\). As a result, by [67, Section 4], we conclude that \(\omega _1^{-1}(\psi _{F_{\delta , H^+}}(\alpha _*))\not \in {\varGamma }_{F_{\delta , H}}\) if and only if \(\psi '_{F_{\delta , H}}(\alpha _*)>0\). Furthermore, the condition \(\psi '_{F_{\delta , H}}(\alpha _*)>0\) is more restrictive than the condition \(\psi '_{F_{\delta , H^+}}(\alpha _*)>0\) since

$$\begin{aligned} \psi '_{F_{\delta , H^+}}(\alpha _*) = 1-\delta \int \left( \frac{t}{\alpha _*-t}\right) ^2\,\mathrm{d}H^+ \ge 1-\delta \int \left( \frac{t}{\alpha _*-t}\right) ^2\,\mathrm{d}H = \psi '_{F_{\delta , H}}(\alpha _*). \end{aligned}$$

Hence, \(\lambda _1^{{{\varvec{S}}}_n}\) converges to a point outside the support of \(F_{\delta , H}\) if and only if \(\psi '_{F_{\delta , H}}(\alpha _*)>0\) and the proof is complete. \(\square \)

Remark 8

(Lemma 3 for the real case) Consider the random matrix \(\frac{1}{d}{{\varvec{U}}}{{\varvec{M}}}_n{{\varvec{U}}}^{{{\textsf {T}}}}\), where \({{\varvec{U}}}\in {\mathbb {R}}^{(d-1)\times n}\) is a random matrix whose entries are \(\sim _{i.i.d.}{\textsf {N}}(0, 1)\) and \({{\varvec{M}}}_n \in {\mathbb {R}}^{n\times n}\). Then, the claim of Lemma 3 still holds. Let us briefly explain why this is the case.

If \({{\varvec{M}}}_n\) is PSD, then the results of [3] allow us to conclude. If \({{\varvec{M}}}_n\) is not PSD, we can write an expression analogous to (196):

$$\begin{aligned} \frac{1}{d}{{\varvec{U}}}{{\varvec{M}}}_n {{\varvec{U}}}^{{{\textsf {T}}}}=\frac{1}{d}{{\varvec{U}}}_1 {{\varvec{M}}}_n^+ {{\varvec{U}}}_1^{{{\textsf {T}}}} - \frac{1}{d}{{\varvec{H}}}{{\varvec{U}}}_2 {{\varvec{M}}}_n^- {{\varvec{U}}}_2^{{{\textsf {T}}}}{{\varvec{H}}}^*, \end{aligned}$$
(203)

where \({{\varvec{M}}}_n^+\) is the diagonal matrix containing the positive eigenvalues of \({{\varvec{M}}}_n\), \({{\varvec{M}}}_n^-\) is the diagonal matrix containing the negative eigenvalues of \({{\varvec{M}}}_n\) with the sign changed, \({{\varvec{U}}}_1\) and \({{\varvec{U}}}_2\) are independent with entries \(\sim _{i.i.d.}{\textsf {N}}(0, 1)\), \({{\varvec{H}}}\) is a random unitary matrix distributed according to the Haar measure, and we have used the fact that the eigenvalues of \({{\varvec{U}}}_2 {{\varvec{M}}}_n^- {{\varvec{U}}}_2^{{{\textsf {T}}}}\) are the same as the eigenvalues of \({{\varvec{H}}}{{\varvec{U}}}_2 {{\varvec{M}}}_n^- {{\varvec{U}}}_2^{{{\textsf {T}}}}{{\varvec{H}}}^*\) since \({{\varvec{H}}}\) is unitary. Hence, the proof follows from the same argument of Lemma 3.
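As a quick illustration of the split used in (203) (a sketch with illustrative sizes, leaving out the Haar rotation \({{\varvec{H}}}\), which only matters in distribution), the following Python snippet checks that separating the positive and negative eigenvalues of a diagonal \({{\varvec{M}}}_n\) reproduces \(\frac{1}{d}{{\varvec{U}}}{{\varvec{M}}}_n{{\varvec{U}}}^{{{\textsf {T}}}}\) exactly.

import numpy as np

# Sketch of the decomposition behind (203), illustrative sizes; U_1, U_2 are
# the column blocks of U matching the positive/negative eigenvalues of M_n.
rng = np.random.default_rng(1)
d, n = 50, 120
U = rng.standard_normal((d, n))
m = rng.standard_normal(n)                            # diagonal of M_n

pos, neg = m > 0, m < 0
S_full = (U * m) @ U.T / d
S_plus = (U[:, pos] * m[pos]) @ U[:, pos].T / d       # (1/d) U_1 M_n^+ U_1^T
S_minus = (U[:, neg] * (-m[neg])) @ U[:, neg].T / d   # (1/d) U_2 M_n^- U_2^T (sign flipped)

print(np.max(np.abs(S_full - (S_plus - S_minus))))    # numerically zero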

E Proof of Lemma 4 and Theorem 5

We start by proving a result similar to Lemma 4 for a general AMP iteration, where the function \(f_t({\hat{z}}; y)\) is generic.

Lemma 13

(State evolution for general AMP iteration) Let \({{\varvec{x}}}\in {\mathbb {R}}^d\) denote the unknown signal such that \(\left| \left| {{\varvec{x}}}\right| \right| _2 = \sqrt{d}\), \({{\varvec{A}}}= ({{\varvec{a}}}_1, \ldots , {{\varvec{a}}}_n)^{{{\textsf {T}}} }\)\(\in {\mathbb {R}}^{n\times d}\) with \(\{{{\varvec{a}}}_i\}_{1\le i\le n}\sim _{i.i.d.}{} {\textsf {N}} ({{\varvec{0}}}_d,{{\varvec{I}}}_d/d)\), and \({{\varvec{y}}}= (y_1, \ldots , y_n)\) with \(y_i\sim p(\cdot \mid \langle {{\varvec{x}}}, {{\varvec{a}}}_i \rangle )\). Consider the AMP iterates \({{\varvec{z}}}^t, \hat{{\varvec{z}}}^t\) defined in (126) for some function \(f_t({\hat{z}}; y)\), with \(\textsf {b} _t\) given by

$$\begin{aligned} \textsf {b} _t = \delta \cdot {{\mathbb {E}}}\{ f'_t(\mu _t G_0+\tau _t G_1;Y)\}\, , \end{aligned}$$
(204)

where the expectation is with respect to \(G_0,G_1\sim _{i.i.d.}{} {\textsf {N}} (0,1)\) and \(Y\sim p(\,\cdot \,\mid G_0)\). Assume that the initialization \({{\varvec{z}}}^0\) is independent of \({{\varvec{A}}}\) and that, almost surely,

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{d}\langle {{\varvec{x}}},{{\varvec{z}}}^0\rangle = \mu _0 , \;\;\; \lim _{n\rightarrow \infty }\frac{1}{d}\Vert {{\varvec{z}}}^0\Vert ^2 = \mu _0^2+\tau _0^2 . \end{aligned}$$
(205)

Let the state evolution recursion \(\tau _t,\mu _t\) be defined as

$$\begin{aligned} \begin{aligned} \mu _{t+1}&=\delta \int _{{\mathbb {R}}} {{\mathbb {E}}}\big \{\partial _g p(y\mid X_0) f_t(\mu _t X_0+\tau _t G;y)\big \} \, {\mathrm{d}}y ,\\ \tau _{t+1}^2&= \delta \cdot {{\mathbb {E}}}\Big \{\big (f_t(\mu _t X_0+\tau _t G;Y)\big )^2\Big \} , \end{aligned} \end{aligned}$$
(206)

with initialization \(\mu _0\) and \(\tau _0\), where the expectation is taken with respect to \(X_0,G\sim _{i.i.d.}{} {\textsf {N}} (0,1)\). Then, for any t, and for any function \(\psi :{{\mathbb {R}}}^2\rightarrow {{\mathbb {R}}}\) such that \(|\psi ({{\varvec{u}}})-\psi ({{\varvec{v}}})|\le L(1+\Vert {{\varvec{u}}}\Vert _2+\Vert {{\varvec{v}}}\Vert _2)\Vert {{\varvec{u}}}-{{\varvec{v}}}\Vert _2\) for some \(L\in {\mathbb {R}}\), we have that, almost surely,

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1}^n\psi (x_i,z^t_i) = {{\mathbb {E}}}\left\{ \psi (X_0, \mu _t X_0+ \tau _t G)\right\} . \end{aligned}$$
(207)

Proof

For \(g\in {{\mathbb {R}}}\), let \({{\mathcal {H}}}(\,\cdot \, ;g):[0,1]\rightarrow {{\mathbb {R}}}\cup \{+\infty ,-\infty \}\) be the generalized inverse of

$$\begin{aligned} {{\mathcal {F}}}(y\mid g)\equiv \int _{-\infty }^y p(y'\mid g)\,{\mathrm{d}}y', \end{aligned}$$

namely,

$$\begin{aligned} {{\mathcal {H}}}(w;g) \equiv \inf \big \{y\in {{\mathbb {R}}}:\;\; {{\mathcal {F}}}(y\mid g)\ge w\big \} . \end{aligned}$$
(208)

With this definition, the model \(y_i\sim p(\,\cdot \,\mid \langle {{\varvec{a}}}_i,{{\varvec{x}}}\rangle )\) is equivalent to \(y_i= {{\mathcal {H}}}(w_i;\langle {{\varvec{a}}}_i,{{\varvec{x}}}\rangle )\) for \(\{w_i\}_{1\le i\le n}\sim _{i.i.d.} {\mathrm{Unif}}([0,1])\) independent of \({{\varvec{A}}}\) and \({{\varvec{x}}}\). Let \({{\varvec{w}}}= (w_1, \ldots ,w_n)\in {{\mathbb {R}}}^n\) and denote by \([{{\varvec{v}}}_1\mid \cdots \mid {{\varvec{v}}}_k]\in {{\mathbb {R}}}^{m\times k}\) the matrix obtained by stacking column vectors \({{\varvec{v}}}_1, \ldots ,{{\varvec{v}}}_k\in {{\mathbb {R}}}^m\).
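The reformulation \(y_i={{\mathcal {H}}}(w_i;\langle {{\varvec{a}}}_i,{{\varvec{x}}}\rangle )\) is plain inverse-transform sampling. A minimal Python sketch, assuming the channel \(p(y\mid g)={\textsf {N}}(y;g^2,\sigma ^2)\) purely as an illustrative example (this is not the only channel covered by the lemma), reads as follows.

import numpy as np
from scipy.stats import norm

# Inverse-CDF sampling y_i = H(w_i; <a_i, x>), assuming the illustrative
# channel p(y|g) = N(y; g^2, sigma^2); then H(w; g) = g^2 + sigma * Phi^{-1}(w).
rng = np.random.default_rng(2)
d, n, sigma = 200, 600, 0.1
x = rng.standard_normal(d)
x *= np.sqrt(d) / np.linalg.norm(x)                 # ||x||_2 = sqrt(d)
A = rng.standard_normal((n, d)) / np.sqrt(d)        # rows ~ N(0, I_d / d)
g = A @ x

def H(w, g):
    # generalized inverse of F(y|g) = Phi((y - g^2) / sigma)
    return g**2 + sigma * norm.ppf(w)

w = rng.uniform(size=n)                             # w_i ~ Unif([0,1]), independent of A, x
y = H(w, g)                                         # same law as y_i ~ p(.|<a_i, x>)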

For \(t\ge 0\), define \({{\varvec{r}}}^t = {{\varvec{0}}}_d\), \(\hat{{\varvec{r}}}^t = {{\varvec{A}}}{{\varvec{x}}}\), and introduce the extended state variables \({{\varvec{s}}}^t \in {{\mathbb {R}}}^{d\times 2}\) and \(\hat{{\varvec{s}}}^t \in {{\mathbb {R}}}^{n\times 2}\), defined as

$$\begin{aligned} \begin{aligned} {{\varvec{s}}}^t&= [{{\varvec{z}}}^t\mid {{\varvec{r}}}^t],\\ \hat{{\varvec{s}}}^t&= [\hat{{\varvec{z}}}^t\mid \hat{{\varvec{r}}}^t]. \end{aligned} \end{aligned}$$
(209)

We further define the functions \(h_t =[h_{t, 1}\mid h_{t, 2}]:{{\mathbb {R}}}^{2}\times {{\mathbb {R}}}\rightarrow {{\mathbb {R}}}^{2}\) and \({\hat{h}}_t =[{\hat{h}}_{t, 1}\mid {\hat{h}}_{t, 2}]:{{\mathbb {R}}}^{2}\times {{\mathbb {R}}}\rightarrow {{\mathbb {R}}}^{2}\) by setting

$$\begin{aligned} \begin{aligned} h_t(s_1, s_2 ;x)&\equiv [s_1 \mid x\,] ,\\ {\hat{h}}_t({\hat{s}}_1, {\hat{s}}_2;w)&\equiv [f_t({\hat{s}}_1;{{\mathcal {H}}}(w;{\hat{s}}_2)) \mid \, 0 \,] . \end{aligned} \end{aligned}$$
(210)

With these notations, the iteration (126) is equivalent to

$$\begin{aligned} \begin{aligned} {{\varvec{s}}}^{t+1}&= {{\varvec{A}}}^{{{\textsf {T}}}} {\hat{h}}_t(\hat{{\varvec{s}}}^t;{{\varvec{w}}})- h_t({{\varvec{s}}}^t;{{\varvec{x}}})\hat{\textsf {B}}_t ,\\ \hat{{\varvec{s}}}^t&={{\varvec{A}}}h_t({{\varvec{s}}}^t;{{\varvec{x}}})- {\hat{h}}_{t-1}(\hat{{\varvec{s}}}^{t-1};{{\varvec{w}}}){\textsf {B}}_{t-1} , \end{aligned} \end{aligned}$$
(211)

where the functions \(h_t({{\varvec{s}}}^t;{{\varvec{x}}})\) and \({\hat{h}}_t(\hat{{\varvec{s}}}^t;{{\varvec{w}}})\) are understood to be applied component-wise to their arguments and \({\textsf {B}}_t,\hat{\textsf {B}}_t\in {{\mathbb {R}}}^{2\times 2}\) are defined by

$$\begin{aligned} \begin{aligned} (\hat{\textsf {B}}_t)_{j,k}&= \delta \cdot {{\mathbb {E}}}\left\{ \frac{\partial {\hat{h}}_{t, k}}{\partial {\hat{s}}_j}(\mu _t X_0+\tau _t G, X_0;W)\right\} ,\\ ({\textsf {B}}_t)_{j,k}&= \delta \cdot {{\mathbb {E}}}\left\{ \frac{\partial h_{t, k}}{\partial s_j}(\mu _t X_0+\tau _t G, 0;X_0)\right\} . \end{aligned} \end{aligned}$$
(212)

The iteration (211) satisfies the assumptions of [41, Proposition 5]. By applying that result, the claim follows. \(\square \)
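To make the state evolution recursion (206) concrete, here is a short Monte Carlo sketch. The channel \(p(y\mid g)={\textsf {N}}(y;g,\sigma ^2)\) and the denoiser \(f_t({\hat{z}};y)=\tanh (y{\hat{z}})\) are placeholders chosen only for illustration; they are not the choices used in the paper. The first line of (206) is evaluated through the identity \(\int \partial _g p(y\mid X_0)f_t(\,\cdot \,;y)\,\mathrm{d}y={{\mathbb {E}}}_{Y\mid X_0}\{\partial _g \log p(Y\mid X_0)\, f_t(\,\cdot \,;Y)\}\).

import numpy as np

# Monte Carlo sketch of the state evolution recursion (206).
# Assumed placeholders: p(y|g) = N(y; g, sigma^2) and f_t(zhat; y) = tanh(y * zhat).
rng = np.random.default_rng(3)
delta, sigma, n_mc, T = 2.0, 0.5, 500_000, 8

def f(zhat, y):
    return np.tanh(y * zhat)

mu, tau2 = 0.1, 0.01                                # (mu_0, tau_0^2)
for t in range(T):
    X0 = rng.standard_normal(n_mc)
    G = rng.standard_normal(n_mc)
    Y = X0 + sigma * rng.standard_normal(n_mc)      # Y ~ p(.|X0)
    Zin = mu * X0 + np.sqrt(tau2) * G
    score = (Y - X0) / sigma**2                     # d/dg log p(y|g) at g = X0
    mu = delta * np.mean(score * f(Zin, Y))         # first line of (206)
    tau2 = delta * np.mean(f(Zin, Y) ** 2)          # second line of (206)
    print(t, mu, tau2)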

We now present the proof of Lemma 4, followed by the proof of Theorem 5.

Proof (Proof of Lemma 4)

Consider the state evolution recursion defined in (124) with initialization \(\mu _0\). Let \(f_t\) be defined as in (127) with \(\textsf {F}\) given by (123). Suppose that, for any t, (206) holds with \(\mu _t = \tau _t^2\). Then, by Lemma 13, the claim immediately follows.

The remaining part of the proof is devoted to showing that (206) holds with \(\mu _t = \tau _t^2\), for \(t\ge 0\). First, we prove by induction that \(\mu _t = \tau _t^2\), for \(t\ge 0\). The base case of the induction, i.e., \(\mu _0 =\tau _0^2\), holds by the hypothesis of the lemma. Now, we assume that \(\mu _t = \tau _t^2\) and show that \(\mu _{t+1} = \tau _{t+1}^2\). Set

$$\begin{aligned} Z = \mu _t X_0 + \tau _t G, \end{aligned}$$
(213)

and note that \(Z\sim {\textsf {N}}(0, \mu _t^2+\tau _t^2)\). Then, we can rewrite \(X_0\) as

$$\begin{aligned} X_0 = a Z + b {\widetilde{G}}, \end{aligned}$$

for some \(a, b\in {\mathbb {R}}\), where \({\widetilde{G}}\sim {\textsf {N}}(0, 1)\) is independent of Z. In order to compute the coefficients a and b, we evaluate \({\mathbb {E}}\{X_0^2\}\) and \({\mathbb {E}}\{X_0\cdot Z\}\), thus obtaining the equations

$$\begin{aligned} \begin{aligned} a^2(\mu _t^2+\tau _t^2) +b^2&= 1,\\ a(\mu _t^2+\tau _t^2)&=\mu _t, \end{aligned} \end{aligned}$$

which can be simplified as

$$\begin{aligned} \begin{aligned} a&= \frac{\mu _t}{\mu _t^2+\tau _t^2},\\ b&= \frac{\tau _t}{\sqrt{\mu _t^2+\tau _t^2}}.\\ \end{aligned} \end{aligned}$$

Furthermore, by using the inductive hypothesis \(\mu _{t}=\tau _t^2\) and that \(q_t = \mu _t/(1+\mu _t)\), we obtain that

$$\begin{aligned} X_0 = (1-q_t) \, Z + \sqrt{1-q_t}\, {\widetilde{G}}. \end{aligned}$$
(214)

Hence, the following chain of equalities holds:

(215)

where in (a) we use that \(Y\sim p(\cdot \mid X_0)\); in (b) we use (213) and (214); in (c) we condition with respect to Z; in (d) we use definition (127) of \(f_t\); in (e) we use again definition (127) of \(f_t\); and in (f) we use again (213) and (214).

Finally, we prove that \(\mu _{t+1}\) satisfies (206). Indeed, the following chain of equalities holds:

(216)

where in (a) we use (215); in (b) we set \(G_1 = {\widetilde{G}}\) and \(G_0=Z/\sqrt{\mu _t^2+\tau _t^2}\); and in (c) we use that \(\mu _{t}=\tau _t^2\) and that \(q_t = \mu _t/(1+\mu _t)\). \(\square \)
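The Gaussian decomposition (214) used in the proof above can also be checked numerically. The following sketch, with an illustrative value of \(\mu _t\) satisfying \(\mu _t=\tau _t^2\), verifies that \(X_0-aZ\) is uncorrelated with Z, has variance \(b^2\), and that \(a=1-q_t\), \(b=\sqrt{1-q_t}\).

import numpy as np

# Monte Carlo check of X0 = a Z + b Gtilde with a = mu/(mu^2+tau^2),
# b = tau/sqrt(mu^2+tau^2); illustrative mu = 0.8, tau^2 = 0.8 (so mu = tau^2).
rng = np.random.default_rng(4)
mu, tau, n_mc = 0.8, np.sqrt(0.8), 2_000_000
X0, G = rng.standard_normal((2, n_mc))
Z = mu * X0 + tau * G

a = mu / (mu**2 + tau**2)
b = tau / np.sqrt(mu**2 + tau**2)
resid = X0 - a * Z                                  # plays the role of b * Gtilde

q = mu / (1 + mu)
print(np.mean(resid * Z))                           # ~ 0: uncorrelated with Z
print(np.var(resid), b**2)                          # variance matches b^2
print(a, 1 - q, b, np.sqrt(1 - q))                  # a = 1-q_t, b = sqrt(1-q_t)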

Proof (Proof of Theorem 5)

In view of Lemma 4, it is sufficient to show that \((q, \mu ) = (0, 0)\) is an attractive fixed point of the recursion (124).

First of all, let us check that \((q, \mu ) = (0, 0)\) is a fixed point. This happens if and only if

$$\begin{aligned} h(0) = \int _{{\mathbb {R}}} \frac{\big ({\mathbb {E}}_{G_1} \{\partial _{g} p(y\mid G_1)\}\big )^2}{{\mathbb {E}}_{G_1}\{p(y\mid G_1)\}}\, {\mathrm{d}}y = 0, \end{aligned}$$
(217)

which holds because of condition (131).

Let us now prove that this fixed point is stable. We start by rewriting the function h(q) defined in (125) as

$$\begin{aligned} h(q) = \int _{{\mathbb {R}}} {\mathbb {E}}_{G_0}\left\{ \frac{\left( h_{\mathrm{num}}(\sqrt{q}, y)\right) ^2}{h_{\mathrm{den}}(\sqrt{q}, y)}\right\} \, {\mathrm{d}}y , \end{aligned}$$
(218)

where

$$\begin{aligned} \begin{aligned} h_{\mathrm{num}}(x, y)&= {\mathbb {E}}_{G_1} \left\{ \partial _{g} p(y\mid x\cdot G_0+\sqrt{1-x^2}\, G_1)\right\} , \\ h_{\mathrm{den}}(x, y)&={\mathbb {E}}_{G_1}\left\{ p(y\mid x\cdot G_0+\sqrt{1-x^2}\, G_1)\right\} . \end{aligned} \end{aligned}$$
(219)

Note that \(h_{\mathrm{num}}(0, y)=0\) by assumption (131). Then,

$$\begin{aligned} \begin{aligned} h_{\mathrm{num}}(\sqrt{q}, y)&= \sqrt{q}\frac{\partial h_{\mathrm{num}}(x, y)}{\partial x}\bigg |_{x=0} + \frac{q}{2}\frac{\partial ^2 h_{\mathrm{num}}(x, y)}{\partial x^2}\bigg |_{x=x_1}, \\ h_{\mathrm{den}}(x, y)&=h_{\mathrm{den}}(0, y) + \sqrt{q}\frac{\partial h_{\mathrm{den}}(x, y)}{\partial x}\bigg |_{x=x_2}, \end{aligned} \end{aligned}$$
(220)

for some \(x_1, x_2 \in [0, \sqrt{q}]\). Furthermore, by applying Stein’s lemma, we have that

$$\begin{aligned} h_{\mathrm{num}}(x, y) = \frac{1}{\sqrt{1-x^2}}{\mathbb {E}}_{G_1} \left\{ G_1\cdot p(y\mid x\cdot G_0+\sqrt{1-x^2}\, G_1)\right\} . \end{aligned}$$
(221)

By using (221), we can rewrite (220) as

$$\begin{aligned} \begin{aligned} h_{\mathrm{num}}(\sqrt{q}, y)&= \sqrt{q}\, G_0 \cdot {\mathbb {E}}_{G_1} \{G_1\cdot \partial _g p(y\mid G_1)\} \\&\quad + \frac{q}{2}\frac{1}{(1-x_1^2)^{5/2}} {\mathbb {E}}_{G_1}\left\{ f_{\mathrm{num}}(G_0, G_1, x_1)\right\} , \\ h_{\mathrm{den}}(x, y)&={\mathbb {E}}_{G_1}\{p(y\mid G_1)\}+\sqrt{q} \,{\mathbb {E}}_{G_1} \left\{ f_{\mathrm{den}}(G_0, G_1, x_2)\right\} , \end{aligned} \end{aligned}$$

where

$$\begin{aligned} \begin{aligned}&f_{\mathrm{num}} (G_0, G_1, x_1) = G_1 \Biggl ( (1+2x_1^2)\, p(y\mid x_1\cdot G_0+\sqrt{1-x_1^2}\, G_1)\\&\quad -\big (2G_0 \cdot x_1 (x_1^2-1)+G_1\sqrt{1-x_1^2}\,(1+2x_1^2)\big ) \partial _g p(y\mid x_1\cdot G_0+\sqrt{1-x_1^2}\, G_1) \\&\quad + (x_1^2-1)(G_0^2\,(x_1^2-1)-G_1^2 \,x_1^2+2G_0 G_1 x_1\sqrt{1-x_1^2})\partial ^2_g p(y\mid x_1\cdot G_0\\&\quad +\sqrt{1-x_1^2}\, G_1)\Biggr ),\\&f_{\mathrm{den}} (G_0, G_1, x_2) =\left( G_0-\frac{x_2}{\sqrt{1-x_2^2}}G_1\right) \cdot \partial _{g} p(y\mid x_2\cdot G_0+\sqrt{1-x_2^2}\, G_1). \end{aligned} \end{aligned}$$
(222)

By applying Stein’s lemma again and by using that the conditional density \(p(y\mid g)\) is bounded, we note that \({\mathbb {E}}_{G_1}\left\{ f_{\mathrm{num}}(G_0, G_1, x_1)\right\} \) and \({\mathbb {E}}_{G_1} \left\{ f_{\mathrm{den}}(G_0, G_1, x_2)\right\} \) are bounded. Hence, by dominated convergence, we obtain that

$$\begin{aligned} h(q) = q\cdot \int _{{\mathbb {R}}} \frac{\big ({\mathbb {E}}_{G_1} \{\partial _g^2 p(y\mid G_1)\}\big )^2}{{\mathbb {E}}_{G_1}\{p(y\mid G_1)\}}\, {\mathrm{d}}y + o(q). \end{aligned}$$

Therefore, in a neighborhood of the fixed point we have

$$\begin{aligned} \begin{aligned} q_t&= \mu _t + o(\mu _t) ,\\ \mu _{t+1}&= \delta \cdot q_t\cdot \int _{{\mathbb {R}}} \frac{\big ({\mathbb {E}}_{G_1} \{\partial _g^2 p(y\mid G_1)\}\big )^2}{{\mathbb {E}}_{G_1}\{p(y\mid G_1)\}}\, {\mathrm{d}}y + o(q_t) . \end{aligned} \end{aligned}$$
(223)

Furthermore, by applying Stein’s lemma twice, we also have that

$$\begin{aligned} {\mathbb {E}}_{G_1} \{\partial _g^2 p(y\mid G_1)\} = {\mathbb {E}}_{G_1} \{p(y\mid G_1)(G_1^2-1)\}. \end{aligned}$$
(224)

By using (223), (224) and by recalling definition (42) of \(\delta _{\mathrm{u}}\), we conclude that

$$\begin{aligned} \begin{aligned} q_t&= \mu _t + o(\mu _t) ,\\ \mu _{t+1}&= \frac{\delta }{\delta _{\mathrm{u}}}\, q_t + o(q_t). \end{aligned} \end{aligned}$$
(225)

As \(\delta < \delta _{\mathrm{u}}\), the fixed point is stable. \(\square \)
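The stability argument boils down to the linearized recursion (225). A tiny Python sketch with illustrative numbers shows the dichotomy: \(\mu _t\) contracts to 0 when \(\delta <\delta _{\mathrm{u}}\) and escapes the fixed point when \(\delta >\delta _{\mathrm{u}}\).

# Sketch of the linearization (225): near (0, 0), mu_{t+1} ~ (delta/delta_u) * mu_t.
# Illustrative delta_u = 2; any small initial mu shows the same dichotomy.
delta_u = 2.0
for delta in (1.5, 2.5):
    mu = 1e-3
    for _ in range(20):
        mu *= delta / delta_u
    print(delta, mu)   # decays to 0 for delta < delta_u, grows otherwise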

F Proof of Lemma 5 and Theorem 6

For the proofs in this section, it is convenient to introduce the function

$$\begin{aligned} \textsf {G}(x,y;{\bar{q}}) = \frac{{\mathbb {E}}_G\{\partial _g^2 p(y\mid {\bar{q}}\, x+\sqrt{{\bar{q}}}G )\}}{{\mathbb {E}}_G\{ p(y\mid {\bar{q}}\, x+\sqrt{{\bar{q}}}G )\}} -\left( \frac{{\mathbb {E}}_G\{\partial _{g}p(y\mid {\bar{q}}\, x+\sqrt{{\bar{q}}}G )\}}{{\mathbb {E}}_G\{ p(y\mid {\bar{q}}\, x+\sqrt{{\bar{q}}}G )\}}\right) ^2 . \end{aligned}$$
(226)

First, we present the proof of Lemma 5 and then of Theorem 6.

Proof (Proof of Lemma 5)

Condition (131) implies that

$$\begin{aligned} \textsf {F}(0, y;1)=0. \end{aligned}$$
(227)

Furthermore, we have that

(228)

where in (a) we use (227) and definition (226) of \(\textsf {G}(0, y;1)\) and in (b) we use the fact that y has density \({\mathbb {E}}_G\{ p(y\mid G)\}\).

Denote by \(\textsf {F}'(x,y;{\bar{q}})\) the derivative of \(\textsf {F}\) with respect to its first argument. Then, we have

$$\begin{aligned} \textsf {F}'(x,y;{\bar{q}}) = {\bar{q}}\textsf {G}(x,y;{\bar{q}}) . \end{aligned}$$
(229)

Hence,

$$\begin{aligned} \textsf {b}_t = \delta \cdot (1-q_t)\cdot {{\mathbb {E}}}\{ \textsf {G}(\mu _t G_0+\sqrt{\mu _t} G_1, Y; 1-q_t)\} = \delta \cdot {{\mathbb {E}}}\{ \textsf {G}(0, Y; 1)\} + o_{q_t}(1) = o_{q_t}(1) . \end{aligned}$$
(230)

By using (229) and (230), we linearize the recursion (126) around the fixed point \({{\varvec{z}}}^t={{\varvec{0}}}_d\) and \(\hat{{\varvec{z}}}^{t-1}={{\varvec{0}}}_n\) as

$$\begin{aligned} {{\varvec{z}}}^{t+1}&= {{\varvec{A}}}^{{{\textsf {T}}}} {{\varvec{J}}}\hat{{\varvec{z}}}^t +o_{q_t}(1) (\Vert {{\varvec{z}}}^t\Vert _2+\Vert \hat{{\varvec{z}}}^t\Vert _2) +o (\Vert \hat{{\varvec{z}}}^t\Vert _2)\, , \end{aligned}$$
(231)
$$\begin{aligned} \hat{{\varvec{z}}}^t&= {{\varvec{A}}}{{\varvec{z}}}^t- {{\varvec{J}}}\hat{{\varvec{z}}}^{t-1} +o_{q_t}(1) \, \Vert \hat{{\varvec{z}}}^{t-1}\Vert _2+o (\Vert \hat{{\varvec{z}}}^{t-1}\Vert _2) , \end{aligned}$$
(232)

where \({{\varvec{J}}}\in {\mathbb {R}}^{n\times n}\) is a diagonal matrix with entries \(j_{i} =\textsf {F}'(0,y_i;1)\) for \(i\in [n]\). By substituting expression (232) for \(\hat{{\varvec{z}}}^t\) into the RHS of (231), the result follows. \(\square \)

Proof (Proof of Theorem 6)

By definition, \(\alpha \) is an eigenvalue of \({{\varvec{L}}}_n\) if and only if

$$\begin{aligned} \mathrm{det}({{\varvec{L}}}_n -\alpha {{\varvec{I}}}_{n+d})=0. \end{aligned}$$
(233)

Recall that, when \({{\varvec{D}}}\) is invertible,

$$\begin{aligned} \mathrm{det}\left( \begin{array}{cc} {{\varvec{A}}}&{} {{\varvec{B}}}\\ {{\varvec{C}}}&{} {{\varvec{D}}}\\ \end{array}\right) = \mathrm{det}({{\varvec{D}}}) \cdot \mathrm{det}({{\varvec{A}}}- {{\varvec{B}}}{{\varvec{D}}}^{-1}{{\varvec{C}}}). \end{aligned}$$
(234)
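The block-determinant identity (234) is the Schur-complement formula; a quick numerical check on random blocks (illustrative sizes, with \({{\varvec{D}}}\) shifted to keep it invertible) is given below.

import numpy as np

# Numerical check of (234): det([[A, B], [C, D]]) = det(D) * det(A - B D^{-1} C).
rng = np.random.default_rng(5)
p, q = 4, 5
A = rng.standard_normal((p, p))
B = rng.standard_normal((p, q))
C = rng.standard_normal((q, p))
D = rng.standard_normal((q, q)) + 3 * np.eye(q)     # keep D invertible

M = np.block([[A, B], [C, D]])
lhs = np.linalg.det(M)
rhs = np.linalg.det(D) * np.linalg.det(A - B @ np.linalg.solve(D, C))
print(lhs, rhs)                                     # agree up to floating-point error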

Then, after some calculations, we obtain that (233) is equivalent to

$$\begin{aligned} \alpha ^d \cdot \mathrm{det}(-{{\varvec{J}}}-\alpha {{\varvec{I}}}_n)\cdot \mathrm{det}({{\varvec{I}}}_d-{{\varvec{A}}}^{{{\textsf {T}}}} ( {{\varvec{I}}}_n +\alpha {{\varvec{J}}}^{-1})^{-1} {{\varvec{A}}})=0. \end{aligned}$$
(235)

From (235), we immediately deduce that the eigenvalues of \({{\varvec{L}}}_n\) are real if and only if all the solutions to

$$\begin{aligned} \mathrm{det}({{\varvec{I}}}_d-{{\varvec{A}}}^{{{\textsf {T}}}} ( {{\varvec{I}}}_n +\alpha {{\varvec{J}}}^{-1})^{-1} {{\varvec{A}}})=0 \end{aligned}$$
(236)

are real. We will prove that in fact this equation does not have any solution for \(\alpha \in {{\mathbb {C}}}{\setminus } {{\mathbb {R}}}\).

Let \({{\varvec{U}}}{\varvec{\varSigma }}{{\varvec{V}}}^{{{\textsf {T}}}}\) be the SVD of \({{\varvec{A}}}\). Then, (236) is equivalent to

$$\begin{aligned} \mathrm{det}({{\varvec{U}}}{\varvec{\varSigma }}^{-2}-( {{\varvec{I}}}_n +\alpha {{\varvec{J}}}^{-1})^{-1} {{\varvec{U}}})=0. \end{aligned}$$

Using the fact that \(\mathrm{det}({\varvec{\varSigma }})\ne 0\), and \(\mathrm{det}( {{\varvec{I}}}_n +\alpha {{\varvec{J}}}^{-1})\ne 0\) for \(\alpha \in {{\mathbb {C}}}{\setminus } {{\mathbb {R}}}\), Eq. (236) is equivalent to

$$\begin{aligned} \mathrm{det}( ({{\varvec{I}}}_n +\alpha {{\varvec{J}}}^{-1}){{\varvec{U}}}- {{\varvec{U}}}{\varvec{\varSigma }}^2)=0 , \end{aligned}$$

or equivalently

$$\begin{aligned} \mathrm{det}( {{\varvec{I}}}_n +\alpha {{\varvec{J}}}^{-1}- {{\varvec{A}}}{{\varvec{A}}}^{{{\textsf {T}}}})=\mathrm{det}( {{\varvec{I}}}_n +\alpha {{\varvec{J}}}^{-1}- {{\varvec{U}}}{\varvec{\varSigma }}^2{{\varvec{U}}}^{{{\textsf {T}}}})=0 . \end{aligned}$$

Given that the solutions of this equation are generalized eigenvalues for the pair of symmetric matrices \({{\varvec{A}}}{{\varvec{A}}}^{{{\textsf {T}}}}-{{\varvec{I}}}_n\) and \({{\varvec{J}}}^{-1}\), they must be real. We conclude that the eigenvalues of \({{\varvec{L}}}_n\) are real.

Note that

$$\begin{aligned} \textsf {F}'(0,y;1) \overset{\mathrm{(a)}}{=} \frac{{\mathbb {E}}_G\{\partial _g^2 p(y\mid G)\}}{{\mathbb {E}}_G\{ p(y\mid G)\}} \overset{\mathrm{(b)}}{=} \frac{{\mathbb {E}}_G\{p(y\mid G)(G^2-1)\}}{{\mathbb {E}}_G\{ p(y\mid G)\}} \overset{\mathrm{(c)}}{=} \frac{{\mathcal {T}}^*(y)}{1-{\mathcal {T}}^*(y)}, \end{aligned}$$
(237)

where in (a) we use that \(\textsf {F}(0, y; 1)=0\) as (131) holds; in (b) we apply Stein’s lemma twice; and in (c) we use definition (45) of \({\mathcal {T}}^*\). Then, (236) can be rewritten as

$$\begin{aligned} \mathrm{det}\left( {{\varvec{I}}}_d - \sum _{i=1}^n \frac{{\mathcal {T}}^*(y_i)}{{\mathcal {T}}^*(y_i) + \alpha (1-{\mathcal {T}}^*(y_i))} {{\varvec{a}}}_i {{\varvec{a}}}_i^{{{\textsf {T}}}}\right) =0. \end{aligned}$$
(238)

Let \(\lambda _1^{{{\varvec{D}}}^*_n}(\alpha )\) be the largest eigenvalue of the matrix \({{\varvec{D}}}^*_n(\alpha )\) defined as

$$\begin{aligned} {{\varvec{D}}}^*_n(\alpha ) = \sum _{i=1}^n \frac{{\mathcal {T}}^*(y_i)}{{\mathcal {T}}^*(y_i) + \alpha (1-{\mathcal {T}}^*(y_i))} {{\varvec{a}}}_i {{\varvec{a}}}_i^{{{\textsf {T}}}}. \end{aligned}$$
(239)

Note that, as \(\alpha \rightarrow +\infty \), the entries of \({{\varvec{D}}}^*_n(\alpha )\) tend to 0 with high probability. Since the eigenvalues of a matrix are continuous functions of the elements of the matrix, we also obtain that

$$\begin{aligned} \lim _{\alpha \rightarrow +\infty }\lambda _1^{{{\varvec{D}}}^*_n}(\alpha ) = 0. \end{aligned}$$

Hence, if there exists \(\bar{\alpha }>1\) such that \(\lambda _1^{{{\varvec{D}}}^*_n}(\bar{\alpha })>1\), then there also exists \(\bar{\alpha }_0> \bar{\alpha } > 1\) such that \(\lambda _1^{{{\varvec{D}}}^*_n}(\bar{\alpha }_0)=1\). Consequently, there exists \(\alpha > 1\) that satisfies (238), which implies the result of the theorem.
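The intermediate-value argument above can be visualized numerically. The sketch below uses illustrative dimensions and a placeholder preprocessing function with values in \((0,1)\) in place of \({\mathcal {T}}^*\) (the paper's \({\mathcal {T}}^*\) depends on the channel and is not reproduced here): for these placeholder weights \(\lambda _1^{{{\varvec{D}}}^*_n}(\alpha )\) is decreasing in \(\alpha \) and tends to 0, so whenever it exceeds 1 at some \(\bar{\alpha }>1\), a root \(\bar{\alpha }_0\) with \(\lambda _1^{{{\varvec{D}}}^*_n}(\bar{\alpha }_0)=1\) can be located by bisection.

import numpy as np
from scipy.optimize import brentq

# Sketch of the intermediate-value argument around (238)-(239).
# T below is a placeholder weight in (0,1), NOT the paper's T^*(y).
rng = np.random.default_rng(6)
d, n = 100, 300
x = rng.standard_normal(d)
x *= np.sqrt(d) / np.linalg.norm(x)
A = rng.standard_normal((n, d)) / np.sqrt(d)        # rows ~ N(0, I_d / d)
y = (A @ x) ** 2                                    # illustrative quadratic measurements
T = y / (1.0 + y)                                   # placeholder weights in (0, 1)

def lam1(alpha):
    w = T / (T + alpha * (1.0 - T))                 # weights of D_n^*(alpha), cf. (239)
    return np.linalg.eigvalsh((A.T * w) @ A)[-1]

alpha_bar = 1.2
print(lam1(alpha_bar))                              # if > 1, a root exists beyond alpha_bar
if lam1(alpha_bar) > 1.0:
    alpha_0 = brentq(lambda a: lam1(a) - 1.0, alpha_bar, 1e4)
    print(alpha_0, lam1(alpha_0))                   # lam1(alpha_0) = 1 up to tolerance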

The rest of the proof consists in showing that \(\bar{\alpha }= \sqrt{\delta /\delta _{\mathrm{u}}}\) satisfies the desired requirements. First of all, note that \(\sqrt{\delta /\delta _{\mathrm{u}}}>1\), as \(\delta >\delta _{\mathrm{u}}\). Furthermore, we have that

$$\begin{aligned} {{\varvec{D}}}^*_n(\bar{\alpha }) = \sum _{i=1}^n {\mathcal {T}}_{\delta }^*(y_i) {{\varvec{a}}}_i {{\varvec{a}}}_i^{{{\textsf {T}}}}, \end{aligned}$$
(240)

where \({\mathcal {T}}_{\delta }^*\) is defined in (44). Recall that, by hypothesis, \({{\varvec{x}}}\) is such that \(\left| \left| {{\varvec{x}}}\right| \right| _2 = \sqrt{d}\) and \(\{{{\varvec{a}}}_i\}_{1\le i\le n}\sim _{i.i.d.}{\textsf {N}}({\varvec{0}}_d,{{\varvec{I}}}_d/d)\). Let \({\tilde{{{\varvec{x}}}}} = {{\varvec{x}}}/\sqrt{d}\) and \(\tilde{{{\varvec{a}}}_i} =\sqrt{d}\cdot {{\varvec{a}}}_i\). Then, \(\langle {{\varvec{x}}}, {{\varvec{a}}}_i \rangle = \langle {\tilde{{{\varvec{x}}}}}, {\tilde{{{\varvec{a}}}}}_i \rangle \). Let \(\lambda _1^{{\widetilde{{{\varvec{D}}}}}_n}\) be the largest eigenvalue of the matrix \({\widetilde{{{\varvec{D}}}}}_n\) defined as

$$\begin{aligned} {\widetilde{{{\varvec{D}}}}}_n = \frac{1}{n}\sum _{i=1}^n {\mathcal {T}}_\delta ^*(y_i) {\tilde{{{\varvec{a}}}}}_i {\tilde{{{\varvec{a}}}}}_i^{{{\textsf {T}}}}. \end{aligned}$$
(241)

Since \({\widetilde{{{\varvec{D}}}}}_n = {{\varvec{D}}}^*_n(\bar{\alpha })/\delta \), it remains to prove that

$$\begin{aligned} \lim _{n\rightarrow \infty } \lambda _1^{{\widetilde{{{\varvec{D}}}}}_n} > \frac{1}{\delta }\quad \text{ almost surely}. \end{aligned}$$
(242)

To do so, we apply a result analogous to that of Lemma 2 for the real case with \({\mathcal {T}}={\mathcal {T}}_\delta ^*\). For the moment, assume that \({\mathcal {T}}_\delta ^*\) fulfills the hypotheses of Lemma 2 (we will prove later that this is the case). Then, \(\lambda _1^{{\widetilde{{{\varvec{D}}}}}_n}\) converges almost surely to \(\zeta _{\delta }(\lambda _{\delta }^*)\).

Recall that

$$\begin{aligned} \zeta _{\delta }(\lambda ) = \psi _{\delta }(\max (\lambda , \bar{\lambda }_\delta )), \end{aligned}$$

where \(\bar{\lambda }_\delta \) is the point of minimum of the convex function \(\psi _{\delta }(\lambda )\) defined as

$$\begin{aligned} \psi _{\delta }(\lambda ) = \lambda \left( \frac{1}{\delta }+{{\mathbb {E}}}\left\{ \frac{{\mathcal {T}}_\delta ^*(Y)}{\lambda -{\mathcal {T}}_\delta ^*(Y)}\right\} \right) . \end{aligned}$$

Notice also that \(\bar{\lambda }_\delta \) is the unique local minimizer, since \(\psi _{\delta }\) is convex and analytic.

Furthermore, \(\lambda _{\delta }^*\) is the unique solution to the equation \(\zeta _{\delta }(\lambda _{\delta }^*) = \phi (\lambda _{\delta }^*)\), where \(\phi (\lambda )\) is defined as

$$\begin{aligned} \phi (\lambda ) = \lambda \cdot {{\mathbb {E}}}\left\{ \frac{{\mathcal {T}}_\delta ^*(Y)\cdot G^2}{\lambda -{\mathcal {T}}_\delta ^*(Y)}\right\} . \end{aligned}$$

By setting the derivative of \(\psi _{\delta }(\lambda )\) to 0, we have that

$$\begin{aligned} {{\mathbb {E}}}\left\{ \frac{({\mathcal {T}}_\delta ^*(Y))^2}{(\bar{\lambda }_\delta -{\mathcal {T}}_\delta ^*(Y))^2}\right\} =\frac{1}{\delta }. \end{aligned}$$

By using definition (44) of \({\mathcal {T}}^*_\delta \) and definition (45) of \({\mathcal {T}}^*\), we verify that

$$\begin{aligned} \frac{{\mathcal {T}}_\delta ^*(Y)}{1-{\mathcal {T}}_\delta ^*(Y)} =\sqrt{ \frac{\delta _{\mathrm{u}}}{\delta }}\, \frac{{\mathbb {E}}_G\{p(y\mid G)(G^2-1)\}}{{\mathbb {E}}_G\{ p(y\mid G)\}}. \end{aligned}$$
(243)

Hence, by using definition (42) of \(\delta _{\mathrm{u}}\), we obtain that

$$\begin{aligned} {{\mathbb {E}}}\left\{ \frac{({\mathcal {T}}_\delta ^*(Y))^2}{(1-{\mathcal {T}}_\delta ^*(Y))^2}\right\} = \frac{\delta _{\mathrm{u}}}{\delta }\int _{{\mathbb {R}}} \frac{\left( {\mathbb {E}}_G\{p(y\mid G)(G^2-1)\}\right) ^2}{{\mathbb {E}}_G\{ p(y\mid G)\}}\,\mathrm{d}y = \frac{1}{\delta }, \end{aligned}$$

which immediately implies that

$$\begin{aligned} \bar{\lambda }_\delta = 1. \end{aligned}$$
(244)

By using (243), one also obtains that

$$\begin{aligned} {{\mathbb {E}}}\left\{ \frac{{\mathcal {T}}_\delta ^*(Y)}{1-{\mathcal {T}}_\delta ^*(Y)}\right\} =\sqrt{\frac{\delta _{\mathrm{u}}}{\delta }}\int _{{\mathbb {R}}}{\mathbb {E}}_G\{p(y\mid G)(G^2-1)\}\,\mathrm{d}y=\sqrt{\frac{\delta _{\mathrm{u}}}{\delta }}\,\, {\mathbb {E}}_G\{G^2-1\}=0, \end{aligned}$$

which implies that

$$\begin{aligned} \psi _{\delta }(1) = \frac{1}{\delta }. \end{aligned}$$
(245)

Furthermore, we have that

$$\begin{aligned} {{\mathbb {E}}}\left\{ \frac{{\mathcal {T}}_\delta ^*(Y)(G^2-1)}{1-{\mathcal {T}}_\delta ^*(Y)}\right\} = \sqrt{\frac{\delta _{\mathrm{u}}}{\delta }}\int _{{\mathbb {R}}} \frac{\left( {\mathbb {E}}_G\{p(y\mid G)(G^2-1)\}\right) ^2}{{\mathbb {E}}_G\{ p(y\mid G)\}}\,\mathrm{d}y = \frac{1}{\sqrt{\delta \cdot \delta _{\mathrm{u}}}} >\frac{1}{\delta }, \end{aligned}$$

which implies that

$$\begin{aligned} \phi (1) = \frac{1}{\sqrt{\delta \cdot \delta _{\mathrm{u}}}} > \frac{1}{\delta }, \end{aligned}$$
(246)

as \(\delta > \delta _{\mathrm{u}}\). By putting (244), (245), and (246) together, we obtain that

$$\begin{aligned} \phi (\bar{\lambda }_\delta )>\zeta _\delta (\bar{\lambda }_\delta ). \end{aligned}$$
(247)

Recall that \(\zeta _\delta (\lambda )\) is monotone non-decreasing and \(\phi (\lambda )\) is monotone non-increasing. Consequently, (247) implies that \(\lambda _{\delta }^* > \bar{\lambda }_\delta \). Thus, we conclude that

$$\begin{aligned} \begin{aligned} \lim _{n\rightarrow \infty } \lambda _1^{{\widetilde{{{\varvec{D}}}}}_n}&= \zeta _{\delta }(\lambda ^*_{\delta }) =\psi _{\delta }(\lambda _{\delta }^*) \\&>\psi _{\delta }(\bar{\lambda }_{\delta }) =\psi _{\delta }(1)=\frac{1}{\delta } . \end{aligned} \end{aligned}$$
(248)

Now, we show that \({\mathcal {T}}_\delta ^*\) fulfills the hypotheses of Lemma 2 by using arguments similar to those at the end of the proof of Theorem 2. First of all, since \({\mathcal {T}}^*(y)\le 1\), we have that \({\mathcal {T}}_\delta ^*(y)\) is bounded. Furthermore, if \({\mathcal {T}}_\delta ^*(y)\) is equal to the constant value 0, then \(\delta _{\mathrm{u}}=\infty \) and the claim of Theorem 6 trivially holds. Hence, we can assume that \({\mathbb {P}}({\mathcal {T}}_\delta ^*(Y)=0)<1\). Let \(\tau \) be the supremum of the support of \({\mathcal {T}}_\delta ^*(Y)\). If \({\mathbb {P}}({\mathcal {T}}_\delta ^*(Y)=\tau )>0\), then the condition (82) is satisfied and the proof is complete. Otherwise, for any \(\epsilon _1 >0\), there exists \({\varDelta }_1(\epsilon _1)\) such that Eq. (115) holds. Define \({\mathcal {T}}_{\delta }^*(y, \epsilon _1)\) as in (116). Clearly, the random variable \({\mathcal {T}}_{\delta }^*(Y, \epsilon _1)\) has a point mass at \(\delta \); hence, condition (82) is satisfied. As a final step, we show that we can take \(\epsilon _1 \downarrow 0\). Define

$$\begin{aligned} {\widetilde{{{\varvec{D}}}}}_n(\epsilon _1) = \frac{1}{n}\sum _{i=1}^n {\mathcal {T}}_{\delta }^*(y_i, \epsilon _1) {\tilde{{{\varvec{a}}}}}_i {\tilde{{{\varvec{a}}}}}_i^*. \end{aligned}$$

Then,

$$\begin{aligned} \left| \left| {\widetilde{{{\varvec{D}}}}}_n(\epsilon _1)-{\widetilde{{{\varvec{D}}}}}_n\right| \right| _{\mathrm{op}} \le C_1 \cdot {\varDelta }_1(\epsilon _1), \end{aligned}$$
(249)

where the constant \(C_1\) depends only on \(n/d\). Consequently, by using (249) and Weyl’s inequality, we conclude that

$$\begin{aligned} |\lambda _1^{{\widetilde{{{\varvec{D}}}}}_n(\epsilon _1)}-\lambda _1^{{\widetilde{{{\varvec{D}}}}}_n}|\le C_1 \cdot {\varDelta }_1(\epsilon _1). \end{aligned}$$
(250)

Hence, for any n, as \(\epsilon _1\) tends to 0, the largest eigenvalue of \({\widetilde{{{\varvec{D}}}}}_n(\epsilon _1)\) tends to the largest eigenvalue of \({\widetilde{{{\varvec{D}}}}}_n\), which concludes the proof. \(\square \)
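The perturbation step (249)-(250) rests on Weyl's inequality; a short numerical check (illustrative sizes) is shown below.

import numpy as np

# Check of the Weyl-type bound behind (250): the top eigenvalue moves by at
# most the operator norm of a symmetric perturbation.
rng = np.random.default_rng(7)
n = 200
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
E = rng.standard_normal((n, n)); E = (E + E.T) / 2 * 1e-2

gap = abs(np.linalg.eigvalsh(A + E)[-1] - np.linalg.eigvalsh(A)[-1])
print(gap, np.linalg.norm(E, 2))                    # gap <= ||E||_op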
