
Optimal estimation of high-dimensional Gaussian location mixtures. (English) Zbl 1539.62090

Summary: This paper studies the optimal rate of estimation in a finite Gaussian location mixture model in high dimensions without separation conditions. We assume that the number of components \(k\) is bounded and that the centers lie in a ball of bounded radius, while allowing the dimension \(d\) to be as large as the sample size \(n\). Extending the one-dimensional result of P. Heinrich and J. Kahn [Ann. Stat. 46, No. 6A, 2844–2870 (2018; Zbl 1420.62215)], we show that the minimax rate of estimating the mixing distribution in Wasserstein distance is \(\Theta((d/n)^{1/4} + n^{-1/(4k-2)})\), achieved by an estimator computable in time \(O(nd^2 + n^{5/4})\). Furthermore, we show that the mixture density can be estimated at the optimal parametric rate \(\Theta(\sqrt{d/n})\) in Hellinger distance and provide a computationally efficient algorithm to achieve this rate in the special case of \(k=2\).
Both the theoretical and methodological development rely on a careful application of the method of moments. Central to our results is the observation that the information geometry of finite Gaussian mixtures is characterized by the moment tensors of the mixing distribution, whose low-rank structure can be exploited to obtain a sharp local entropy bound.
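To make the method-of-moments idea concrete, here is a minimal toy sketch (not the paper's estimator): for the symmetric one-dimensional mixture \(\tfrac12 N(\theta,1)+\tfrac12 N(-\theta,1)\) with unit variance, the second moment satisfies \(E[X^2]=\theta^2+1\), so matching the empirical second moment recovers \(\theta\). The function name `estimate_center` and the simulation parameters are illustrative choices, not from the paper.

```python
import numpy as np

def estimate_center(x):
    """Moment estimator for the symmetric two-component mixture
    0.5*N(theta, 1) + 0.5*N(-theta, 1): since E[X^2] = theta^2 + 1,
    subtract the known unit variance from the empirical second moment
    and take a square root (clipped at zero for stability)."""
    return np.sqrt(max(np.mean(x**2) - 1.0, 0.0))

# Simulate n draws from the mixture and recover theta.
rng = np.random.default_rng(0)
theta, n = 1.5, 200_000
signs = rng.choice([-1.0, 1.0], size=n)   # latent component labels
x = theta * signs + rng.standard_normal(n)

theta_hat = estimate_center(x)
print(theta_hat)  # close to 1.5
```

In this symmetric special case the estimator converges at rate \(n^{-1/4}\) when \(\theta\) is near zero, illustrating the slow nonparametric rates that arise when components are allowed to merge; the paper's general estimator matches higher-order moment tensors to handle \(k\) components in dimension \(d\).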

MSC:

62G07 Density estimation
62C20 Minimax procedures in statistical decision theory
62H12 Estimation in multivariate analysis

Citations:

Zbl 1420.62215

Software:

CVXPY; POT; CVXOPT

References:

[1] ABRAMOWITZ, M. and STEGUN, I. A. (1964). Handbook of Mathematical Functions: With Formulas, Graphs, and Mathematical Tables. Dover Publications, Inc., New York. · Zbl 0171.38503
[2] ACHARYA, J., JAFARPOUR, A., ORLITSKY, A. and SURESH, A. T. (2014). Near-optimal sample estimators for spherical Gaussian mixtures. In Neural Information Processing Systems 1395-1403. Curran Associates, Red Hook.
[3] ACHLIOPTAS, D. and MCSHERRY, F. (2005). On spectral learning of mixtures of distributions. In International Conference on Computational Learning Theory 458-469. Springer, Berlin. · Zbl 1137.68512
[4] ANANDKUMAR, A., HSU, D. J. and KAKADE, S. M. (2012). A method of moments for mixture models and hidden Markov models. In Conference on Learning Theory.
[5] ANDERSEN, M. S., DAHL, J. and VANDENBERGHE, L. (2013). CVXOPT: A Python package for convex optimization.
[6] ARORA, S. and KANNAN, R. (2005). Learning mixtures of separated nonspherical Gaussians. Ann. Appl. Probab. 15 69-92. · Zbl 1059.62062
[7] ASHTIANI, H., BEN-DAVID, S., HARVEY, N. J. A., LIAW, C., MEHRABIAN, A. and PLAN, Y. (2018). Near-optimal sample complexity bounds for robust learning of Gaussian mixtures via compression schemes. Adv. Neural Inf. Process. Syst.
[8] BANACH, S. (1938). Über homogene Polynome in \((L^2)\). Studia Math. 7 36-44. · Zbl 0018.21902
[9] BANDEIRA, A. S., RIGOLLET, P. and WEED, J. (2017). Optimal rates of estimation for multi-reference alignment. Preprint. Available at arXiv:1702.08546.
[10] BAYRAKTAR, E. and GUO, G. (2021). Strong equivalence between metrics of Wasserstein type. Electron. Commun. Probab. 26 1-13. · Zbl 1483.90108 · doi:10.1214/21-ECP383
[11] BELKIN, M. and SINHA, K. (2010). Toward learning Gaussian mixtures with arbitrary separation. In 23rd Annual Conference on Learning Theory (COLT) 407-419.
[12] BIRGÉ, L. (1983). Approximation dans les espaces métriques et théorie de l’estimation. Z. Wahrscheinlichkeitstheor. Verw. Geb. 65 181-237. · Zbl 0506.62026
[13] BIRGÉ, L. (1986). On estimating a density using Hellinger distance and some other strange facts. Probab. Theory Related Fields 71 271-291. · Zbl 0561.62029
[14] BONTEMPS, D. and GADAT, S. (2014). Bayesian methods for the shape invariant model. Electron. J. Stat. 8 1522-1568. · Zbl 1297.62069 · doi:10.1214/14-EJS933
[15] CHEN, J. (1995). Optimal rate of convergence for finite mixture models. Ann. Statist. 23 221-233. · Zbl 0821.62023 · doi:10.1214/aos/1176324464
[16] COMON, P., GOLUB, G., LIM, L.-H. and MOURRAIN, B. (2008). Symmetric tensors and symmetric tensor rank. SIAM J. Matrix Anal. Appl. 30 1254-1279. · Zbl 1181.15014
[17] DACUNHA-CASTELLE, D. and GASSIAT, E. (1997). The estimation of the order of a mixture model. Bernoulli 3 279-299. · Zbl 0889.62012
[18] DASGUPTA, S. (1999). Learning mixtures of Gaussians. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science 634. IEEE Comput. Soc., USA.
[19] DAVIS, C. and KAHAN, W. M. (1970). The rotation of eigenvectors by a perturbation. SIAM J. Numer. Anal. 7 1-46. · Zbl 0198.47201
[20] DESHPANDE, I., HU, Y.-T., SUN, R., PYRROS, A., SIDDIQUI, N., KOYEJO, S., ZHAO, Z., FORSYTH, D. and SCHWING, A. (2019). Max-sliced Wasserstein distance and its use for GANs. Preprint. Available at arXiv:1904.05877.
[21] DIAMOND, S. and BOYD, S. (2016). CVXPY: A Python-embedded modeling language for convex optimization. J. Mach. Learn. Res. 17 1-5. · Zbl 1360.90008
[22] DOSS, N., WU, Y., YANG, P. and ZHOU, H. H. (2023). Supplement to “Optimal estimation of high-dimensional Gaussian location mixtures.” https://doi.org/10.1214/22-AOS2207SUPP
[23] FELDMAN, J., SERVEDIO, R. A. and O’DONNELL, R. (2006). PAC learning axis-aligned mixtures of Gaussians with no separation assumption. In Learning Theory (G. Lugosi and H. U. Simon, eds.) 20-34. Springer, Berlin, Heidelberg. · Zbl 1143.68533
[24] FLAMARY, R. and COURTY, N. (2017). POT: Python optimal transport library.
[25] FRIEDLAND, S. and LIM, L.-H. (2018). Nuclear norm of higher-order tensors. Math. Comp. 87 1255-1281. · Zbl 1383.15018 · doi:10.1090/mcom/3239
[26] GASSIAT, E. and VAN HANDEL, R. (2012). Consistent order estimation and minimal penalties. IEEE Trans. Inf. Theory 59 1115-1128. · Zbl 1364.62078
[27] GASSIAT, E. and VAN HANDEL, R. (2014). The local geometry of finite mixtures. Trans. Amer. Math. Soc. 366 1047-1072. · Zbl 1291.52004
[28] Genovese, C. R. and Wasserman, L. (2000). Rates of convergence for the Gaussian mixture sieve. Ann. Statist. 28 1105-1127. · Zbl 1105.62333 · doi:10.1214/aos/1015956709
[29] GHOSAL, S., GHOSH, J. K. and RAMAMOORTHI, R. V. (1999). Posterior consistency of Dirichlet mixtures in density estimation. Ann. Statist. 27 143-158. · Zbl 0932.62043 · doi:10.1214/aos/1018031105
[30] Ghosal, S., Ghosh, J. K. and van der Vaart, A. W. (2000). Convergence rates of posterior distributions. Ann. Statist. 28 500-531. · Zbl 1105.62315 · doi:10.1214/aos/1016218228
[31] GHOSAL, S. and VAN DER VAART, A. (2007). Posterior convergence rates of Dirichlet mixtures at smooth densities. Ann. Statist. 35 697-723. · Zbl 1117.62046 · doi:10.1214/009053606000001271
[32] GHOSAL, S. and VAN DER VAART, A. W. (2001). Entropies and rates of convergence for maximum likelihood and Bayes estimation for mixtures of normal densities. Ann. Statist. 29 1233-1263. · Zbl 1043.62025
[33] GUHA, A., HO, N. and NGUYEN, X. (2021). On posterior contraction of parameters and interpretability in Bayesian mixture modeling. Bernoulli 27 2159-2188. · Zbl 1473.62218 · doi:10.3150/20-BEJ1275
[34] HANSEN, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica 50 1029-1054. · Zbl 0502.62098
[35] HARDT, M. and PRICE, E. (2015). Tight bounds for learning a mixture of two Gaussians. In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing 753-760. ACM, New York. · Zbl 1321.68405
[37] HARTIGAN, J. A. (1985). A failure of likelihood asymptotics for normal mixtures. In Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer 2 807-810. · Zbl 1373.62070
[38] Heinrich, P. and Kahn, J. (2018). Strong identifiability and optimal minimax rates for finite mixture estimation. Ann. Statist. 46 2844-2870. · Zbl 1420.62215 · doi:10.1214/17-AOS1641
[39] HO, N. and NGUYEN, X. (2016). Convergence rates of parameter estimation for some weakly identifiable finite mixtures. Ann. Statist. 44 2726-2755. · Zbl 1359.62076 · doi:10.1214/16-AOS1444
[40] HO, N. and NGUYEN, X. (2016). On strong identifiability and convergence rates of parameter estimation in finite mixtures. Electron. J. Stat. 10 271-307. · Zbl 1332.62095 · doi:10.1214/16-EJS1105
[41] HOPKINS, S. B. and LI, J. (2018). Mixture models, robustness, and sum of squares proofs. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing. STOC 2018 1021-1034. Assoc. Comput. Mach., New York, NY, USA. · Zbl 1428.68240 · doi:10.1145/3188745.3188748
[42] HSU, D. and KAKADE, S. M. (2013). Learning mixtures of spherical Gaussians: Moment methods and spectral decompositions. In Fourth Innovations in Theoretical Computer Science.
[43] IBRAGIMOV, I. (2001). Estimation of analytic functions. In State of the Art in Probability and Statistics. Lecture Notes-Monograph Series 36 359-383. IMS, Beachwood, OH. · Zbl 1373.62130 · doi:10.1214/lnms/1215090078
[44] JIN, C., ZHANG, Y., BALAKRISHNAN, S., WAINWRIGHT, M. J. and JORDAN, M. I. (2016). Local maxima in the likelihood of Gaussian mixture models: Structural results and algorithmic consequences. Adv. Neural Inf. Process. Syst. 29.
[45] Kalai, A. T., Moitra, A. and Valiant, G. (2010). Efficiently learning mixtures of two Gaussians. In Proceedings of the forty-second ACM symposium on Theory of computing 553-562. · Zbl 1293.68229
[46] KANNAN, R., SALMASIAN, H. and VEMPALA, S. (2005). The spectral method for general mixture models. In 18th Annual Conference on Learning Theory (COLT) 444-457. · Zbl 1137.68543
[47] KIM, A. K. H. (2014). Minimax bounds for estimation of normal mixtures. Bernoulli 20 1802-1818. · Zbl 1320.62082
[48] KOENKER, R. and MIZERA, I. (2014). Convex optimization, shape constraints, compound decisions, and empirical Bayes rules. J. Amer. Statist. Assoc. 109 674-685. · Zbl 1367.62020
[49] Kolda, T. G. and Bader, B. W. (2009). Tensor decompositions and applications. SIAM Rev. 51 455-500. · Zbl 1173.65029 · doi:10.1137/07070111X
[50] KRUSKAL, J. B. (1977). Three-way arrays: Rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra Appl. 18 95-138. · Zbl 0364.15021
[51] LE CAM, L. (1973). Convergence of estimates under dimensionality restrictions. Ann. Statist. 1 38-53. · Zbl 0255.62006
[52] LI, J. and SCHMIDT, L. (2017). Robust and proper learning for mixtures of Gaussians via systems of polynomial inequalities. In Proceedings of the 2017 Conference on Learning Theory (S. Kale and O. Shamir, eds.). Proceedings of Machine Learning Research 65 1302-1382. PMLR, Amsterdam, Netherlands.
[53] LI, P. and CHEN, J. (2010). Testing the order of a finite mixture. J. Amer. Statist. Assoc. 105. · Zbl 1390.62024
[54] LINDSAY, B. G. (1989). Moment matrices: Applications in mixtures. Ann. Statist. 17 722-740. · Zbl 0672.62063
[55] Liu, X. and Shao, Y. (2003). Asymptotics for likelihood ratio tests under loss of identifiability. Ann. Statist. 31 807-832. · Zbl 1032.62014 · doi:10.1214/aos/1056562463
[56] LÖFFLER, M., ZHANG, A. Y. and ZHOU, H. H. (2019). Optimality of spectral clustering for Gaussian mixture model. Ann. Statist. To appear. Available at arXiv:1911.00538.
[57] LUGOSI, G. and MENDELSON, S. (2019). Sub-Gaussian estimators of the mean of a random vector. Ann. Statist. 47 783-794. · Zbl 1417.62192 · doi:10.1214/17-AOS1639
[58] MAUGIS, C. and MICHEL, B. (2011). A non asymptotic penalized criterion for Gaussian mixture model selection. ESAIM Probab. Stat. 15 41-68. · Zbl 1395.62162
[59] MILLER, J. W. and HARRISON, M. T. (2014). Inconsistency of Pitman-Yor process mixtures for the number of components. J. Mach. Learn. Res. 15 3333-3370. · Zbl 1319.62100
[60] MOITRA, A. and VALIANT, G. (2010). Settling the polynomial learnability of mixtures of Gaussians. In Proceedings of the 2010 IEEE 51st Annual Symposium on Foundations of Computer Science 93-102.
[61] NGUYEN, X. (2013). Convergence of latent mixing measures in finite and infinite mixture models. Ann. Statist. 41 370-400. · Zbl 1347.62117 · doi:10.1214/12-AOS1065
[63] Niles-Weed, J. and Rigollet, P. (2019). Estimation of Wasserstein distances in the spiked transport model. Preprint. Available at arXiv:1909.07513.
[64] PATY, F.-P. and CUTURI, M. (2019). Subspace robust Wasserstein distances. In Proceedings of the 36th International Conference on Machine Learning 97 5072-5081. PMLR, Long Beach, CA.
[65] POLYANSKIY, Y. and WU, Y. (2015). Dissipation of information in channels with input constraints. IEEE Trans. Inf. Theory 62 35-55. · Zbl 1359.94277
[66] QI, L. (2011). The best rank-one approximation ratio of a tensor space. SIAM J. Matrix Anal. Appl. 32 430-442. · Zbl 1228.15010
[67] RABIN, J., PEYRÉ, G., DELON, J. and BERNOT, M. (2011). Wasserstein barycenter and its application to texture mixing. In International Conference on Scale Space and Variational Methods in Computer Vision 435-446. Springer, Berlin.
[68] Rousseau, J. and Mengersen, K. (2011). Asymptotic behaviour of the posterior distribution in overfitted mixture models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 73 689-710. · Zbl 1228.62034 · doi:10.1111/j.1467-9868.2011.00781.x
[69] Rudelson, M. and Vershynin, R. (2009). Smallest singular value of a random rectangular matrix. Comm. Pure Appl. Math. 62 1707-1739. · Zbl 1183.15031 · doi:10.1002/cpa.20294
[70] SAHA, S. and GUNTUBOYINA, A. (2020). On the nonparametric maximum likelihood estimator for Gaussian location mixture densities with application to Gaussian denoising. Ann. Statist. 48 738-762. · Zbl 1454.62120
[71] Shen, W., Tokdar, S. T. and Ghosal, S. (2013). Adaptive Bayesian multivariate density estimation with Dirichlet mixtures. Biometrika 100 623-640. · Zbl 1284.62183
[72] SHEN, X. and WASSERMAN, L. (2001). Rates of convergence of posterior distributions. Ann. Statist. 29 687-714. · Zbl 1041.62022 · doi:10.1214/aos/1009210686
[73] SHOHAT, J. A. and TAMARKIN, J. D. (1943). The Problem of Moments 1. Am. Math. Soc., Providence, RI. · Zbl 0063.06973
[74] TSYBAKOV, A. B. (2009). Introduction to Nonparametric Estimation. Springer, New York, NY. · Zbl 1176.62032
[75] VAN DE GEER, S. (2000). Empirical Processes in M-Estimation. Cambridge Univ. Press, Cambridge. · Zbl 1179.62073
[76] VAN DER VAART, A. (2002). The statistical work of Lucien Le Cam. Ann. Statist. 30 631-682. · Zbl 1103.62301
[77] VAN DER VAART, A. W. and WELLNER, J. A. (1996). Weak Convergence and Empirical Processes. Springer, New York. · Zbl 0862.60002 · doi:10.1007/978-1-4757-2545-2
[78] VEMPALA, S. and WANG, G. (2004). A spectral algorithm for learning mixture models. J. Comput. System Sci. 68 841-860. · Zbl 1074.68028
[79] VILLANI, C. (2003). Topics in Optimal Transportation. Amer. Math. Soc., Providence, RI. · Zbl 1106.90001
[80] WU, Y. (2017). Lecture notes on information-theoretic methods for high-dimensional statistics.
[81] Wu, Y. and Yang, P. (2020). Optimal estimation of Gaussian mixtures via denoised method of moments. Ann. Statist. 48 1981-2007. · Zbl 1455.62075 · doi:10.1214/19-AOS1873
[82] WU, Y. and ZHOU, H. H. (2021). Randomly initialized EM algorithm for two-component Gaussian mixture achieves near optimality in \(O(\sqrt{n})\) iterations. Math. Stat. Learn. 4 143-220. · Zbl 1493.62350 · doi:10.4171/msl/29
[83] ZHANG, C.-H. (2009). Generalized maximum likelihood estimation of normal mixture densities. Statist. Sinica 19 1297-1318. · Zbl 1166.62013