×

Optimal estimation of Gaussian mixtures via denoised method of moments. (English) Zbl 1455.62075

Summary: The method of moments [K. Pearson, Philos. Trans. R. Soc. Lond., A 185, 71–110 (1894; JFM 25.0347.02)] is one of the most widely used methods in statistics for parameter estimation, by means of solving the system of equations that match the population and estimated moments. However, in practice and especially for the important case of mixture models, one frequently needs to contend with the difficulties of non-existence or nonuniqueness of statistically meaningful solutions, as well as the high computational cost of solving large polynomial systems. Moreover, theoretical analyses of the method of moments are mainly confined to asymptotic normality style of results established under strong assumptions.
This paper considers estimating a \(k\)-component Gaussian location mixture with a common (possibly unknown) variance parameter. To overcome the aforementioned theoretic and algorithmic hurdles, a crucial step is to denoise the moment estimates by projecting to the truncated moment space (via semidefinite programming) before solving the method of moments equations. Not only does this regularization ensure existence and uniqueness of solutions, it also yields fast solvers by means of Gauss quadrature. Furthermore, by proving new moment comparison theorems in the Wasserstein distance via polynomial interpolation and majorization techniques, we establish the statistical guarantees and adaptive optimality of the proposed procedure, as well as oracle inequality in misspecified models. These results can also be viewed as provable algorithms for generalized method of moments [L. P. Hansen, Econometrica 50, 1029–1054 (1982; Zbl 0502.62098)] which involves nonconvex optimization and lacks theoretical guarantees.

MSC:

62G05 Nonparametric estimation
62C20 Minimax procedures in statistical decision theory
62H30 Classification and discrimination; cluster analysis (statistical aspects)

Software:

OPQ; CVXPY

References:

[1] Akhiezer, N. I. (1965). The Classical Moment Problem and Some Related Questions in Analysis. Hafner Publishing Co., New York. Zentralblatt MATH: 0135.33803
· Zbl 0135.33803
[2] Anandkumar, A., Ge, R., Hsu, D., Kakade, S. M. and Telgarsky, M. (2014). Tensor decompositions for learning latent variable models. J. Mach. Learn. Res. 15 2773-2832. Zentralblatt MATH: 1319.62109
· Zbl 1319.62109
[3] Atkinson, K. E. (2008). An Introduction to Numerical Analysis. Wiley, New York.
[4] Balakrishnan, S., Wainwright, M. J. and Yu, B. (2017). Statistical guarantees for the EM algorithm: From population to sample-based analysis. Ann. Statist. 45 77-120. Zentralblatt MATH: 1367.62052
Digital Object Identifier: doi:10.1214/16-AOS1435
Project Euclid: euclid.aos/1487667618
· Zbl 1367.62052 · doi:10.1214/16-AOS1435
[5] Belkin, M. and Sinha, K. (2010). Polynomial learning of distribution families. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science—FOCS 2010 103-112. IEEE Computer Soc., Los Alamitos, CA. Zentralblatt MATH: 1335.68100
Digital Object Identifier: doi:10.1137/13090818X
· Zbl 1335.68100 · doi:10.1137/13090818X
[6] Chaussé, P. (2010). Computing generalized method of moments and generalized empirical likelihood with R. J. Stat. Softw. 34 1-35.
[7] Chen, J. H. (1995). Optimal rate of convergence for finite mixture models. Ann. Statist. 23 221-233. Zentralblatt MATH: 0821.62023
Digital Object Identifier: doi:10.1214/aos/1176324464
Project Euclid: euclid.aos/1176324464
· Zbl 0821.62023 · doi:10.1214/aos/1176324464
[8] Dasgupta, S. (1999). Learning mixtures of Gaussians. In 40th Annual Symposium on Foundations of Computer Science (New York, 1999) 634-644. IEEE Computer Soc., Los Alamitos, CA.
[9] Daskalakis, C., Tzamos, C. and Zampetakis, M. (2017). Ten steps of EM suffice for mixtures of two gaussians. In Conference on Learning Theory 704-710.
[10] Deely, J. J. and Kruse, R. L. (1968). Construction of sequences estimating the mixing distribution. Ann. Math. Stat. 39 286-288. Zentralblatt MATH: 0174.22302
Digital Object Identifier: doi:10.1214/aoms/1177698536
Project Euclid: euclid.aoms/1177698536
· Zbl 0174.22302 · doi:10.1214/aoms/1177698536
[11] Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39 1-38. Zentralblatt MATH: 0364.62022
Digital Object Identifier: doi:10.1111/j.2517-6161.1977.tb01600.x
· Zbl 0364.62022 · doi:10.1111/j.2517-6161.1977.tb01600.x
[12] Diaconis, P. (1987). Application of the method of moments in probability and statistics. In Moments in Mathematics (San Antonio, Tex., 1987). Proc. Sympos. Appl. Math. 37 125-142. Amer. Math. Soc., Providence, RI. Zentralblatt MATH: 0631.60018
· Zbl 0631.60018
[13] Diamond, S. and Boyd, S. (2016). CVXPY: A Python-embedded modeling language for convex optimization. J. Mach. Learn. Res. 17 Paper No. 83, 5. Zentralblatt MATH: 1360.90008
· Zbl 1360.90008
[14] Edelman, D. (1988). Estimation of the mixing distribution for a normal mean with applications to the compound decision problem. Ann. Statist. 16 1609-1622. Zentralblatt MATH: 0653.62008
Digital Object Identifier: doi:10.1214/aos/1176351056
Project Euclid: euclid.aos/1176351056
· Zbl 0653.62008 · doi:10.1214/aos/1176351056
[15] Frühwirth-Schnatter, S. (2006). Finite Mixture and Markov Switching Models. Springer Series in Statistics. Springer, New York. Mathematical Reviews (MathSciNet): MR2265601
· Zbl 1108.62002
[16] Gautschi, W. (2004). Orthogonal Polynomials: Computation and Approximation. Numerical Mathematics and Scientific Computation. Oxford Univ. Press, New York. Zentralblatt MATH: 1130.42300
· Zbl 1130.42300
[17] Genovese, C. R. and Wasserman, L. (2000). Rates of convergence for the Gaussian mixture sieve. Ann. Statist. 28 1105-1127. Zentralblatt MATH: 1105.62333
Digital Object Identifier: doi:10.1214/aos/1015956709
Project Euclid: euclid.aos/1015956709
· Zbl 1105.62333 · doi:10.1214/aos/1015956709
[18] Ghosal, S. and van der Vaart, A. W. (2001). Entropies and rates of convergence for maximum likelihood and Bayes estimation for mixtures of normal densities. Ann. Statist. 29 1233-1263. Zentralblatt MATH: 1043.62025
Digital Object Identifier: doi:10.1214/aos/1013203453
Project Euclid: euclid.aos/1013203452
· Zbl 1043.62025 · doi:10.1214/aos/1013203453
[19] Golub, G. H. and Welsch, J. H. (1969). Calculation of Gauss quadrature rules. Math. Comp. 23 221-230. Zentralblatt MATH: 0179.21901
Digital Object Identifier: doi:10.1090/S0025-5718-69-99647-1
· Zbl 0179.21901 · doi:10.1090/S0025-5718-69-99647-1
[20] Hall, A. R. (2005). Generalized Method of Moments. Advanced Texts in Econometrics. Oxford Univ. Press, Oxford. · Zbl 1076.62118
[21] Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica 50 1029-1054. Zentralblatt MATH: 0502.62098
Digital Object Identifier: doi:10.2307/1912775
· Zbl 0502.62098 · doi:10.2307/1912775
[22] Hardt, M. and Price, E. (2015). Tight bounds for learning a mixture of two Gaussians [extended abstract]. In STOC’15—Proceedings of the 2015 ACM Symposium on Theory of Computing 753-760. ACM, New York. Zentralblatt MATH: 1321.68405
· Zbl 1321.68405
[23] Heinrich, P. and Kahn, J. (2018). Strong identifiability and optimal minimax rates for finite mixture estimation. Ann. Statist. 46 2844-2870. Zentralblatt MATH: 1420.62215
Digital Object Identifier: doi:10.1214/17-AOS1641
Project Euclid: euclid.aos/1536307235
· Zbl 1420.62215 · doi:10.1214/17-AOS1641
[24] Hopkins, S. B. and Li, J. (2018). Mixture models, robustness, and sum of squares proofs. In STOC’18—Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing 1021-1034. ACM, New York. Zentralblatt MATH: 1428.68240
· Zbl 1428.68240
[25] Horn, R. A. and Johnson, C. R. (2013). Matrix Analysis, 2nd ed. Cambridge Univ. Press, Cambridge. Zentralblatt MATH: 1267.15001
· Zbl 1267.15001
[26] Ibragimov, I. (2001). Estimation of analytic functions. In State of the Art in Probability and Statistics (Leiden, 1999). Institute of Mathematical Statistics Lecture Notes—Monograph Series 36 359-383. IMS, Beachwood, OH. Zentralblatt MATH: 1373.62130
· Zbl 1373.62130
[27] Kalai, A. T., Moitra, A. and Valiant, G. (2010). Efficiently learning mixtures of two Gaussians. In STOC’10—Proceedings of the 2010 ACM International Symposium on Theory of Computing 553-562. ACM, New York. Zentralblatt MATH: 1293.68229
· Zbl 1293.68229
[28] Karlin, S. and Shapley, L. S. (1953). Geometry of moment spaces. Mem. Amer. Math. Soc. 12 93. Zentralblatt MATH: 0052.18501
· Zbl 0052.18501
[29] Karlis, D. and Xekalaki, E. (2003). Choosing initial values for the EM algorithm for finite mixtures Comput. Statist. Data Anal. 41 577-590. Zentralblatt MATH: 1429.62082
Digital Object Identifier: doi:10.1016/S0167-9473(02)00177-9
· Zbl 1429.62082 · doi:10.1016/S0167-9473(02)00177-9
[30] Kiefer, J. and Wolfowitz, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann. Math. Stat. 27 887-906. Zentralblatt MATH: 0073.14701
Digital Object Identifier: doi:10.1214/aoms/1177728066
Project Euclid: euclid.aoms/1177728066
· Zbl 0073.14701 · doi:10.1214/aoms/1177728066
[31] Kim, A. K. H. (2014). Minimax bounds for estimation of normal mixtures. Bernoulli 20 1802-1818. Zentralblatt MATH: 1320.62082
Digital Object Identifier: doi:10.3150/13-BEJ542
Project Euclid: euclid.bj/1411134445
· Zbl 1320.62082 · doi:10.3150/13-BEJ542
[32] Koenker, R. and Mizera, I. (2014). Convex optimization, shape constraints, compound decisions, and empirical Bayes rules. J. Amer. Statist. Assoc. 109 674-685. Zentralblatt MATH: 1367.62020
Digital Object Identifier: doi:10.1080/01621459.2013.869224
· Zbl 1367.62020 · doi:10.1080/01621459.2013.869224
[33] Kong, W. and Valiant, G. (2017). Spectrum estimation from samples. Ann. Statist. 45 2218-2247. Zentralblatt MATH: 06821124
Digital Object Identifier: doi:10.1214/16-AOS1525
Project Euclid: euclid.aos/1509436833
· Zbl 1443.62156 · doi:10.1214/16-AOS1525
[34] Kosorok, M. R. (2008). Introduction to Empirical Processes and Semiparametric Inference. Springer Series in Statistics. Springer, New York. Zentralblatt MATH: 1180.62137
· Zbl 1180.62137
[35] Krawtchouk, M. (1932). Sur le problème de moments. In ICM Proceedings 127-128. Available at https://www.mathunion.org/fileadmin/ICM/Proceedings/ICM1932.2/ICM1932.2.ocr.pdf. · JFM 58.0439.04
[36] Laird, N. (1978). Nonparametric maximum likelihood estimation of a mixed distribution. J. Amer. Statist. Assoc. 73 805-811. Zentralblatt MATH: 0391.62029
Digital Object Identifier: doi:10.1080/01621459.1978.10480103
· Zbl 0391.62029 · doi:10.1080/01621459.1978.10480103
[37] Lasserre, J. B. (2010). Moments, Positive Polynomials and Their Applications. Imperial College Press Optimization Series 1. Imperial College Press, London. Zentralblatt MATH: 1211.90007
· Zbl 1211.90007
[38] Li, J. and Schmidt, L. (2017). Robust and proper learning for mixtures of Gaussians via systems of polynomial inequalities. In Conference on Learning Theory 1302-1382.
[39] Lindsay, B. G. (1981). Properties of the maximum likelihood estimator of a mixing distribution. In Statistical Distributions in Scientific Work, Vol. 5 (Trieste, 1980). NATO Adv. Study Inst. Ser. C: Math. Phys. Sci. 79 95-109. Reidel, Dordrecht. · Zbl 0474.62027
[40] Lindsay, B. G. (1989). Moment matrices: Applications in mixtures. Ann. Statist. 17 722-740. Zentralblatt MATH: 0672.62063
Digital Object Identifier: doi:10.1214/aos/1176347138
Project Euclid: euclid.aos/1176347138
· Zbl 0672.62063 · doi:10.1214/aos/1176347138
[41] Lindsay, B. G. (1995). Mixture models: Theory, geometry and applications. In NSF-CBMS Regional Conference Series in Probability and Statistics i-163. JSTOR. Zentralblatt MATH: 1163.62326
· Zbl 1163.62326
[42] Lu, Y. and Zhou, H. H. (2016). Statistical and computational guarantees of Lloyd’s algorithm and its variants. Preprint. Available at arXiv:1612.02099. arXiv: 1612.02099
[43] Meng, X.-L. and Van Dyk, D. (1997). The EM algorithm—An old folk-song sung to a fast new tune. J. R. Stat. Soc. Ser. B. Stat. Methodol. 59 511-567. Zentralblatt MATH: 1090.62518
Digital Object Identifier: doi:10.1111/1467-9868.00082
· Zbl 1090.62518 · doi:10.1111/1467-9868.00082
[44] Moitra, A. and Valiant, G. (2010). Settling the polynomial learnability of mixtures of Gaussians. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science—FOCS 2010 93-102. IEEE Computer Soc., Los Alamitos, CA. Zentralblatt MATH: 1293.68229
· Zbl 1293.68229
[45] Nguyen, X. (2013). Convergence of latent mixing measures in finite and infinite mixture models. Ann. Statist. 41 370-400. Zentralblatt MATH: 1347.62117
Digital Object Identifier: doi:10.1214/12-AOS1065
Project Euclid: euclid.aos/1364302747
· Zbl 1347.62117 · doi:10.1214/12-AOS1065
[46] Pearson, K. (1894). Contributions to the mathematical theory of evolution. Philos. Trans. R. Soc. Lond. Ser. A 185 71-110. Zentralblatt MATH: 25.0347.02
Digital Object Identifier: doi:10.1098/rsta.1894.0003
· JFM 25.0347.02 · doi:10.1098/rsta.1894.0003
[47] Pilla, R. S. and Lindsay, B. G. (2001). Alternative EM methods for nonparametric finite mixture models. Biometrika 88 535-550. Zentralblatt MATH: 0984.62024
Digital Object Identifier: doi:10.1093/biomet/88.2.535
· Zbl 0984.62024 · doi:10.1093/biomet/88.2.535
[48] Redner, R. A. and Walker, H. F. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev. 26 195-239. Zentralblatt MATH: 0536.62021
Digital Object Identifier: doi:10.1137/1026034
· Zbl 0536.62021 · doi:10.1137/1026034
[49] Schmüdgen, K. (2017). The Moment Problem. Graduate Texts in Mathematics 277. Springer, Cham.
[50] Seidel, W., Mosler, K. and Alker, M. (2000). A cautionary note on likelihood ratio tests in mixture models. Ann. Inst. Statist. Math. 52 481-487. Zentralblatt MATH: 0960.62025
Digital Object Identifier: doi:10.1023/A:1004117419204
· Zbl 0960.62025 · doi:10.1023/A:1004117419204
[51] Shohat, J. A. and Tamarkin, J. D. (1943). The Problem of Moments. American Mathematical Society Mathematical Surveys, Vol. I. Amer. Math. Soc., New York. Zentralblatt MATH: 0063.06973
· Zbl 0063.06973
[52] Stoer, J. and Bulirsch, R. (2002). Introduction to Numerical Analysis, 3rd ed. Texts in Applied Mathematics 12. Springer, New York. Zentralblatt MATH: 1004.65001
· Zbl 1004.65001
[53] Uspensky, J. V. (1937). Introduction to Mathematical Probability. McGraw-Hill, New York. Zentralblatt MATH: 63.1069.01
· JFM 63.1069.01
[54] van der Vaart, A. W. (2000). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics 3. Cambridge Univ. Press, Cambridge. · Zbl 0943.62002
[55] Villani, C. (2003). Topics in Optimal Transportation. Graduate Studies in Mathematics 58. Amer. Math. Soc., Providence, RI. Zentralblatt MATH: 1106.90001
· Zbl 1106.90001
[56] Wolkowicz, H., Saigal, R. and Vandenberghe, L., eds. (2012). Handbook of Semidefinite Programming: Theory, Algorithms, and Applications. International Series in Operations Research & Management Science 27. Kluwer Academic, Boston, MA.
[57] Wu, Y. and Yang, P. (2020). Supplement to “Optimal estimation of Gaussian mixtures via denoised method of moments.” https://doi.org/10.1214/19-AOS1873SUPP.
[58] Xu, J., Hsu, D. J. and Maleki, A. (2016). Global analysis of expectation maximization for mixtures of two Gaussians. In Advances in Neural Information Processing Systems 2676-2684.
[59] Xu, L.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.