
Distributionally robust inverse covariance estimation: the Wasserstein shrinkage estimator. (English) Zbl 1485.90084

Summary: We introduce a distributionally robust maximum likelihood estimation model with a Wasserstein ambiguity set to infer the inverse covariance matrix of a \(p\)-dimensional Gaussian random vector from \(n\) independent samples. The proposed model minimizes the worst case (maximum) of Stein’s loss across all normal reference distributions within a prescribed Wasserstein distance from the normal distribution characterized by the sample mean and the sample covariance matrix. We prove that this estimation problem is equivalent to a semidefinite program that is tractable in theory but beyond the reach of general-purpose solvers for practically relevant problem dimensions \(p\). In the absence of any prior structural information, the estimation problem has an analytical solution that is naturally interpreted as a nonlinear shrinkage estimator. Besides being invertible and well conditioned even for \(p>n\), the new shrinkage estimator is rotation equivariant and preserves the order of the eigenvalues of the sample covariance matrix. These desirable properties are not imposed ad hoc but emerge naturally from the underlying distributionally robust optimization model. Finally, we develop a sequential quadratic approximation algorithm for efficiently solving the general estimation problem subject to conditional independence constraints typically encountered in Gaussian graphical models.
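The summary notes that the shrinkage estimator is invertible and well conditioned even for \(p>n\), rotation equivariant, and order preserving on the eigenvalues of the sample covariance matrix. A minimal sketch of why shrinkage delivers these properties, using a generic linear shrinkage toward a scaled identity in the spirit of Ledoit and Wolf (not the paper's Wasserstein formula, whose nonlinear shrinkage is derived from the robust model; the intensity `rho` below is an arbitrary illustrative choice):

```python
import numpy as np

# Illustrative sketch only: generic linear shrinkage toward a scaled
# identity. The paper's Wasserstein shrinkage estimator applies a
# different, data-driven nonlinear transformation to the eigenvalues;
# `rho` here is a hypothetical intensity chosen for illustration.

rng = np.random.default_rng(0)
p, n = 50, 20                             # p > n: sample covariance is singular
X = rng.standard_normal((n, p))           # n i.i.d. samples of a p-dim Gaussian

S = np.cov(X, rowvar=False, bias=True)    # sample covariance, rank < n < p
rho = 0.3                                 # hypothetical shrinkage intensity
target = (np.trace(S) / p) * np.eye(p)    # scaled-identity shrinkage target
S_shrunk = (1 - rho) * S + rho * target

# Shrinking toward a multiple of the identity keeps the eigenvectors of S
# and maps each eigenvalue l to (1 - rho) * l + rho * tr(S) / p, so this
# estimator is rotation equivariant and preserves the eigenvalue order.
# Every shrunk eigenvalue is strictly positive, so the precision matrix
# exists even though S itself is rank deficient.
precision = np.linalg.inv(S_shrunk)
```

In this simple linear scheme the desirable properties hold by construction; the point of the paper is that analogous properties emerge, without being imposed, from the distributionally robust optimization model itself.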

MSC:

90C17 Robustness in mathematical programming
90C22 Semidefinite programming
90C90 Applications of mathematical programming

References:

[1] Banerjee O, El Ghaoui L, d’Aspremont A (2008) Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J. Machine Learn. Res. 9(June):485-516. · Zbl 1225.68149
[2] Berge C (1963) Topological Spaces: Including a Treatment of Multi-Valued Functions, Vector Spaces, and Convexity (Dover, Mineola, NY). · Zbl 0114.38602
[3] Bernstein DS (2009) Matrix Mathematics: Theory, Facts, and Formulas (Princeton University Press, Princeton, NJ). · Zbl 1183.15001 · doi:10.1515/9781400833344
[4] Bertsekas DP (2009) Convex Optimization Theory (Athena Scientific, Belmont, MA). · Zbl 1242.90001
[5] Bien J, Tibshirani RJ (2011) Sparse estimation of a covariance matrix. Biometrika 98(4):807-820. · Zbl 1228.62063 · doi:10.1093/biomet/asr054
[6] Blanchet J, Murthy K (2019) Quantifying distributional model risk via optimal transport. Math. Oper. Res. 44(2):565-600. · Zbl 1434.60113
[7] Blanchet J, Si N (2019) Optimal uncertainty size in distributionally robust inverse covariance estimation. Oper. Res. Lett. 47(6):618-621. · Zbl 1476.62104
[8] Boyd S, Vandenberghe L (2004) Convex Optimization (Cambridge University Press, Cambridge, UK). · Zbl 1058.90049 · doi:10.1017/CBO9780511804441
[9] Chun SY, Browne MW, Shapiro A (2018) Modified distribution-free goodness-of-fit test statistic. Psychometrika 83(1):48-66. · Zbl 1402.62305 · doi:10.1007/s11336-017-9574-9
[10] Dahl J, Roychowdhury V, Vandenberghe L (2005) Maximum likelihood estimation of Gaussian graphical models: Numerical implementation and topology selection. Working paper, University of California, Los Angeles.
[11] Das A, Sampson AL, Lainscsek C, Muller L, Lin W, Doyle JC, Cash SS, Halgren E, Sejnowski TJ (2017) Interpretation of the precision matrix and its application in estimating sparse brain connectivity during sleep spindles from human electrocorticography recordings. Neural Comput. 29(3):603-642. · Zbl 1414.92163 · doi:10.1162/NECO_a_00936
[12] Delage E, Ye Y (2010) Distributionally robust optimization under moment uncertainty with application to data-driven problems. Oper. Res. 58(3):595-612. · Zbl 1228.90064
[13] DeMiguel V, Nogales FJ (2009) Portfolio selection with robust estimation. Oper. Res. 57(3):560-577. · Zbl 1233.91240
[14] Dettling M (2004) BagBoosting for tumor classification with gene expression data. Bioinformatics 20(18):3583-3593. · doi:10.1093/bioinformatics/bth447
[15] Dey DK, Srinivasan C (1985) Estimation of a covariance matrix under Stein’s loss. Ann. Statist. 13(4):1581-1591. · Zbl 0582.62042 · doi:10.1214/aos/1176349756
[16] Du L, Li J, Stoica P (2010) Fully automatic computation of diagonal loading levels for robust adaptive beamforming. IEEE Trans. Aerospace Electronic Systems 46(1):449-458. · doi:10.1109/TAES.2010.5417174
[17] Fan J, Fan Y, Lv J (2008) High dimensional covariance matrix estimation using a factor model. J. Econometrics 147(1):186-197. · Zbl 1429.62185 · doi:10.1016/j.jeconom.2008.09.017
[18] Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann. Eugenics 7(2):179-188. · doi:10.1111/j.1469-1809.1936.tb02137.x
[19] Friedman J, Hastie T, Tibshirani R (2008) Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3):432-441. · Zbl 1143.62076 · doi:10.1093/biostatistics/kxm045
[20] Gao R, Kleywegt A (2016) Distributionally robust stochastic optimization with Wasserstein distance. Preprint, submitted April 8, https://arxiv.org/abs/1604.02199.
[21] Gao R, Chen X, Kleywegt A (2017) Wasserstein distributional robustness and regularization in statistical learning. Preprint, submitted December 17, https://arxiv.org/abs/1712.06050.
[22] Givens CR, Shortt RM (1984) A class of Wasserstein metrics for probability distributions. Michigan Math. J. 31(2):231-240. · Zbl 0582.60002 · doi:10.1307/mmj/1029003026
[23] Goh J, Sim M (2010) Distributionally robust optimization and its tractable approximations. Oper. Res. 58(4, Part 1):902-917. · Zbl 1228.90067
[24] Goto S, Xu Y (2015) Improving mean variance optimization through sparse hedging restrictions. J. Financial Quant. Anal. 50(6):1415-1441. · doi:10.1017/S0022109015000526
[25] Haff LR (1991) The variational form of certain Bayes estimators. Ann. Statist. 19(3):1163-1190. · Zbl 0739.62046 · doi:10.1214/aos/1176348244
[26] Hastie T, Tibshirani R, Friedman J (2001) The Elements of Statistical Learning (Springer, New York). · Zbl 0973.62007 · doi:10.1007/978-0-387-21606-5
[27] Hespanha JP (2009) Linear Systems Theory (Princeton University Press, Princeton, NJ). · Zbl 1185.93001
[28] Hsieh C-J, Sustik MA, Dhillon IS, Ravikumar P (2014) QUIC: Quadratic approximation for sparse inverse covariance estimation. J. Machine Learn. Res. 15(83):2911-2947. · Zbl 1319.65048
[29] Jagannathan R, Ma T (2003) Risk reduction in large portfolios: Why imposing the wrong constraints helps. J. Finance 58(4):1651-1683. · doi:10.1111/1540-6261.00580
[30] James W, Stein C (1961) Estimation with quadratic loss. Proc. 4th Berkeley Sympos. Math. Statist. Probab., Vol. 1: Contributions to the Theory of Statistics (University of California Press, Berkeley), 361-379. · Zbl 1281.62026
[31] Lauritzen SL (1996) Graphical Models (Oxford University Press, Oxford, UK). · Zbl 0907.62001
[32] Ledoit O, Wolf M (2004a) A well-conditioned estimator for large-dimensional covariance matrices. J. Multivariate Anal. 88(2):365-411. · Zbl 1032.62050 · doi:10.1016/S0047-259X(03)00096-4
[33] Ledoit O, Wolf M (2004b) Honey, I shrunk the sample covariance matrix. J. Portfolio Management 30(4):110-119. · doi:10.3905/jpm.2004.110
[34] Ledoit O, Wolf M (2012) Nonlinear shrinkage estimation of large-dimensional covariance matrices. Ann. Statist. 40(2):1024-1060. · Zbl 1274.62371 · doi:10.1214/12-AOS989
[35] Ledoit O, Wolf M (2003) Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. J. Empirical Finance 10(5):603-621. · doi:10.1016/S0927-5398(03)00007-0
[36] Löfberg J (2004) YALMIP: A toolbox for modeling and optimization in MATLAB. 2004 IEEE Internat. Sympos. Computer Aided Control Systems Design (IEEE, Piscataway, NJ), 284-289.
[37] Markowitz H (1952) Portfolio selection. J. Finance 7(1):77-91.
[38] Mohajerin Esfahani P, Kuhn D (2018) Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Math. Programming 171(1-2):115-166. · Zbl 1433.90095 · doi:10.1007/s10107-017-1172-1
[39] Murphy KP (2012) Machine Learning: A Probabilistic Perspective (MIT Press, Cambridge, MA).
[40] Nocedal J, Wright SJ (2006) Numerical Optimization (Springer, New York). · Zbl 1104.65059
[41] Nguyen VA, Shafieezadeh-Abadeh S, Kuhn D, Mohajerin Esfahani P (2022) Bridging Bayesian and minimax mean square error estimation via Wasserstein distributionally robust optimization. Math. Oper. Res. Forthcoming.
[42] Oztoprak F, Nocedal J, Rennie S, Olsen PA (2012) Newton-like methods for sparse inverse covariance estimation. Pereira F, Burges CJC, Bottou L, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 25 (Curran Associates, Red Hook, NY), 755-763.
[43] Pan VY, Chen ZQ (1999) The complexity of the matrix eigenproblem. Proc. 31st Annual ACM Sympos. Theory Comput. (ACM, New York), 507-516. · Zbl 1346.68103
[44] Perlman MD (2007) STAT 542: Multivariate statistical analysis. Lecture notes, University of Washington, Seattle. http://courses.washington.edu/stat512/542Notes2007.pdf.
[45] Ribes A, Azaïs J-M, Planton S (2009) Adaptation of the optimal fingerprint method for climate change detection using a well-conditioned covariance matrix estimate. Climate Dynam. 33(5):707-722. · doi:10.1007/s00382-009-0561-4
[46] Rippl T, Munk A, Sturm A (2016) Limit laws of the empirical Wasserstein distance: Gaussian distributions. J. Multivariate Anal. 151(October):90-109. · Zbl 1351.62064 · doi:10.1016/j.jmva.2016.06.005
[47] Schäfer J, Strimmer K (2005) A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genetics Molecular Biol. 4(1):Article 32.
[48] Shafieezadeh-Abadeh S, Kuhn D, Mohajerin Esfahani P (2017) Regularization via mass transportation. Preprint, submitted October 27, https://arxiv.org/abs/1710.10016. · Zbl 1434.68450
[49] Shafieezadeh-Abadeh S, Mohajerin Esfahani P, Kuhn D (2015) Distributionally robust logistic regression. Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 28 (Curran Associates, Red Hook, NY), 1576-1584.
[50] Stein C (1975) Estimation of a covariance matrix. Rietz Lecture, 39th Annual Meeting IMS, Institute of Mathematical Statistics, Cleveland.
[51] Stein C (1986) Lectures on the theory of estimation of many parameters. J. Soviet Math. 34(1):1373-1403. · Zbl 0593.62049 · doi:10.1007/BF01085007
[52] Stevens GVG (1998) On the inverse of the covariance matrix in portfolio analysis. J. Finance 53(5):1821-1827. · doi:10.1111/0022-1082.00074
[53] Torri G, Giacometti R, Paterlini S (2019) Sparse precision matrices for minimum variance portfolios. Comput. Management Sci. 16(3):375-400. · Zbl 07097398 · doi:10.1007/s10287-019-00344-6
[54] Touloumis A (2015) Nonparametric Stein-type shrinkage covariance matrix estimators in high-dimensional settings. Comput. Statist. Data Anal. 83(March):251-261. · Zbl 1507.62168 · doi:10.1016/j.csda.2014.10.018
[55] Tseng P, Yun S (2009) A coordinate gradient descent method for nonsmooth separable minimization. Math. Programming 117(1-2):387-423. · Zbl 1166.90016 · doi:10.1007/s10107-007-0170-0
[56] Tütüncü RH, Toh KC, Todd MJ (2003) Solving semidefinite-quadratic-linear programs using SDPT3. Math. Programming 95(2):189-217. · Zbl 1030.90082 · doi:10.1007/s10107-002-0347-5
[57] van der Vaart HR (1961) On certain characteristics of the distribution of the latent roots of a symmetric random matrix under general conditions. Ann. Math. Statist. 32(3):864-873. · Zbl 0121.14102 · doi:10.1214/aoms/1177704979
[58] Wiesel A, Eldar Y, Hero A (2010) Covariance estimation in decomposable Gaussian graphical models. IEEE Trans. Signal Process. 58(3):1482-1492. · Zbl 1392.94526 · doi:10.1109/TSP.2009.2037350
[59] Wiesemann W, Kuhn D, Sim M (2014) Distributionally robust convex optimization. Oper. Res. 62(6):1358-1376. · Zbl 1327.90158
[60] Won J-H, Lim J, Kim S-J, Rajaratnam B (2013) Condition number regularized covariance estimation. J. Roy. Statist. Soc. Ser. B Statist. Methodol. 75(3):427-450. · Zbl 1411.62146 · doi:10.1111/j.1467-9868.2012.01049.x
[61] Yang R, Berger JO (1994) Estimation of a covariance matrix using the reference prior. Ann. Statist. 22(3):1195-1211. · Zbl 0819.62013 · doi:10.1214/aos/1176325625
[62] Zhao C, Guan Y (2018) Data-driven risk-averse stochastic optimization with Wasserstein metric. Oper. Res. Lett. 46(2):262-267. · Zbl 1525.90316 · doi:10.1016/j.orl.2018.01.011