
Precision matrix estimation under the horseshoe-like prior-penalty dual. (English) Zbl 07823206

Summary: Precision matrix estimation in a multivariate Gaussian model is fundamental to network estimation. Although both Bayesian and frequentist approaches to this problem exist, it is difficult to obtain good Bayesian and frequentist properties under the same prior-penalty dual. To bridge this gap, we contribute a novel prior-penalty dual that closely approximates the graphical horseshoe prior and penalty and performs well in both the Bayesian and frequentist senses. A chief difficulty with the graphical horseshoe prior is the lack of a closed-form expression for its density, which we overcome in this article. On the theoretical side, we establish a posterior convergence rate for the precision matrix that matches the convergence rate of the frequentist graphical lasso estimator, together with frequentist consistency of the MAP estimator at the same rate. Our results also provide theoretical justification for previously developed approaches that had so far lacked it, e.g. the graphical horseshoe prior. Computationally efficient EM and MCMC algorithms are developed for the penalized-likelihood and fully Bayesian estimation problems, respectively. In numerical experiments, the horseshoe-based approaches echo their superior theoretical properties by comprehensively outperforming the competing methods. Estimation of a protein-protein interaction network in B-cell lymphoma is considered to validate the proposed methodology.
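The closed-form approximation underlying such a prior-penalty dual can be illustrated with the horseshoe-like density of Bhadra et al. [6]: unlike the exact horseshoe prior, whose marginal density has no closed form, the density proportional to \(\log(1 + a/\theta^2)\) is available explicitly and its negative logarithm serves directly as a penalty. The sketch below (the scale parametrization `a` is illustrative, not the paper's) evaluates this density and checks numerically that it integrates to one over the real line, using the fact that \(\int_0^\infty \log(1 + a/\theta^2)\,d\theta = \pi\sqrt{a}\):

```python
import numpy as np
from scipy.integrate import quad

def horseshoe_like_density(theta, a=2.0):
    """Closed-form 'horseshoe-like' density proportional to
    log(1 + a/theta^2), normalized over the real line.
    The normalizing constant follows from
    int_0^inf log(1 + a/t^2) dt = pi * sqrt(a)."""
    return np.log1p(a / theta**2) / (2.0 * np.pi * np.sqrt(a))

def horseshoe_like_penalty(theta, a=2.0):
    """The matching penalty is the negative log-density
    (up to an additive constant): -log log(1 + a/theta^2)."""
    return -np.log(np.log1p(a / theta**2))

# Verify normalization by quadrature. The integrand has an
# integrable logarithmic singularity at 0, so integrate on
# (0, 1] and [1, inf) separately and double by symmetry.
part1, _ = quad(horseshoe_like_density, 0.0, 1.0)
part2, _ = quad(horseshoe_like_density, 1.0, np.inf)
total = 2.0 * (part1 + part2)
print(total)  # close to 1.0
```

The pole of the density at the origin (favoring exact sparsity) and its heavy polynomial tails (avoiding over-shrinkage of large signals) are the two horseshoe features this closed form retains.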

MSC:

62H12 Estimation in multivariate analysis
62F12 Asymptotic properties of parametric estimators
62F15 Bayesian inference

Software:

BDgraph; glasso; GHS

References:

[1] Banerjee, S., Castillo, I., and Ghosal, S. (2021). Bayesian inference in high-dimensional models. arXiv preprint arXiv:2101.04491. MathSciNet: MR3251280
[2] Banerjee, S. and Ghosal, S. (2014). Posterior convergence rates for estimating large precision matrices using graphical models. Electronic Journal of Statistics, 8(2):2111-2137. MathSciNet: MR3273620 · Zbl 1302.62124
[3] Banerjee, S. and Ghosal, S. (2015). Bayesian structure learning in graphical models. Journal of Multivariate Analysis, 136:147-162. MathSciNet: MR3321485 · Zbl 1308.62119
[4] Barndorff-Nielsen, O., Kent, J., and Sørensen, M. (1982). Normal variance-mean mixtures and z distributions. International Statistical Review, pages 145-159. MathSciNet: MR0678296 · Zbl 0497.62019
[5] Bhadra, A., Datta, J., Polson, N. G., and Willard, B. (2016). Default Bayesian analysis with global-local shrinkage priors. Biometrika, 103(4):955-969. MathSciNet: MR3620450 · Zbl 1506.62343
[6] Bhadra, A., Datta, J., Polson, N. G., and Willard, B. T. (2019a). The horseshoe-like regularization for feature subset selection. Sankhya B, pages 1-30. MathSciNet: MR4256316
[7] Bhadra, A., Datta, J., Polson, N. G., and Willard, B. T. (2019b). Lasso meets horseshoe: A survey. Statistical Science, 34(3):405-427. MathSciNet: MR4017521 · Zbl 1429.62308
[8] Bhattacharya, A., Pati, D., Pillai, N. S., and Dunson, D. B. (2015). Dirichlet-Laplace priors for optimal shrinkage. Journal of the American Statistical Association, 110(512):1479-1490. MathSciNet: MR3449048 · Zbl 1373.62368
[9] Bickel, P. J. and Levina, E. (2008). Regularized estimation of large covariance matrices. The Annals of Statistics, 36(1):199-227. MathSciNet: MR2387969 · Zbl 1132.62040
[10] Brualdi, R. A. and Mellendorf, S. (1994). Regions in the complex plane containing the eigenvalues of a matrix. The American Mathematical Monthly, 101(10):975-985. MathSciNet: MR1304322 · Zbl 0838.15010
[11] Cai, T., Liu, W., and Luo, X. (2011). A constrained \(\ell_1\) minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 106(494):594-607. MathSciNet: MR2847973 · Zbl 1232.62087
[12] Callot, L., Caner, M., Önder, A. Ö., and Ulaşan, E. (2019). A nodewise regression approach to estimating large portfolios. Journal of Business & Economic Statistics, pages 1-12. MathSciNet: MR4235193
[13] Candès, E. J. and Tao, T. (2010). The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5):2053-2080. MathSciNet: MR2723472 · Zbl 1366.15021
[14] Carvalho, C. M., Polson, N. G., and Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika, 97(2):465-480. MathSciNet: MR2650751 · Zbl 1406.62021
[15] Castillo, I., Schmidt-Hieber, J., and van der Vaart, A. (2015). Bayesian linear regression with sparse priors. The Annals of Statistics, 43(5):1986-2018. MathSciNet: MR3375874 · Zbl 1486.62197
[16] Dawid, A. P., Stone, M., and Zidek, J. V. (1973). Marginalization paradoxes in Bayesian and structural inference. Journal of the Royal Statistical Society: Series B, 35(2):189-213. MathSciNet: MR0365805 · Zbl 0271.62009
[17] Fan, J., Feng, Y., and Wu, Y. (2009). Network exploration via the adaptive lasso and SCAD penalties. The Annals of Applied Statistics, 3(2):521-541. MathSciNet: MR2750671 · Zbl 1166.62040
[18] Fan, J. and Li, R. (2001a). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456):1348-1360. MathSciNet: MR1946581 · Zbl 1073.62547
[19] Fan, J. and Li, R. (2001b). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456):1348-1360. MathSciNet: MR1946581 · Zbl 1073.62547
[20] Fan, J., Liao, Y., and Liu, H. (2016). An overview of the estimation of large covariance and precision matrices. The Econometrics Journal, 19(1):C1-C32. MathSciNet: MR3501529 · Zbl 1521.62083
[21] Friedman, J., Hastie, T., and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432-441. · Zbl 1143.62076
[22] Friedman, J., Hastie, T., and Tibshirani, R. (2018). glasso: Graphical Lasso: Estimation of Gaussian Graphical Models. R package version 1.10. MathSciNet: MR4107668
[23] Gan, L., Narisetty, N. N., and Liang, F. (2019). Bayesian regularization for graphical models with unequal shrinkage. Journal of the American Statistical Association, 114(527):1218-1231. MathSciNet: MR4011774 · Zbl 1428.62225
[24] Ghosal, S., Ghosh, J. K., and van der Vaart, A. W. (2000). Convergence rates of posterior distributions. The Annals of Statistics, 28(2):500-531. MathSciNet: MR1790007 · Zbl 1105.62315
[25] Ha, M. J., Banerjee, S., Akbani, R., Liang, H., Mills, G. B., Do, K.-A., and Baladandayuthapani, V. (2018). Personalized integrated network modeling of the cancer proteome atlas. Scientific Reports, 8(1):1-14.
[26] Hartwell, L. H., Hopfield, J. J., Leibler, S., and Murray, A. W. (1999). From molecular to modular cell biology. Nature, 402(6761):C47-C52.
[27] He, X. and Zhang, J. (2006). Why do hubs tend to be essential in protein networks? PLoS Genetics, 2(6):e88.
[28] Huynh-Thu, V. A. and Sanguinetti, G. (2019). Gene regulatory network inference: an introductory survey. In Gene Regulatory Networks, pages 1-23. Springer.
[29] Jeong, H., Mason, S. P., Barabási, A.-L., and Oltvai, Z. N. (2001). Lethality and centrality in protein networks. Nature, 411(6833):41-42.
[30] Kuismin, M. O., Kemppainen, J. T., and Sillanpää, M. J. (2017). Precision matrix estimation with ROPE. Journal of Computational and Graphical Statistics, 26(3):682-694. MathSciNet: MR3698677
[31] Lam, C. and Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. The Annals of Statistics, 37(6B):4254. MathSciNet: MR2572459 · Zbl 1191.62101
[32] Lauritzen, S. L. (1996). Graphical models. Oxford University Press. MathSciNet: MR1419991 zbMATH: 0907.62001 · Zbl 0907.62001
[33] Leclerc, R. D. (2008). Survival of the sparsest: robust gene networks are parsimonious. Molecular Systems Biology, 4(1):213.
[34] Lee, K., Jo, S., and Lee, J. (2022). The beta-mixture shrinkage prior for sparse covariances with near-minimax posterior convergence rate. Journal of Multivariate Analysis, 192:105067. MathSciNet: MR4450889 · Zbl 1520.62054
[35] Lee, K. and Lee, J. (2021). Estimating large precision matrices via modified Cholesky decomposition. Statistica Sinica, 31(2021):173-196. MathSciNet: MR4270383 · Zbl 1464.62294
[36] Li, Y., Craig, B. A., and Bhadra, A. (2019). The graphical horseshoe estimator for inverse covariance matrices. Journal of Computational and Graphical Statistics, 28(3):747-757. MathSciNet: MR4007755 · Zbl 07499091
[37] Liu, C. and Martin, R. (2019). An empirical \(g\)-Wishart prior for sparse high-dimensional Gaussian graphical models. arXiv preprint arXiv:1912.03807. MathSciNet: MR4138128
[38] Makalic, E. and Schmidt, D. F. (2015). A simple sampler for the horseshoe estimator. IEEE Signal Processing Letters, 23(1):179-182.
[39] Meng, X.-L. and Rubin, D. B. (1993). Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika, 80(2):267-278. MathSciNet: MR1243503 · Zbl 0778.62022
[40] Mohammadi, A. and Wit, E. C. (2015a). Bayesian structure learning in sparse Gaussian graphical models. Bayesian Analysis, 10(1):109-138. MathSciNet: MR3420899 · Zbl 1335.62056
[41] Mohammadi, R. and Wit, E. C. (2015b). BDgraph: An R package for Bayesian structure learning in graphical models. arXiv preprint arXiv:1501.05108.
[42] Park, T. and Casella, G. (2008). The Bayesian lasso. Journal of the American Statistical Association, 103(482):681-686. MathSciNet: MR2524001 · Zbl 1330.62292
[43] Piironen, J. and Vehtari, A. (2017). Sparsity information and regularization in the horseshoe and other shrinkage priors. Electronic Journal of Statistics, 11(2):5018-5051. MathSciNet: MR3738204 · Zbl 1459.62141
[44] Pourahmadi, M. (2011). Covariance Estimation: the GLM and Regularization Perspectives. Statistical Science, 26(3):369-387. MathSciNet: MR2917961 zbMATH: 1246.62139 · Zbl 1246.62139
[45] Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N., and Barabási, A.-L. (2002). Hierarchical organization of modularity in metabolic networks. Science, 297(5586):1551-1555.
[46] Rothman, A. J., Bickel, P. J., Levina, E., and Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electronic Journal of Statistics, 2:494-515. MathSciNet: MR2417391 · Zbl 1320.62135
[47] Roverato, A. (2000). Cholesky decomposition of a hyper inverse Wishart matrix. Biometrika, 87(1):99-112. MathSciNet: MR1766831 · Zbl 0974.62047
[48] Roverato, A. (2002). Hyper inverse Wishart distribution for non-decomposable graphs and its application to Bayesian inference for Gaussian graphical models. Scandinavian Journal of Statistics, 29(3):391-411. MathSciNet: MR1925566 · Zbl 1036.62027
[49] Ryali, S., Chen, T., Supekar, K., and Menon, V. (2012). Estimation of functional connectivity in fMRI data using stability selection-based sparse partial correlation with elastic net penalty. NeuroImage, 59(4):3852-3861.
[50] Song, Q. and Liang, F. (2017). Nearly optimal Bayesian shrinkage for high dimensional regression. arXiv preprint arXiv:1712.08964. MathSciNet: MR4535982
[51] Tang, X., Xu, X., Ghosh, M., and Ghosh, P. (2018). Bayesian variable selection and estimation based on global-local shrinkage priors. Sankhya A, 80(2):215-246. MathSciNet: MR3850065
[52] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B, 58:267-288. MathSciNet: MR1379242 · Zbl 0850.62538
[53] van den Boom, W., Beskos, A., and De Iorio, M. (2022). The G-Wishart weighted proposal algorithm: Efficient posterior computation for Gaussian graphical models. Journal of Computational and Graphical Statistics, 31(4):1215-1224. MathSciNet: MR4513382 · Zbl 07633315
[54] van der Pas, S., Szabó, B., and van der Vaart, A. (2017). Uncertainty quantification for the horseshoe (with discussion). Bayesian Analysis, 12(4):1221-1274. MathSciNet: MR3724985 · Zbl 1384.62155
[55] Van Wieringen, W. N. and Peeters, C. F. (2016). Ridge estimation of inverse covariance matrices from high-dimensional data. Computational Statistics & Data Analysis, 103:284-303. MathSciNet: MR3522633 · Zbl 1466.62204
[56] Wang, C., Pan, G., Tong, T., and Zhu, L. (2015). Shrinkage estimation of large dimensional precision matrix using random matrix theory. Statistica Sinica, 25:993-1008. MathSciNet: MR3409734 · Zbl 1415.62035
[57] Wang, H. (2012). Bayesian graphical lasso models and efficient posterior computation. Bayesian Analysis, 7(4):867-886. MathSciNet: MR3000017 · Zbl 1330.62041
[58] Wang, H. (2014). Coordinate descent algorithm for covariance graphical lasso. Statistics and Computing, 24(4):521-529. MathSciNet: MR3223538 · Zbl 1325.62136
[59] Wang, H. (2015). Scaling it up: Stochastic search structure learning in graphical models. Bayesian Analysis, 10(2):351-377. MathSciNet: MR3420886 · Zbl 1335.62068
[60] Weinstein, J. N., Collisson, E. A., Mills, G. B., Shaw, K. R. M., Ozenberger, B. A., Ellrott, K., Shmulevich, I., Sander, C., and Stuart, J. M. (2013). The cancer genome atlas pan-cancer analysis project. Nature Genetics, 45(10):1113-1120.
[61] Xiang, R., Khare, K., and Ghosh, M. (2015). High dimensional posterior convergence rates for decomposable graphical models. Electronic Journal of Statistics, 9(2):2828-2854. MathSciNet: MR3439186 · Zbl 1329.62152
[62] Xie, X., Kou, S. C., and Brown, L. (2016). Optimal shrinkage estimation of mean parameters in family of distributions with quadratic variance. The Annals of Statistics, 44(2):564-597. MathSciNet: MR3476610 · Zbl 1347.60017
[63] Yang, E., Ravikumar, P., Allen, G. I., and Liu, Z. (2012). Graphical models via generalized linear models. In NIPS, volume 25, pages 1367-1375.
[64] Zhang, T. and Zou, H. (2014). Sparse precision matrix estimation via lasso penalized d-trace loss. Biometrika, 101(1):103-120. MathSciNet: MR3180660 · Zbl 1285.62063
[65] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301-320. MathSciNet: MR2137327 · Zbl 1069.62054
[66] Zou, H. and Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. The Annals of Statistics, 36(4):1509-1533. MathSciNet: MR2435443 · Zbl 1142.62027