
On variable ordination of Cholesky-based estimation for a sparse covariance matrix. (English. French summary) Zbl 07759580

Summary: Estimation of a large sparse covariance matrix is of great importance for statistical analysis, especially in high-dimensional settings. Traditional approaches such as the sample covariance matrix perform poorly because of the high dimensionality. The modified Cholesky decomposition (MCD) is a commonly used method for sparse covariance matrix estimation. However, the MCD relies on the order of the variables, which is often unavailable or cannot be pre-determined in practice. In this work, we resolve this ordering issue by obtaining a set of covariance matrix estimates, each based on a different order of the variables used in the MCD. We then consider an ensemble estimator, defined as the “centre” of this set of estimates with respect to the Frobenius norm. The proposed method not only guarantees that the estimator is positive definite, but also captures the underlying sparse structure of the covariance matrix. Under some regularity conditions, we establish both algorithmic and asymptotic convergence of the proposed method. Its merits are illustrated via simulation studies and a practical example using data from a prostate cancer study.
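The idea described in the summary can be sketched in a few lines of code: fit an MCD estimate under each of several random variable orders, map each estimate back to the original order, and take the Frobenius-norm centre of the set. Since the minimizer of the sum of squared Frobenius distances to a set of matrices is their elementwise mean, the ensemble step reduces to averaging. This is a minimal sketch, not the authors' implementation: it uses plain least-squares Cholesky regressions where the paper uses sparse penalized fits, and the names `mcd_estimate` and `ensemble_mcd` are illustrative.

```python
import numpy as np

def mcd_estimate(X, eps=1e-6):
    """MCD covariance estimate for the given column order of X.

    Regress each variable on its predecessors: T X = e with T unit lower
    triangular and Var(e) = D diagonal, so Sigma = T^{-1} D T^{-T}.
    Plain least squares is used here; the paper employs sparse penalized
    regressions to capture the sparse structure.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    T = np.eye(p)
    d = np.empty(p)
    d[0] = Xc[:, 0].var() + eps
    for j in range(1, p):
        Z = Xc[:, :j]
        phi, *_ = np.linalg.lstsq(Z, Xc[:, j], rcond=None)
        T[j, :j] = -phi
        d[j] = (Xc[:, j] - Z @ phi).var() + eps
    Tinv = np.linalg.inv(T)
    return Tinv @ np.diag(d) @ Tinv.T

def ensemble_mcd(X, n_orders=20, seed=0):
    """Frobenius-norm centre of MCD estimates over random variable orders.

    The centre of a finite set of matrices under the Frobenius norm is
    their elementwise mean; averaging positive-definite estimates keeps
    the result positive definite.
    """
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    est = np.zeros((p, p))
    for _ in range(n_orders):
        perm = rng.permutation(p)          # a random order of variables
        inv = np.argsort(perm)             # map back to the original order
        S = mcd_estimate(X[:, perm])
        est += S[np.ix_(inv, inv)]
    return est / n_orders
```

Because each per-order estimate is positive definite by construction and the average of positive-definite matrices is positive definite, the ensemble inherits positive definiteness, which is one of the properties highlighted in the summary.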
{© 2020 Statistical Society of Canada / Société statistique du Canada}

MSC:

62-XX Statistics

Software:

spcov; glasso

References:

[1] Aubry, A., De Maio, A., Pallotta, L., & Farina, A. (2012). Maximum likelihood estimation of a structured covariance matrix with a condition number constraint. IEEE Transactions on Signal Processing, 60, 3004-3021. · Zbl 1393.94166
[2] Bickel, P. J. & Levina, E. (2008). Covariance regularization by thresholding. The Annals of Statistics, 36, 2577-2604. · Zbl 1196.62062
[3] Bien, J. & Tibshirani, R. J. (2011). Sparse estimation of a covariance matrix. Biometrika, 98, 807-820. · Zbl 1228.62063
[4] Boyd, S., Parikh, N., Chu, E., Peleato, B., & Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3, 1-122. · Zbl 1229.90122
[5] Cai, T. & Yuan, M. (2012). Adaptive covariance matrix estimation through block thresholding. The Annals of Statistics, 40, 2014-2042. · Zbl 1257.62060
[6] Cai, T., Ren, Z., & Zhou, H. H. (2016). Estimating structured high‐dimensional covariance and precision matrices: Optimal rates and adaptive estimation. Electronic Journal of Statistics, 10, 1-59. · Zbl 1331.62272
[7] Chang, C. & Tsay, R. (2010). Estimation of covariance matrix via the sparse Cholesky factor with Lasso. Journal of Statistical Planning and Inference, 140, 3858-3873. · Zbl 1233.62118
[8] Dellaportas, P. & Pourahmadi, M. (2012). Cholesky‐GARCH models with applications to finance. Statistics and Computing, 22, 849-855. · Zbl 1252.91080
[9] Deng, X. & Tsui, K. W. (2013). Penalized covariance matrix estimation using a matrix‐logarithm transformation. Journal of Computational and Graphical Statistics, 22, 494-512.
[10] Deng, X. & Yuan, M. (2009). Large Gaussian covariance matrix estimation with Markov structure. Journal of Computational and Graphical Statistics, 18, 640-657.
[11] Dey, D. K. & Srinivasan, C. (1985). Estimation of a covariance matrix under Stein’s loss. The Annals of Statistics, 13, 1581-1591. · Zbl 0582.62042
[12] Fan, J., Liao, Y., & Liu, H. (2016). An overview of the estimation of large covariance and precision matrices. The Econometrics Journal, 19, 1-32. · Zbl 1521.62083
[13] Fan, J., Liao, Y., & Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal complements. Journal of the Royal Statistical Society Series B, 75, 603-680. · Zbl 1411.62138
[14] Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical Lasso. Biostatistics, 9, 432-441. · Zbl 1143.62076
[15] Glaab, E., Bacardit, J., Garibaldi, J. M., & Krasnogor, N. (2012). Using rule‐based machine learning for candidate disease gene prioritization and sample classification of cancer gene expression data. PLoS One, 7, e39932.
[16] Guo, J., Levina, E., Michailidis, G., & Zhu, J. (2011). Joint estimation of multiple graphical models. Biometrika, 98, 1-15. · Zbl 1214.62058
[17] Haff, L. R. (1991). The variational form of certain Bayes estimators. The Annals of Statistics, 19, 1163-1190. · Zbl 0739.62046
[18] Huang, C., Farewell, D., & Pan, J. (2017). A calibration method for non‐positive definite covariance matrix in multivariate data analysis. Journal of Multivariate Analysis, 157, 45-52. · Zbl 1362.62136
[19] Huang, J. Z., Liu, N., Pourahmadi, M., & Liu, L. (2006). Covariance matrix selection and estimation via penalised normal likelihood. Biometrika, 93, 85-98. · Zbl 1152.62346
[20] Jiang, X. (2012). Joint estimation of covariance matrix via Cholesky decomposition. Ph.D. dissertation, Department of Statistics and Applied Probability, National University of Singapore.
[21] Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. The Annals of Statistics, 29, 295-327. · Zbl 1016.62078
[22] Kang, X. & Deng, X. (2020). An improved modified Cholesky decomposition approach for precision matrix estimation. Journal of Statistical Computation and Simulation, 90, 443-464. · Zbl 07194294
[23] Kang, X., Xie, C., & Wang, M. (2020). A Cholesky‐based estimation for large‐dimensional covariance matrices. Journal of Applied Statistics, 47, 1017-1030. · Zbl 1521.62368
[24] Kang, X., Deng, X., Tsui, K. W., & Pourahmadi, M. (2020). On variable ordination of modified Cholesky decomposition for estimating time‐varying covariance matrices. International Statistical Review, 88, 616-641. · Zbl 1528.62026
[25] Karush, W. (1939). Minima of functions of several variables with inequalities as side conditions. Master’s dissertation, Department of Mathematics, University of Chicago, Chicago, IL.
[26] Kuhn, H. & Tucker, A. (1951). Nonlinear programming. In Neyman, J. (Ed.), Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 481-492. University of California Press, Berkeley. · Zbl 0044.05903
[27] Lam, C. & Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. The Annals of Statistics, 37, 4254-4278. · Zbl 1191.62101
[28] Ledoit, O. & Wolf, M. (2004). A well‐conditioned estimator for large‐dimensional covariance matrices. Journal of Multivariate Analysis, 88, 365-411. · Zbl 1032.62050
[29] Liu, H., Wang, L., & Zhao, T. (2014). Sparse covariance matrix estimation with eigenvalue constraints. Journal of Computational and Graphical Statistics, 23, 439-459.
[30] Pourahmadi, M. (1999). Joint mean‐covariance models with applications to longitudinal data: unconstrained parameterisation. Biometrika, 86, 677-690. · Zbl 0949.62066
[31] Pourahmadi, M. (2013). High‐Dimensional Covariance Estimation with High‐Dimensional Data. John Wiley & Sons, Chichester. · Zbl 1276.62031
[32] Pourahmadi, M., Daniels, M. J., & Park, T. (2007). Simultaneous modelling of the Cholesky decomposition of several covariance matrices. Journal of Multivariate Analysis, 98, 568-587. · Zbl 1107.62043
[33] Rajaratnam, B. & Salzman, J. (2013). Best permutation analysis. Journal of Multivariate Analysis, 121, 193-223. · Zbl 1328.62341
[34] Rocha, G. V., Zhao, P., & Yu, B. (2008). A path following algorithm for sparse pseudo‐likelihood inverse covariance estimation. Technical report, Statistics Department, UC Berkeley, Berkeley, CA.
[35] Rothman, A., Bickel, P., Levina, E., & Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electronic Journal of Statistics, 2, 494-515. · Zbl 1320.62135
[36] Rothman, A. J., Levina, E., & Zhu, J. (2009). Generalized thresholding of large covariance matrices. Journal of the American Statistical Association, 104, 177-186. · Zbl 1388.62170
[37] Rothman, A. J., Levina, E., & Zhu, J. (2010). A new approach to Cholesky‐based covariance regularization in high dimensions. Biometrika, 97, 539-550. · Zbl 1195.62089
[38] Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society Series B, 58, 267-288. · Zbl 0850.62538
[39] Wagaman, A. & Levina, E. (2009). Discovering sparse covariance structures with the Isomap. Journal of Computational and Graphical Statistics, 18, 551-572.
[40] Won, J. H., Lim, J., Kim, S. J., & Rajaratnam, B. (2013). Condition number regularized covariance estimation. Journal of the Royal Statistical Society Series B, 75, 427-450. · Zbl 1411.62146
[41] Wu, W. B. & Pourahmadi, M. (2003). Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika, 90, 831-844. · Zbl 1436.62347
[42] Xiao, L., Zipunnikov, V., Ruppert, D., & Crainiceanu, C. (2016). Fast covariance estimation for high‐dimensional functional data. Statistics and Computing, 26, 409-421. · Zbl 1342.62094
[43] Xue, L., Ma, S., & Zou, H. (2012). Positive‐definite L_1‐penalized estimation of large covariance matrices. Journal of the American Statistical Association, 107, 1480-1491. · Zbl 1258.62063
[44] Yu, P. L. H., Wang, X., & Zhu, Y. (2017). High dimensional covariance matrix estimation by penalizing the matrix‐logarithm transformed likelihood. Computational Statistics and Data Analysis, 114, 12-25. · Zbl 1464.62193
[45] Yuan, M. (2008). Efficient computation of the ℓ_1 regularized solution path in Gaussian graphical models. Journal of Computational and Graphical Statistics, 17, 809-826.
[46] Yuan, M. (2010). High dimensional inverse covariance matrix estimation via linear programming. The Journal of Machine Learning Research, 11, 2261-2286. · Zbl 1242.62043
[47] Yuan, M. & Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika, 94, 19-35. · Zbl 1142.62408
[48] Zheng, H., Tsui, K., Kang, X., & Deng, X. (2017). Cholesky‐based model averaging for covariance matrix estimation. Statistical Theory and Related Fields, 1, 48-58. · Zbl 07660528
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data-conversion errors. In some cases, the data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.