Positive-definite thresholding estimators of covariance matrices with zeros. (English) Zbl 07723936

Summary: A positive definite estimator of a covariance matrix with zero entries provides a valid covariance matrix that can be used as an input in almost any area of multivariate statistical analysis. However, most current approaches do not yet guarantee positive definiteness or deal with the asymptotic efficiency of the covariance estimator. Focusing on the classical setting when the number of Gaussian variables is fixed and the sample size increases, we construct a positive definite and asymptotically efficient estimator by the iterative conditional fitting algorithm (Chaudhuri et al., 2007) when the location of the zero entries is known. If the location of the zero entries is unknown, we further construct a positive definite thresholding estimator by combining the iterative conditional fitting algorithm with thresholding. We prove our thresholding estimator is asymptotically efficient with probability tending to one. In simulation studies, we show our estimator more closely matches the true covariance and more correctly identifies the non-zero entries than competing estimators. We apply our estimator to a neuroimaging study of Huntington disease to detect non-zero correlations among brain regional volumes. Such correlations are timely for ongoing treatment studies to inform how different brain regions are likely to be affected by these treatments.
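
The summary's point that thresholding alone does not guarantee positive definiteness can be illustrated with a small numerical sketch. The code below is not the authors' estimator and does not implement iterative conditional fitting; it only applies plain entrywise hard thresholding to a hand-picked 3 x 3 matrix. The helper names `hard_threshold` and `is_positive_definite`, the matrix `S`, and the threshold 0.5 are all assumptions chosen for illustration.

```python
import numpy as np


def hard_threshold(cov, tau):
    """Set off-diagonal entries with |value| <= tau to zero (hypothetical helper)."""
    out = np.where(np.abs(cov) > tau, cov, 0.0)
    # Variances on the diagonal are never thresholded.
    np.fill_diagonal(out, np.diag(cov))
    return out


def is_positive_definite(mat):
    """Check that the smallest eigenvalue of a symmetric matrix is positive."""
    return bool(np.linalg.eigvalsh(mat).min() > 0)


# Illustrative positive definite matrix (an assumption, not data from the paper).
S = np.array([
    [1.00, 0.75, 0.40],
    [0.75, 1.00, 0.75],
    [0.40, 0.75, 1.00],
])

# Thresholding at 0.5 zeroes the (1, 3) entry and leaves a tridiagonal matrix.
T = hard_threshold(S, tau=0.5)

print("input positive definite:      ", is_positive_definite(S))
print("thresholded positive definite:", is_positive_definite(T))
```

In this example the input is positive definite but the thresholded matrix has a negative eigenvalue, which is the kind of failure that motivates combining thresholding with a fitting step that keeps the estimate inside the positive definite cone.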

MSC:

62Hxx Multivariate analysis
62H12 Estimation in multivariate analysis
62F12 Asymptotic properties of parametric estimators

Software:

spcov; ElemStatLearn
Full Text: DOI

References:

[1] Abadir, K. M.; Magnus, J. R., Matrix Algebra, Econometric Exercises, vol. 1 (2005), Cambridge University Press, New York · Zbl 1084.15001
[2] Anderson, T. W., Asymptotically efficient estimation of covariance matrices with linear structure, Ann. Statist., 1, 1, 135-141 (1973) · Zbl 0296.62022
[3] Anderson, T. W., An Introduction To Multivariate Statistical Analysis, 3rd Edition (2003), John Wiley & Sons, New York · Zbl 1039.62044
[4] Bickel, P. J.; Levina, E., Covariance regularization by thresholding, Ann. Statist., 36, 6, 2577-2604 (2008) · Zbl 1196.62062
[5] Bickel, P. J.; Levina, E., Regularized estimation of large covariance matrices, Ann. Statist., 36, 1, 199-227 (2008) · Zbl 1132.62040
[6] Bien, J.; Tibshirani, R. J., Sparse estimation of a covariance matrix, Biometrika, 98, 4, 807-820 (2011) · Zbl 1228.62063
[7] Cai, T.; Liu, W., Adaptive thresholding for sparse covariance matrix estimation, J. Amer. Statist. Assoc., 106, 494, 672-684 (2011) · Zbl 1232.62086
[8] Chaudhuri, S.; Drton, M.; Richardson, T. S., Estimation of a covariance matrix with zeros, Biometrika, 94, 1, 199-216 (2007) · Zbl 1143.62032
[9] Coppen, E. M.; van der Grond, J.; Hafkemeijer, A.; Rombouts, S. A.; Roos, R. A., Early grey matter changes in structural covariance networks in Huntington’s disease, NeuroImage Clin., 12, 806-814 (2016)
[10] Davis, R. A.; Zang, P.; Zheng, T., Sparse vector autoregressive modeling, J. Comput. Graph. Statist., 25, 4, 1077-1096 (2016)
[11] Degnan, A. J.; Levy, L. M., Neuroimaging of rapidly progressive dementias, part 1: neurodegenerative etiologies, Am. J. Neuroradiol., 35, 3, 418-423 (2014)
[12] Drton, M.; Perlman, M. D., Multiple testing and error control in Gaussian graphical model selection, Statist. Sci., 22, 3, 430-449 (2007) · Zbl 1246.62143
[13] El Karoui, N., Operator norm consistent estimation of large-dimensional sparse covariance matrices, Ann. Statist., 36, 6, 2717-2756 (2008) · Zbl 1196.62064
[14] Fan, J.; Liao, Y.; Liu, H., An overview of the estimation of large covariance and precision matrices, Econom. J., 19, 1, C1-C32 (2016) · Zbl 1521.62083
[15] Van de Geer, S.; Bühlmann, P.; Zhou, S., The adaptive and the thresholded Lasso for potentially misspecified models (and a lower bound for the Lasso), Electron. J. Stat., 5, 688-749 (2011) · Zbl 1274.62471
[16] Hastie, T.; Tibshirani, R.; Friedman, J. H., The Elements of Statistical Learning: Data Mining, Inference, and Prediction, vol. 2 (2009), Springer, New York · Zbl 1273.62005
[17] Hsu, H.-L.; Ing, C.-K.; Tong, H., On model selection from a finite family of possibly misspecified time series models, Ann. Statist., 47, 2, 1061-1087 (2019) · Zbl 1418.62333
[18] Huang, J. Z.; Liu, N.; Pourahmadi, M.; Liu, L., Covariance matrix selection and estimation via penalised normal likelihood, Biometrika, 93, 1, 85-98 (2006) · Zbl 1152.62346
[19] Li, D.; Zou, H., SURE information criteria for large covariance matrix estimation and their asymptotic properties, IEEE Trans. Inform. Theory, 62, 4, 2153-2169 (2016) · Zbl 1359.94270
[20] Liu, H.; Wang, L.; Zhao, T., Sparse covariance matrix estimation with eigenvalue constraints, J. Comput. Graph. Statist., 23, 2, 439-459 (2014)
[21] Lv, J.; Liu, J. S., Model selection principles in misspecified models, J. R. Stat. Soc. Ser. B Stat. Methodol., 141-167 (2014) · Zbl 1411.62218
[22] Milovanovic, N.; Damjanovic, A.; Milovanovic, S.; Duisin, D.; Malis, M.; Stankovic, G.; Rankovic, A.; Latas, M.; Filipovic, B. F.; Filipovic, B. R., Reliability of the bicaudate parameter in the revealing of the enlarged lateral ventricles in schizophrenia patients, Psychiatria Danubina, 30, 2, 150-156 (2018)
[23] Minkova, L.; Eickhoff, S. B.; Abdulkadir, A.; Kaller, C. P.; Peter, J.; Scheller, E.; Lahr, J.; Roos, R. A.; Durr, A.; Leavitt, B. R., Large-scale brain network abnormalities in Huntington’s disease revealed by structural covariance, Hum. Brain Mapping, 37, 1, 67-80 (2016)
[24] Monahan, J. F., A Primer on Linear Models (2008), CRC Press, New York · Zbl 1152.62043
[25] Pourahmadi, M., High-Dimensional Covariance Estimation: With High-Dimensional Data, vol. 882 (2013), John Wiley & Sons · Zbl 1276.62031
[26] Qiu, Y.; Liyanage, J. S., Threshold selection for covariance estimation, Biometrics, 75, 3, 895-905 (2019) · Zbl 1436.62622
[27] Reiner, A.; Dragatsis, I.; Dietrich, P., Genetics and neuropathology of Huntington’s disease, Int. Rev. Neurobiol., 98, 325-372 (2011)
[28] Rodrigues, F. B.; Byrne, L. M.; Tortelli, R.; Johnson, E. B.; Wijeratne, P. A.; Arridge, M.; De Vita, E.; Ghazaleh, N.; Houghton, R.; Furby, H., Longitudinal dynamics of mutant huntingtin and neurofilament light in Huntington’s disease: the prospective HD-CSF study (2020), MedRxiv
[29] Rothman, A. J., Positive definite estimators of large covariance matrices, Biometrika, 99, 3, 733-740 (2012) · Zbl 1437.62595
[30] Rothman, A. J.; Levina, E.; Zhu, J., Generalized thresholding of large covariance matrices, J. Amer. Statist. Assoc., 104, 485, 177-186 (2009) · Zbl 1388.62170
[31] Stone, M., An asymptotic equivalence of choice of model by cross-validation and Akaike’s criterion, J. R. Stat. Soc. Ser. B Stat. Methodol., 39, 1, 44-47 (1977) · Zbl 0355.62002
[32] Tabrizi, S. J.; Leavitt, B. R.; Landwehrmeyer, G. B.; Wild, E. J.; Saft, C.; Barker, R. A.; Blair, N. F.; Craufurd, D.; Priller, J.; Rickards, H.; Rosser, A.; Kordasiewicz, H. B.; Czech, C.; Swayze, E. E.; Norris, D. A.; Baumann, T.; Gerlach, I.; Schobel, S. A.; Paz, E.; Smith, A. V.; Bennett, F.; Lane, R. M., Targeting huntingtin expression in patients with Huntington’s disease, N. Engl. J. Med., 380, 24, 2307-2316 (2019)
[33] Tarmast, G., Multivariate log-normal distribution, International Statistical Institute: Seoul 53rd Session, vol. 210 (2001)
[34] Wang, M.; Allen, G. I., Thresholded graphical Lasso adjusts for latent variables, Biometrika (2022)
[35] White, H., Maximum likelihood estimation of misspecified models, Econometrica, 1-25 (1982) · Zbl 0478.62088
[36] Williams, M., An introduction to the caudate in schizophrenia, CNS J., 2, 40-42 (2016)
[37] Xue, L.; Ma, S.; Zou, H., Positive-definite \(\ell_1\)-penalized estimation of large covariance matrices, J. Amer. Statist. Assoc., 107, 500, 1480-1491 (2012) · Zbl 1258.62063
[38] Yu, D.; Zhang, X.; Yau, K. K., Asymptotic properties and information criteria for misspecified generalized linear mixed models, J. R. Stat. Soc. Ser. B Stat. Methodol., 80, 4, 817-836 (2018) · Zbl 1398.62199
[39] Zwiernik, P.; Uhler, C.; Richards, D., Maximum likelihood estimation for linear Gaussian covariance models, J. R. Stat. Soc. Ser. B Stat. Methodol., 79, 4, 1269-1292 (2017) · Zbl 1373.62267