Posterior consistency of factor dimensionality in high-dimensional sparse factor models. (English) Zbl 07809908

Summary: Factor models describe the dependence structure among high-dimensional random variables in terms of a low-dimensional unobserved random vector called a factor. A major practical issue in applying a factor model is determining the factor dimensionality. In this paper, we propose a computationally feasible nonparametric prior distribution that achieves posterior consistency of the factor dimensionality. We also derive the posterior contraction rate of the covariance matrix, which is optimal when the factor dimensionality of the true covariance matrix is bounded. We conduct numerical studies that illustrate our theoretical results.
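
For orientation, the textbook Gaussian linear factor model that this kind of summary usually refers to can be sketched as follows. The notation (\(Y_i\), \(\Lambda\), \(Z_i\), \(\Psi\), \(p\), \(k\)) is assumed here for illustration and is not taken from the paper, whose exact model and prior may differ.

```latex
% Assumed notation (not from the paper): p observed variables, k latent factors, k << p.
\[
  Y_i = \Lambda Z_i + \varepsilon_i, \qquad
  Z_i \sim N_k(0, I_k), \qquad
  \varepsilon_i \sim N_p(0, \Psi), \qquad i = 1, \dots, n,
\]
% Lambda is the p x k loading matrix; Psi is a diagonal noise covariance.
% Marginalizing over Z_i gives a low-rank-plus-diagonal covariance:
\[
  \operatorname{Cov}(Y_i) = \Sigma = \Lambda \Lambda^{\top} + \Psi .
\]
```

In this sketch the factor dimensionality is \(k\), the number of columns of \(\Lambda\) (equivalently the rank of \(\Lambda \Lambda^{\top}\) when \(\Lambda\) has full column rank), and "sparse" would mean that most entries of \(\Lambda\) are zero.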

MSC:

62-XX Statistics
