Learning from a lot: empirical Bayes for high-dimensional model-based prediction. (English) Zbl 1417.62018

Empirical Bayes (EB) is understood here as a collection of methods for estimating the tuning or prior parameters of a Bayesian framework from the data. The article concerns high-dimensional prediction methods, i.e., settings with \(p > n\), where \(p\) is the number of predictors and \(n\) is the number of independent samples; the focus is on model-based prediction. Several versions of EB methodology are reviewed: MMLU EB, which maximizes the product of marginal likelihoods derived from univariate models, and MMLJ EB, which maximizes the marginal likelihood derived from a joint model, with several variants for evaluating the corresponding integral (direct EB, Laplace EB, Markov chain Monte Carlo EB, and variational Bayes EB). Cross-validation as an alternative to EB, as well as hybrid solutions, is also discussed, and a simple EB estimator for linear regression is presented (sketched in the code below). Finally, two prediction examples illustrate how EB can be applied when co-data are available.
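To make the marginal-likelihood idea concrete, the following is a minimal sketch (not the authors' implementation; all function and variable names are illustrative) of EB for ridge regression: under the conjugate model \(y = X\beta + \varepsilon\) with \(\beta \sim N(0, \tau^2 I_p)\) and \(\varepsilon \sim N(0, \sigma^2 I_n)\), the marginal distribution is \(y \sim N(0, \tau^2 X X^\top + \sigma^2 I_n)\), so the prior parameters \((\tau^2, \sigma^2)\), and hence the ridge penalty \(\lambda = \sigma^2/\tau^2\), can be estimated by maximizing the marginal likelihood rather than by cross-validation.

```python
# Minimal sketch of marginal-maximum-likelihood empirical Bayes for ridge
# regression (illustrative only; names and optimizer choice are ours).
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal(log_params, X, y):
    """Negative log marginal likelihood of y, with beta integrated out."""
    tau2, sigma2 = np.exp(log_params)           # log scale keeps both positive
    n = X.shape[0]
    K = tau2 * (X @ X.T) + sigma2 * np.eye(n)   # marginal covariance (n x n)
    _, logdet = np.linalg.slogdet(K)
    alpha = np.linalg.solve(K, y)
    return 0.5 * (logdet + y @ alpha + n * np.log(2.0 * np.pi))

def eb_ridge(X, y):
    """EB estimates of (tau2, sigma2); returns the implied penalty and fit."""
    res = minimize(neg_log_marginal, x0=np.zeros(2), args=(X, y),
                   method="Nelder-Mead")
    tau2, sigma2 = np.exp(res.x)
    lam = sigma2 / tau2                          # implied ridge penalty
    p = X.shape[1]
    beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    return lam, beta_hat

# Toy p > n example.
rng = np.random.default_rng(1)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta = np.concatenate([0.5 * rng.standard_normal(10), np.zeros(p - 10)])
y = X @ beta + rng.standard_normal(n)
lam, beta_hat = eb_ridge(X, y)
print(f"EB-estimated ridge penalty lambda = {lam:.2f}")
```

Optimizing on the log scale keeps both variance components positive; for larger \(n\), an eigendecomposition of \(X X^\top\) would avoid refactorizing the marginal covariance at every optimization step.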

MSC:

62C12 Empirical decision procedures; empirical Bayes procedures
62J05 Linear regression; mixed models
