
Bayesian approximate kernel regression with variable selection. (English) Zbl 1409.62132

Summary: Nonlinear kernel regression models are often used in statistics and machine learning because they are more accurate than linear models. Variable selection for kernel regression models is a challenge partly because, unlike the linear regression setting, there is no clear concept of an effect size for regression coefficients. In this article, we propose a novel framework that provides an effect size analog for each explanatory variable in Bayesian kernel regression models when the kernel is shift-invariant – for example, the Gaussian kernel. We use function analytic properties of shift-invariant reproducing kernel Hilbert spaces (RKHS) to define a linear vector space that: (i) captures nonlinear structure, and (ii) can be projected onto the original explanatory variables. This projection onto the original explanatory variables serves as an analog of effect sizes. The specific function analytic property we use is that shift-invariant kernel functions can be approximated via random Fourier bases. Based on the random Fourier expansion, we propose a computationally efficient class of Bayesian approximate kernel regression (BAKR) models for both nonlinear regression and binary classification for which one can compute an analog of effect sizes. We illustrate the utility of BAKR by examining two important problems in statistical genetics: genomic selection (i.e., phenotypic prediction) and association mapping (i.e., inference of significant variants or loci). State-of-the-art methods for genomic selection and association mapping are based on kernel regression and linear models, respectively. BAKR is the first method that is competitive in both settings.
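The random Fourier expansion underlying BAKR can be illustrated with a short sketch. The code below (Python/NumPy; the variable names, toy data, and the simple ridge-type posterior mean are illustrative assumptions, not the authors' implementation) approximates a Gaussian kernel with random Fourier features, fits a linear model in the resulting feature space, and projects the fitted nonlinear function back onto the original explanatory variables to obtain effect-size analogs.

```python
# Minimal sketch, assuming a Gaussian kernel and a ridge-type stand-in for the
# full Bayesian treatment described in the paper.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n samples, p explanatory variables, nonlinear response.
n, p, D = 200, 10, 300                      # D = number of random Fourier bases
X = rng.standard_normal((n, p))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(n)

# Random Fourier features for k(x, x') = exp(-||x - x'||^2 / (2 * sigma^2)):
sigma = 1.0
W = rng.standard_normal((p, D)) / sigma     # frequencies w_j ~ N(0, I / sigma^2)
b = rng.uniform(0.0, 2 * np.pi, size=D)     # phases b_j ~ Uniform(0, 2*pi)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)    # z(x), so k(x, x') ~= z(x)^T z(x')

# Posterior mean of the basis coefficients under a Gaussian prior (ridge form).
lam = 1.0
theta = np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)

# Fitted nonlinear function values, then the projection onto the span of the
# original explanatory variables; beta_tilde plays the role of effect sizes.
f_hat = Z @ theta
beta_tilde = np.linalg.pinv(X) @ f_hat
print(np.round(beta_tilde, 3))
```

In the full model, posterior draws of the basis coefficients would yield posterior draws of these projected coefficients, which is what makes variable selection possible in this nonlinear setting.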

MSC:

62J02 General nonlinear regression
62F15 Bayesian inference
62J12 Generalized linear models (logistic models)

Software:

LDAK; BGLR; Kernlab

References:

[1] Bach, F., On the Equivalence Between Kernel Quadrature Rules and Random Feature Expansions, Journal of Machine Learning Research, 18, 1-38 (2017) · Zbl 1435.65045
[2] Băzăvan, E. G.; Li, F.; Sminchisescu, C., Fourier Kernel Learning, 459-473 (2012), New York: Springer, New York
[3] Bochner, S., A Theorem on Fourier-Stieltjes Integrals, Bulletin of the American Mathematical Society, 40, 271-276 (1934) · JFM 60.0221.03
[4] Chakraborty, S., Bayesian Binary Kernel Probit Model for Microarray Based Cancer Classification and Gene Selection, Computational Statistics & Data Analysis, 53, 4198-4209 (2009) · Zbl 1453.62061
[5] Chakraborty, S.; Ghosh, M.; Mallick, B. K., Bayesian Non-Linear Regression for Large p Small n Problems, Journal of Multivariate Analysis, 108, 28-40 (2012) · Zbl 1238.62076
[6] Chakraborty, S.; Mallick, B. K.; Ghosh, D.; Ghosh, M.; Dougherty, E., Gene Expression-Based Glioma Classification Using Hierarchical Bayesian Vector Machines, Sankhyā: The Indian Journal of Statistics, 69, 514-547 (2007) · Zbl 1193.62187
[7] Chapelle, O.; Vapnik, V.; Bousquet, O.; Mukherjee, S., Choosing Multiple Parameters for Support Vector Machines, Machine Learning, 46, 131-159 (2002) · Zbl 0998.68101
[8] Cotter, A.; Keshet, J.; Srebro, N., Explicit Approximations of the Gaussian Kernel (2011)
[9] Darnell, G.; Georgiev, S.; Mukherjee, S.; Engelhardt, B. E., Adaptive Randomized Dimension Reduction on Massive Data, Journal of Machine Learning Research, 18, 5134-5163 (2017) · Zbl 1442.62153
[10] De Los Campos, G.; Gianola, D.; Rosa, G. J. M.; Weigel, K. A.; Crossa, J., Semi-Parametric Genomic-Enabled Prediction of Genetic Values using Reproducing Kernel Hilbert Spaces Methods, Genetics Research (Cambridge), 92, 295-308 (2010)
[11] De Ridder, L.; Weersma, R. K.; Dijkstra, G.; Van Der Steege, G.; Benninga, M. A.; Nolte, I. M.; Taminiau, J. A.; Hommes, D. W.; Stokkers, P. C., Genetic Susceptibility has a More Important Role in Pediatric-Onset Crohn’s Disease Than in Adult-Onset Crohn’s Disease, Inflammatory Bowel Diseases, 13, 1083-1092 (2007)
[12] Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R., Least Angle Regression, Annals of Statistics, 32, 407-499 (2004) · Zbl 1091.62054
[13] George, E.; McCulloch, R., Variable Selection via Gibbs Sampling, Journal of the American Statistical Association, 88, 881-889 (1993)
[14] Gray-Davies, T.; Holmes, C. C.; Caron, F., Scalable Bayesian Nonparametric Regression via a Plackett-Luce Model for Conditional Ranks, Electronic Journal of Statistics, 10, 1807-1828 (2016) · Zbl 1397.62148
[15] Habier, D.; Fernando, R. L.; Kizilkaya, K.; Garrick, D. J., Extension of the Bayesian Alphabet for Genomic Selection, BMC Bioinformatics, 12, 1-12 (2011)
[16] Hahn, P.; Carvalho, C.; Mukherjee, S., Partial Factor Modeling: Predictor-Dependent Shrinkage for Linear Regression, Journal of the American Statistical Association, 108, 999-1008 (2013) · Zbl 06224982
[17] Hastie, T.; Tibshirani, R.; Friedman, J., The Elements of Statistical Learning (2001), New York: Springer-Verlag, New York · Zbl 0973.62007
[18] Hemani, G.; Knott, S.; Haley, C., An Evolutionary Perspective on Epistasis and the Missing Heritability, PLoS Genetics, 9, e1003295 (2013)
[19] Hoerl, A. E.; Kennard, R. W., Ridge Regression: Biased Estimation for Nonorthogonal Problems, Technometrics, 42, 80-86 (2000)
[20] Hoti, F.; Sillanpää, M. J., Bayesian Mapping of Genotype × Expression Interactions in Quantitative and Qualitative Traits, Heredity, 97, 4-18 (2006)
[21] Jiang, Y.; Reif, J. C., Modeling Epistasis in Genomic Selection, Genetics, 201, 759-768 (2015)
[22] Jones, M. C.; Marron, J. S.; Sheather, S. J., A Brief Survey of Bandwidth Selection for Density Estimation, Journal of the American Statistical Association, 91, 401-407 (1996) · Zbl 0873.62040
[23] Karatzoglou, A.; Smola, A.; Hornik, K., kernlab - An S4 Package for Kernel Methods in R, Journal of Statistical Software, 11, 1-20 (2004)
[24] Mackay, T. F. C., Epistasis and Quantitative Traits: Using Model Organisms to Study Gene-Gene Interactions, Nature Reviews Genetics, 15, 22-33 (2014)
[25] Mallick, B. K.; Ghosh, D.; Ghosh, M., Bayesian Classification of Tumours by Using Gene Expression Data, Journal of the Royal Statistical Society, Series B, 67, 219-234 (2005) · Zbl 1069.62100
[26] Mercer, J., Functions of Positive and Negative Type and Their Connection with the Theory of Integral Equations, Philosophical Transactions of the Royal Society, London A, 209, 415-446 (1909) · JFM 40.0408.02
[27] Park, T.; Casella, G., The Bayesian Lasso, Journal of the American Statistical Association, 103, 681-686 (2008) · Zbl 1330.62292
[28] Pérez, P.; De Los Campos, G., Genome-Wide Regression and Prediction with the BGLR Statistical Package, Genetics, 198, 483-495 (2014)
[29] Pillai, N. S.; Wu, Q.; Liang, F.; Mukherjee, S.; Wolpert, R., Characterizing the Function Space for Bayesian Kernel Models, Journal of Machine Learning Research, 8, 1769-1797 (2007) · Zbl 1222.62039
[30] Rahimi, A.; Recht, B., Random Features for Large-Scale Kernel Machines, Advances in Neural Information Processing Systems (NIPS), 20 (2007)
[31] Rakotomamonjy, A., Variable Selection using SVM-Based Criteria, Journal of Machine Learning Research, 3, 1357-1370 (2003) · Zbl 1102.68583
[32] Rasmussen, C. E.; Williams, C. K. I., Gaussian Processes for Machine Learning (2006), Cambridge, MA: MIT Press, Cambridge, MA · Zbl 1177.68165
[33] Broman, K.; Speed, T., A Model Selection Approach for the Identification of Quantitative Trait Loci in Experimental Crosses (with discussion), Journal of the Royal Statistical Society, Series B, 64, 737-775 (2002)
[34] Rosasco, L.; Villa, S.; Mosci, S.; Santoro, M.; Verri, A., Nonparametric Sparsity and Regularization, Journal of Machine Learning Research, 14, 1665-1714 (2013) · Zbl 1317.68183
[35] Rudi, A.; Camoriano, R.; Rosasco, L., Less is More: Nyström Computational Regularization, Advances in Neural Information Processing Systems, 1657-1665 (2015)
[36] Schölkopf, B.; Herbrich, R.; Smola, A. J., A Generalized Representer Theorem, Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory, 416-426 (2001), London, UK: Springer-Verlag, London, UK · Zbl 0992.68088
[37] Schölkopf, B.; Smola, A. J., Learning With Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (2002), Cambridge, MA: MIT Press, Cambridge, MA
[38] Sharp, K.; Wiegerinck, W.; Arias-Vasquez, A.; Franke, B.; Marchini, J.; Albers, C. A.; Kappen, H. J., Explaining Missing Heritability Using Gaussian Process Regression (2016)
[39] Snoek, J.; Rippel, O.; Swersky, K.; Kiros, R.; Satish, N.; Sundaram, N.; Patwary, M.; Prabhat, M.; Adams, R., Scalable Bayesian Optimization Using Deep Neural Networks, Proceedings of the 32nd International Conference on Machine Learning, 37, 2171-2180 (2015), PMLR
[40] Speed, D.; Balding, D. J., MultiBLUP: Improved SNP-Based Prediction for Complex Traits, Genome Research, 24, 1550-1557 (2014)
[41] Stephens, M.; Balding, D. J., Bayesian Statistical Methods for Genetic Association Studies, Nature Reviews Genetics, 10, 681-690 (2009)
[42] The Wellcome Trust Case Control Consortium, Genome-Wide Association Study of 14,000 Cases of Seven Common Diseases and 3,000 Shared Controls, Nature, 447, 661-678 (2007)
[43] Tibshirani, R., Regression Shrinkage and Selection via the Lasso, Journal of the Royal Statistical Society, Series B, 58, 267-288 (1996) · Zbl 0850.62538
[44] Valdar, W.; Solberg, L. C.; Gauguier, D.; Burnett, S.; Klenerman, P.; Cookson, W. O.; Taylor, M. S.; Rawlins, J. N. P.; Mott, R.; Flint, J., Genome-Wide Genetic Association of Complex Traits in Heterogeneous Stock Mice, Nature Genetics, 38, 879-887 (2006)
[45] Wahba, G., Spline Models for Observational Data, CBMS-NSF Regional Conference Series in Applied Mathematics, 59 (1990), Philadelphia, PA: SIAM, Philadelphia, PA · Zbl 0813.62001
[46] Wahba, G., Support Vector Machines, Reproducing Kernel Hilbert Spaces and the Randomized GACV, Neural Information Processing Systems (NIPS), 6, 69-87 (1997)
[47] Wan, X.; Yang, C.; Yang, Q.; Xue, H.; Fan, X.; Tang, N. L. S.; Yu, W., BOOST: A Fast Approach to Detecting Gene-Gene Interactions in Genome-wide Case-Control Studies, The American Journal of Human Genetics, 87, 325-340 (2010)
[48] West, M., Bayesian Factor Regression Models in the “Large p, Small n” Paradigm, Bayesian Statistics, 7, 733-742 (2003)
[49] Zhang, Y., A Novel Bayesian Graphical Model for Genome-Wide Multi-SNP Association Mapping, Genetic Epidemiology, 36, 36-47 (2012)
[50] Zhang, Z.; Dai, G.; Jordan, M. I., Bayesian Generalized Kernel Mixed Models, Journal of Machine Learning Research, 12, 111-139 (2011) · Zbl 1280.68221
[51] Zhou, X.; Carbonetto, P.; Stephens, M., Polygenic Modeling with Bayesian Sparse Linear Mixed Models, PLoS Genetics, 9, e1003264 (2013)