
SNP selection for predicting a quantitative trait. (English) Zbl 1514.62883

Summary: Molecular markers combined with powerful statistical tools have made it possible to detect and analyze multiple loci on the genome that are responsible for phenotypic variation in quantitative traits. The objectives of the study presented in this paper are to identify a subset of single nucleotide polymorphism (SNP) markers associated with a particular trait and to construct a model that best predicts the value of the trait from the genotypes at those SNPs, using a three-step strategy. In the first step, a genome-wide association test is performed to screen for SNPs associated with the quantitative trait of interest; SNPs with \(p\)-values below 0.05 are passed to the second step. In the second step, a large number of randomly generated models, each consisting of a fixed number of randomly selected SNPs, are analyzed with the least angle regression method. This step further removes SNPs that are redundant because of the complex associations among SNPs. SNPs that are found to have a significant effect on the trait more often than expected by chance are carried forward to the third step. In the third step, two alternative methods are considered: the least absolute shrinkage and selection operator (LASSO) and sparse partial least squares regression. For both methods, the predictive ability of the fitted model is evaluated on an independent test set. The performance of the proposed method is illustrated by the analysis of a real data set on Canadian Holstein cattle.
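
The three steps can be sketched in plain Python (standard library only). This is a minimal illustration under stated assumptions, not the authors' implementation. Step 1 uses a simple per-SNP linear regression with a normal approximation to the t test, which may differ from the association test used in the paper. Step 2 substitutes coordinate-descent LASSO for least angle regression, counts a SNP as "selected" when its coefficient is nonzero, and compares each SNP's selection rate with a hypothetical chance baseline (the pooled selection rate over all SNPs). Step 3 shows only the LASSO option (sparse partial least squares is omitted) and scores predictive ability as the Pearson correlation between predicted and observed test-set values. All function names, the one-list-per-SNP 0/1/2 genotype layout, and the default tuning values (k, n_models, lam) are hypothetical.

```python
import math
import random


def marginal_pvalue(x, y):
    """Two-sided p-value for the slope of y ~ a + b*x (normal approximation to the t test)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return 1.0  # monomorphic SNP: carries no information
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    rss = sum((yi - my - b * (xi - mx)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return 0.0 if se == 0 else math.erfc(abs(b / se) / math.sqrt(2))


def soft_threshold(z, t):
    return math.copysign(max(abs(z) - t, 0.0), z)


def lasso_cd(cols, y, lam, max_sweeps=100, tol=1e-8):
    """LASSO by cyclic coordinate descent; cols standardized (mean 0, variance 1), y centred."""
    n = len(y)
    beta = [0.0] * len(cols)
    resid = list(y)
    for _ in range(max_sweeps):
        largest_step = 0.0
        for j, xj in enumerate(cols):
            # Partial-residual correlation for coordinate j (uses (1/n) * sum(xj**2) == 1).
            rho = sum(a * r for a, r in zip(xj, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * a for r, a in zip(resid, xj)]
                beta[j] = new
                largest_step = max(largest_step, abs(delta))
        if largest_step < tol:
            break
    return beta


def standardize(cols, stats=None):
    """Centre and scale each column; reuse training (mean, sd) pairs when stats is given."""
    if stats is None:
        stats = []
        for c in cols:
            m = sum(c) / len(c)
            stats.append((m, math.sqrt(sum((v - m) ** 2 for v in c) / len(c))))
    return [[(v - m) / s for v in c] for c, (m, s) in zip(cols, stats)], stats


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var = sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var) if var > 0 else float("nan")


def three_step(train_g, train_y, test_g, test_y, alpha=0.05, k=3, n_models=100, lam=0.1, seed=1):
    """Hypothetical pipeline; train_g / test_g hold one list of 0/1/2 genotype codes per SNP."""
    rng = random.Random(seed)
    ybar = sum(train_y) / len(train_y)
    yc = [v - ybar for v in train_y]
    # Step 1: marginal association screen at level alpha (simplified test, an assumption).
    step1 = [j for j, g in enumerate(train_g) if marginal_pvalue(g, train_y) < alpha]
    if not step1:
        return step1, [], float("nan")
    xs, stats = standardize([train_g[j] for j in step1])
    # Step 2: many random k-SNP models; count how often each SNP gets a nonzero coefficient
    # (coordinate-descent LASSO stands in for least angle regression here).
    k = min(k, len(step1))
    included, selected = [0] * len(step1), [0] * len(step1)
    for _ in range(n_models):
        subset = rng.sample(range(len(step1)), k)
        for j, b in zip(subset, lasso_cd([xs[j] for j in subset], yc, lam)):
            included[j] += 1
            if b != 0.0:
                selected[j] += 1
    chance = sum(selected) / sum(included)  # assumed baseline: pooled selection rate
    keep = [j for j in range(len(step1))
            if included[j] and selected[j] / included[j] >= chance]
    # Step 3: final LASSO on the retained SNPs, scored on the independent test set.
    beta = lasso_cd([xs[j] for j in keep], yc, lam)
    test_x, _ = standardize([test_g[step1[j]] for j in keep], [stats[j] for j in keep])
    pred = [ybar + sum(b * col[i] for b, col in zip(beta, test_x)) for i in range(len(test_y))]
    return step1, [step1[j] for j in keep], pearson(pred, test_y)
```

In a real analysis the penalty lam would normally be tuned (for example by cross-validation on the training set) and step 2 would use far more random models; both are fixed here only to keep the sketch short.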

MSC:

62-XX Statistics

Software:

bootstrap; alr3; R
