
Tree-based quantitative trait mapping in the presence of external covariates. (English) Zbl 1360.92015

Summary: A central goal in biological and biomedical sciences is to identify the molecular basis of variation in morphological and behavioral traits. Over the last decade, improvements in sequencing technologies coupled with the active development of association mapping methods have made it possible to link single nucleotide polymorphisms (SNPs) and quantitative traits. However, a major limitation of existing methods is that they are often unable to consider complex, but biologically realistic, scenarios. Previous work showed that the performance of association mapping methods can be improved by using the evolutionary history within each SNP to estimate the covariance structure among randomly sampled individuals. Here, we propose a method that can be used to analyze a variety of data types, such as data including external covariates, while considering the evolutionary history among SNPs. Existing methods either account for this history at a substantial computational cost or fail to model these relationships altogether. By considering the broad-scale relationships among SNPs, the proposed approach is both computationally feasible and informed by the evolutionary history among SNPs. We show that incorporating an approximate covariance structure during analysis of complex data sets increases performance in quantitative trait mapping, and apply the proposed method to deer mice data.
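
To illustrate the general idea of using a covariance structure among individuals when testing a SNP alongside external covariates, the sketch below fits a generalized least squares (GLS) regression in Python. It is not the authors' tree-based method: the function name `gls_snp_effect`, its arguments, and the assumption that a positive-definite covariance matrix `sigma` is already available (for example, approximated from an evolutionary tree) are all hypothetical choices made for illustration.

```python
import numpy as np


def gls_snp_effect(y, snp, covariates, sigma):
    """Estimate a SNP effect by generalized least squares (illustrative only).

    y          : (n,) quantitative trait values
    snp        : (n,) numeric genotype codes (e.g. 0, 1, 2)
    covariates : (n, k) external covariates
    sigma      : (n, n) positive-definite covariance among individuals
                 (assumed given; how it is estimated is outside this sketch)

    Returns the estimated SNP coefficient and its standard error.
    """
    n = y.shape[0]
    # Design matrix: intercept, SNP, then the external covariates.
    X = np.column_stack([np.ones(n), snp, covariates])

    # Whiten the model with the Cholesky factor, sigma = L @ L.T,
    # so that ordinary least squares applies to the transformed data.
    L = np.linalg.cholesky(sigma)
    Xw = np.linalg.solve(L, X)
    yw = np.linalg.solve(L, y)

    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)

    # Residual variance and the covariance of the coefficient estimates.
    resid = yw - Xw @ beta
    dof = n - X.shape[1]
    s2 = resid @ resid / dof
    cov_beta = s2 * np.linalg.inv(Xw.T @ Xw)

    return beta[1], np.sqrt(cov_beta[1, 1])
```

Setting `sigma` to the identity matrix reduces this to ordinary least squares, which corresponds to ignoring relatedness among individuals.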

MSC:

92B15 General biostatistics
62P10 Applications of statistics to biology and medical sciences; meta analysis
62F15 Bayesian inference
