An analytical framework in the general coalescent tree setting for analyzing polymorphisms created by two mutations. (English) Zbl 1345.92097

Summary: This paper presents an analytical framework for analyzing polymorphisms created by two mutation events in samples of DNA sequences modeled in the general coalescent tree setting. I developed the framework by deriving analytical formulas for the numbers of topologies of genealogies with two mutation events. This approach makes it possible to analyze polymorphisms in large samples of DNA sequences at a non-recombining locus under various evolutionary scenarios. In particular, the framework allows one to estimate the probability of polymorphism data created by two mutation events, as well as the ages of the events. Based on these results, I extended the definition of the site frequency spectrum by classifying pairs of polymorphic sites into groups and presented analytical expressions for computing the expected sizes of these groups. Within the framework, I also designed a Bayesian approach for inferring the haplotype of the most recent common ancestor at two polymorphic sites. Lastly, I applied the framework to polymorphism data from the human APOE gene region under various demographic scenarios for the ancestral human population and explored the signature of linkage disequilibrium for inferring the ancestral haplotype at two polymorphic sites. The results show that the most frequent haplotype at two completely linked polymorphic sites is not always the most likely candidate for the haplotype of the most recent common ancestor.
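
For illustration only, the following minimal Python sketch shows one way to tabulate two-site haplotypes and sort a pair of polymorphic sites into a group. It assumes that ancestral (0) and derived (1) states are known at both sites, that each site arose from a single mutation, and that there is no recombination. The function name and the group labels ("nested", "disjoint", and so on) are hypothetical and are not taken from the paper, whose own grouping and expected-size formulas are not reproduced here.

```python
from collections import Counter


def classify_site_pair(haplotypes):
    """Classify a pair of biallelic sites from a sample of two-site haplotypes.

    haplotypes: iterable of (a, b) tuples with 0 = ancestral, 1 = derived.
    Assumption of this sketch: ancestral states are known at both sites.
    Returns (label, counts), where counts is a Counter of the haplotypes.
    """
    counts = Counter(haplotypes)
    n = sum(counts.values())
    n10, n01, n11 = counts[(1, 0)], counts[(0, 1)], counts[(1, 1)]
    d1 = n10 + n11  # copies of the derived allele at site 1
    d2 = n01 + n11  # copies of the derived allele at site 2

    # Both sites must be polymorphic in the sample.
    if not (0 < d1 < n and 0 < d2 < n):
        return "not both polymorphic", counts
    # No sequence carries both derived alleles: mutations on separate lineages.
    if n11 == 0:
        return "disjoint", counts
    # Identical derived sets: both mutations on the same branch.
    if n10 == 0 and n01 == 0:
        return "same branch", counts
    # One derived set strictly contains the other.
    if n10 == 0 or n01 == 0:
        return "nested", counts
    # Overlapping but non-nested derived sets cannot arise from two single
    # mutations on one tree without recombination.
    return "incompatible", counts


sample = [(0, 0)] * 5 + [(1, 0)] * 3 + [(1, 1)] * 2
label, counts = classify_site_pair(sample)
most_frequent = counts.most_common(1)[0][0]
print(label, most_frequent)
```

The most frequent haplotype reported here is only a descriptive count. As the summary notes, it need not coincide with the most likely haplotype of the most recent common ancestor.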

MSC:

92D10 Genetics and epigenetics
92D15 Problems related to evolution
62P10 Applications of statistics to biology and medical sciences; meta analysis
62F15 Bayesian inference

References:

[1] Cann R, Stoneking M, Wilson A (1987) Mitochondrial DNA and human evolution. Nature 325:31-6 · doi:10.1038/325031a0
[2] Coop G, Griffiths RC (2004) Ancestral inference on gene trees under selection. Theor Popul Biol 66(3):219-232 · doi:10.1016/j.tpb.2004.06.006
[3] Corbo RM, Scacchi R (1999) Apolipoprotein E (APOE) allele distribution in the world. Is APOE*4 a ’thrifty’ allele? Ann Hum Genet 63(Pt 4):301-310 · doi:10.1046/j.1469-1809.1999.6340301.x
[4] Corder EH, Saunders AM, Strittmatter WJ, Schmechel DE, Gaskell PC, Small GW, Roses AD, Haines JL, Pericak-Vance MA (1993) Gene dose of apolipoprotein E type 4 allele and the risk of Alzheimer’s disease in late onset families. Science 261(5123):921-923 · doi:10.1126/science.8346443
[5] Davignon J, Gregg RE, Sing CF (1988) Apolipoprotein E polymorphism and atherosclerosis. Arteriosclerosis 8(1):1-21 · doi:10.1161/01.ATV.8.1.1
[6] de Knijff P, van den Maagdenberg AM, Frants RR, Havekes LM (1994) Genetic heterogeneity of apolipoprotein E and its influence on plasma lipid and lipoprotein levels. Hum Mutat 4(3):178-194 · doi:10.1002/humu.1380040303
[7] Evans SN, Shvets Y, Slatkin M (2007) Non-equilibrium theory of the allele frequency spectrum. Theor Popul Biol 71(1):109-119 · Zbl 1118.92041 · doi:10.1016/j.tpb.2006.06.005
[8] Feller W (1970) An introduction to probability theory and its applications, 3rd edn. Wiley, New York · Zbl 0158.34902
[9] Felsenstein J, Kuhner MK, Yamato J, Beerli P (1999) Likelihoods on coalescents: a Monte Carlo sampling approach to inferring parameters from population samples of molecular data. In: Statistics in Molecular Biology and Genetics, IMS Lecture Notes Monogr. Ser., vol 33. Institute of Mathematical Statistics, Hayward, pp 163-185
[10] Forster P (2004) Ice ages and the mitochondrial DNA chronology of human dispersals: a review. Philos Trans R Soc Lond B Biol Sci 359(1442):255-264; discussion 264 · doi:10.1098/rstb.2003.1394
[11] Forster P, Matsumura S (2005) Evolution. Did early humans go north or south? Science 308(5724):965-966 · doi:10.1126/science.1113261
[12] Fu YX (1995) Statistical properties of segregating sites. Theor Popul Biol 48(2):172-197 · Zbl 0854.92014 · doi:10.1006/tpbi.1995.1025
[13] Fullerton SM, Clark AG, Weiss KM, Nickerson DA, Taylor SL, Stengård JH, Salomaa V, Vartiainen E, Perola M, Boerwinkle E, Sing CF (2000) Apolipoprotein E variation at the sequence haplotype level: implications for the origin and maintenance of a major human polymorphism. Am J Hum Genet 67(4):881-900 · doi:10.1086/303070
[14] Griffiths RC (2003) The frequency spectrum of a mutation, and its age, in a general diffusion model. Theor Popul Biol 64:241-251 · Zbl 1104.92045 · doi:10.1016/S0040-5809(03)00075-3
[15] Griffiths RC, Tavaré S (1994) Sampling theory for neutral alleles in a varying environment. Philos Trans R Soc Lond B 344:403-410 · doi:10.1098/rstb.1994.0079
[16] Griffiths RC, Tavaré S (1995) Unrooted genealogical tree probabilities in the infinitely-many-sites model. Math Biosci 127:77-98 · Zbl 0818.92010 · doi:10.1016/0025-5564(94)00044-Z
[17] Griffiths RC, Tavaré S (1998) The age of a mutation in a general coalescent tree. Commun Stat Stoch Models 14:273-295 · Zbl 0889.92017 · doi:10.1080/15326349808807471
[18] Griffiths RC, Tavaré S (1999) The ages of mutations in gene trees. Ann Appl Prob 9(3):567-590 · Zbl 0948.92016 · doi:10.1214/aoap/1029962804
[19] Griffiths RC, Tavaré S (2003) The genealogy of a neutral mutation. In: Green PJ, Hjort NL, Richardson S (eds). Oxford, pp 393-413
[20] Hammer MF (1995) A recent common ancestry for human Y chromosomes. Nature 378:376-8 · doi:10.1038/378376a0
[21] Hobolth A, Uyenoyama M, Wiuf C (2008) Importance sampling for the infinite sites model. Stat Appl Genet Mol Biol 7:32 · Zbl 1276.62074
[22] Hobolth A, Wiuf C (2009) The genealogy, site frequency spectrum and ages of two nested mutant alleles. Theor Popul Biol 75:260-265 · Zbl 1213.92040 · doi:10.1016/j.tpb.2009.02.001
[23] Hudson RR (1983) Testing the constant-rate neutral allele model with protein sequence data. Evolution 37:203-217 · doi:10.2307/2408186
[24] Hudson RR (1991) Gene genealogies and the coalescent process. In: Futuyma D, Antonovics J (eds), no. 7. Oxford, pp 1-44
[25] Ingman M, Kaessmann H, Pääbo S, Gyllensten U (2000) Mitochondrial genome variation and the origin of modern humans. Nature 408:708-13 · doi:10.1038/35047064
[26] Jenkins PA, Song Y (2011) The effect of recurrent mutation on the frequency spectrum of a segregating site and the age of an allele. Theor Popul Biol 80(2):158-173 · Zbl 1297.92054 · doi:10.1016/j.tpb.2011.04.001
[27] Jobling M, Tyler-Smith C (2003) The human Y chromosome: an evolutionary marker comes of age. Nature Rev Genet 4:598-612 · doi:10.1038/nrg1124
[28] Kimmel M, Chakraborty R, King JP, Bamshad M, Watkins WS, Jorde LB (1998) Signatures of population expansion in microsatellite repeat data. Genetics 148:1921-30
[29] Kimura M, Ohta T (1973) The age of a neutral mutant persisting in a finite population. Genetics 75:199-212
[30] Kingman JFC (1982a) Exchangeability and the evolution of large populations. In: Koch G, Spizzichino F (eds) Exchangeability in Probability and Statistics. North Holland Publishing Company, Amsterdam, pp 97-112 · Zbl 0494.92011
[31] Kingman JFC (1982b) On the genealogy of large populations. J Appl Prob 19A:27-43 · Zbl 0516.92011 · doi:10.2307/3213548
[32] Kingman JFC (1982c) The coalescent. Stoch Process Appl 13:235-248 · Zbl 0491.60076 · doi:10.1016/0304-4149(82)90011-4
[33] Kuhner MK, Yamato J, Felsenstein J (1995) Estimating effective population size and mutation rate from sequence data using Metropolis-Hastings sampling. Genetics 140:1421-1430
[34] Kuhner MK, Yamato J, Felsenstein J (1998) Maximum likelihood estimation of population growth rates based on the coalescent. Genetics 149:429-434
[35] Maca-Meyer N, Gonzalez A, Larruga J, Flores C, Cabrera V (2001) Major genomic mitochondrial lineages delineate early human expansions. BMC Genet 2:13 · doi:10.1186/1471-2156-2-13
[36] Machado CA, Kliman RM, Markert JA, Hey J (2002) Inferring the history of speciation from multilocus DNA sequence data: the case of Drosophila pseudoobscura and close relatives. Mol Biol Evol 19(4):472-488 · doi:10.1093/oxfordjournals.molbev.a004103
[37] Mellars P (2004) Neanderthals and the modern human colonization of Europe. Nature 432(7016):461-465 · doi:10.1038/nature03103
[38] Mellars P (2006) A new radiocarbon revolution and the dispersal of modern humans in Eurasia. Nature 439(7079):931-935 · doi:10.1038/nature04521
[39] Merriwether DA, Clark AG, Ballinger SW, Schurr TG, Soodyall H, Jenkins T, Sherry ST, Wallace DC (1991) The structure of human mitochondrial DNA variation. J Mol Evol 33:543-555 · doi:10.1007/BF02102807
[40] Nee S, May RM, Harvey PH (1994) The reconstructed evolutionary process. Philos Trans R Soc B 344:305-311 · doi:10.1098/rstb.1994.0068
[41] Nielsen R (2000) Estimation of population parameters and recombination rates from single nucleotide polymorphisms. Genetics 154(2):931-942
[42] Nordborg M (2001) Coalescent theory. In: Balding D, Bishop M, Cannings C (eds) Handbook of Statistical Genetics. Wiley, Chichester
[43] Pakendorf B, Stoneking M (2005) Mitochondrial DNA and human evolution. Annu Rev Genomics Hum Genet 6:165-183 · doi:10.1146/annurev.genom.6.080604.162249
[44] Pritchard JK, Seielstad MT, Perez-Lezaun A, Feldman MW (1999) Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Mol Biol Evol 16:1791-1798 · doi:10.1093/oxfordjournals.molbev.a026091
[45] Rannala B (1997) Gene genealogy in a population of variable size. Heredity 78:417-423 · doi:10.1038/hdy.1997.65
[46] Sargsyan O (2006) Analytical and simulation results for the general coalescent. PhD dissertation, University of Southern California
[47] Sargsyan O (2010) Topologies of the conditional ancestral trees and full-likelihood-based inference in the general coalescent tree framework. Genetics 185:1355-68 · doi:10.1534/genetics.109.112847
[48] Sargsyan O, Wakeley J (2008) A coalescent process with simultaneous multiple mergers for approximating the gene genealogies of many marine organisms. Theor Popul Biol 74:104-114 · Zbl 1210.92028 · doi:10.1016/j.tpb.2008.04.009
[49] Sawyer SA, Hartl DL (1992) Population genetics of polymorphism and divergence. Genetics 132(4):1161-1176 · Zbl 0753.76176
[50] Slatkin M, Hudson RR (1991) Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics 129(2):555-562
[51] Slatkin M, Rannala B (1997) Estimating the age of alleles by use of intraallelic variability. Am J Hum Genet 60:447-458
[52] Stengård JH, Zerba KE, Pekkanen J, Ehnholm C, Nissinen A, Sing CF (1995) Apolipoprotein E polymorphism predicts death from coronary heart disease in a longitudinal study of elderly Finnish men. Circulation 91(2):265-269 · doi:10.1161/01.CIR.91.2.265
[53] Stephens M (2000) Times on trees, and the age of an allele. Theor Popul Biol 57:109-119 · Zbl 0974.92023 · doi:10.1006/tpbi.1999.1442
[54] Stephens M, Donnelly P (2000) Inference in molecular population genetics. J R Stat Soc B 62:605-655 · Zbl 0962.62107 · doi:10.1111/1467-9868.00254
[55] Stephens M, Donnelly P (2003) Ancestral inference in population genetics models with selection (with discussion). Aust N Z J Stat 45:395-430 · Zbl 1064.62115 · doi:10.1111/1467-842X.00295
[56] Stringer C (2002) Modern human origins: progress and prospects. Philos Trans R Soc Lond B 357:563-579 · doi:10.1098/rstb.2001.1057
[57] Strittmatter WJ, Saunders AM, Schmechel D, Pericak-Vance M, Enghild J, Salvesen GS, Roses AD (1993) Apolipoprotein E: high-avidity binding to beta-amyloid and increased frequency of type 4 allele in late-onset familial Alzheimer disease. Proc Natl Acad Sci USA 90(5):1977-1981 · doi:10.1073/pnas.90.5.1977
[58] Tajima F (1983) Evolutionary relationship of DNA sequences in finite populations. Genetics 105:437-460
[59] Takahata N (1993) Allelic genealogy and human evolution. Mol Biol Evol 10(1):2-22
[60] Tavaré S, Zeitouni O (2004) Ancestral inference in population genetics. In: Picard J (ed), no. 1837. New York, pp 1-188 · Zbl 1062.92046
[61] Thompson EA (1975) Human evolutionary trees. Cambridge University Press, Cambridge
[62] Thomson R, Pritchard JK, Shen P, Oefner PJ, Feldman MW (2000) Recent common ancestry of human Y chromosomes: evidence from DNA sequence data. Proc Natl Acad Sci USA 97:7360-7365 · doi:10.1073/pnas.97.13.7360
[63] Vigilant L, Stoneking M, Harpending H, Hawkes K, Wilson A (1991) African populations and the evolution of human mitochondrial DNA. Science 253:1503-7 · doi:10.1126/science.1840702
[64] Wakeley J (2008) An introduction to coalescent theory. Roberts & Co, Boulder · Zbl 1366.92001
[65] Watterson GA (1975) On the number of segregating sites in genetical models without recombination. Theor Popul Biol 7:256-276 · Zbl 0294.92011 · doi:10.1016/0040-5809(75)90020-9
[66] Weiss G, von Haeseler A (1998) Inference of population history using a likelihood approach. Genetics 149:1539-1546 · Zbl 0934.93016
[67] Wiuf C, Donnelly P (1999) Conditional genealogies and the age of a neutral mutant. Theor Popul Biol 56:183-201 · Zbl 0982.92027 · doi:10.1006/tpbi.1998.1411
[68] Xie X (2011) The site-frequency spectrum of linked sites. Bull Math Biol 73(3):459-494 · Zbl 1226.92051 · doi:10.1007/s11538-010-9534-3
[69] Zannis VI, Nicolosi RJ, Jensen E, Breslow JL, Hayes KC (1985) Plasma and hepatic apoE isoproteins of nonhuman primates. Differences in apoE among humans, apes, and New and Old World monkeys. J Lipid Res 26(12):1421-1430
[70] Zivkovic D, Wiehe T (2008) Second-order moments of segregating sites under variable population size. Genetics 180(1):341-357 · doi:10.1534/genetics.108.091231