The ages of mutations in gene trees. (English) Zbl 0948.92016

Summary: Under the infinitely many sites mutation model, the mutational history of a sample of DNA sequences can be described by a unique gene tree. We show how to find the conditional distribution of the ages of the mutations and the time to the most recent common ancestor of the sample, given this gene tree. Explicit expressions for such distributions seem impossible to find for the sample sizes of interest in practice. We resort to a Monte Carlo method to approximate these distributions. We use this method to study the effects of variable population size and variable mutation rates, the distribution of the time to the most recent common ancestor of the population and the distribution of other functionals of the underlying coalescent process, conditional on the sample gene tree.
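The idea of approximating a conditional distribution of coalescent times by Monte Carlo can be illustrated with a much simpler stand-in than the method of the paper. The sketch below does not condition on a full gene tree. It conditions only on the number of segregating sites s in a sample of size n, assuming a constant population size, time measured in coalescent units (each pair of lineages coalesces at rate 1), and mutations falling on the tree as a Poisson process with rate theta/2 per lineage. All function names and parameter values are hypothetical and chosen for illustration only.

```python
import math
import random


def simulate_coalescent_times(n, rng):
    """Return [T_n, ..., T_2], where T_k is the time spent with k lineages.

    Under the standard constant-size coalescent, T_k is exponential
    with rate k(k-1)/2.
    """
    return [rng.expovariate(k * (k - 1) / 2.0) for k in range(n, 1, -1)]


def conditional_tmrca_mean(n, s, theta, n_sims=20000, seed=1):
    """Estimate E[TMRCA | S = s] by weighting simulated trees.

    Each simulated tree is weighted by the Poisson probability of seeing
    s mutations on a tree of its total branch length (illustrative
    assumption: infinitely many sites, mutation rate theta/2 per lineage).
    """
    rng = random.Random(seed)
    weighted_sum = 0.0
    weight_total = 0.0
    for _ in range(n_sims):
        times = simulate_coalescent_times(n, rng)
        tmrca = sum(times)
        # Total branch length: k lineages are present during T_k.
        tree_length = sum(k * t for k, t in zip(range(n, 1, -1), times))
        rate = theta * tree_length / 2.0
        # Log of the Poisson probability of s mutations given the tree.
        log_weight = -rate - math.lgamma(s + 1)
        if s > 0:
            log_weight += s * math.log(rate)
        weight = math.exp(log_weight)
        weighted_sum += weight * tmrca
        weight_total += weight
    return weighted_sum / weight_total


if __name__ == "__main__":
    n, theta = 10, 5.0
    print("unconditional mean TMRCA:", 2.0 * (1.0 - 1.0 / n))
    for s in (0, 10, 30):
        print("s =", s, "estimated E[TMRCA | S = s]:",
              conditional_tmrca_mean(n, s, theta))
```

Observing few segregating sites pulls the estimated time to the most recent common ancestor below its unconditional mean 2(1 - 1/n), and observing many pushes it above. Conditioning on the full gene tree, as in the paper, uses more of the information in the data and needs a more elaborate sampling scheme than this weighting of unconditional simulations.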

MSC:

92D15 Problems related to evolution
60J70 Applications of Brownian motions and diffusion theory (population genetics, absorption problems, etc.)
92D10 Genetics and epigenetics
62M05 Markov processes: estimation; hidden Markov models
92D20 Protein sequences, DNA sequences
65C05 Monte Carlo methods
Full Text: DOI
