Inductive determination of allele frequency spectrum probabilities in structured populations. (English) Zbl 1423.92208

Summary: We present a method for inductively determining exact allele frequency spectrum (AFS) probabilities for samples derived from a population comprising two demes under the infinite-allele model of mutation. This method builds on a labeled coalescent argument to extend the Ewens sampling formula (ESF) to structured populations. A key departure from the panmictic case is that the AFS conditioned on the number of alleles in the sample is no longer independent of the scaled mutation rate \((\theta)\). In particular, biallelic site frequency spectra, widely used in explorations of genome-wide patterns of variation, depend on the mutation rate in structured populations. Variation in the rate of substitution across loci and through time may contribute to apparent distortions of site frequency spectra exhibited by samples derived from structured populations.
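
For orientation, the panmictic baseline that the summary contrasts against can be checked numerically. The sketch below is not the paper's two-deme method; it only evaluates the classical ESF, \(P(a_1,\dots,a_n) = \frac{n!}{\theta(\theta+1)\cdots(\theta+n-1)} \prod_{j} \frac{(\theta/j)^{a_j}}{a_j!}\), and illustrates that, in a single unstructured population, the AFS conditioned on the number of alleles does not change with \(\theta\). The function names (`partitions`, `esf_probability`, `conditional_afs`) are hypothetical and chosen for this illustration only.

```python
from fractions import Fraction
from math import factorial


def partitions(n, max_part=None):
    """Yield integer partitions of n as non-increasing tuples of allele counts."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for k in range(min(n, max_part), 0, -1):
        for rest in partitions(n - k, k):
            yield (k,) + rest


def esf_probability(parts, theta):
    """Classical (panmictic) Ewens sampling formula for one allele-count partition.

    parts: tuple of allele multiplicities summing to the sample size n.
    theta: scaled mutation rate (a Fraction or int, for exact arithmetic).
    """
    n = sum(parts)
    rising = Fraction(1)
    for i in range(n):
        rising *= theta + i  # theta (theta + 1) ... (theta + n - 1)
    prob = Fraction(factorial(n)) / rising
    counts = {}
    for p in parts:
        counts[p] = counts.get(p, 0) + 1  # a_j = number of alleles seen j times
    for j, a_j in counts.items():
        prob *= (Fraction(theta) / j) ** a_j / factorial(a_j)
    return prob


def conditional_afs(n, k, theta):
    """AFS for a sample of size n, conditioned on observing exactly k alleles."""
    joint = {p: esf_probability(p, theta) for p in partitions(n) if len(p) == k}
    total = sum(joint.values())
    return {p: q / total for p, q in joint.items()}


if __name__ == "__main__":
    # Biallelic spectrum for n = 6 under two very different mutation rates.
    low = conditional_afs(6, 2, Fraction(1, 10))
    high = conditional_afs(6, 2, Fraction(10))
    print(low == high)
```

Exact rational arithmetic is used so that the two conditional spectra can be compared for equality rather than approximate closeness. According to the summary, this equality is precisely what fails once the population comprises two demes.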

MSC:

92D10 Genetics and epigenetics
62P10 Applications of statistics to biology and medical sciences; meta analysis
92D15 Problems related to evolution

References:

[1] Aldous, D. J., Exchangeability and related topics, (Aldous, D. J.; Ibragimov, I. A.; Jacod, J., École d’Été de Probabilités de Saint-Flour XIII - 1983, volume 1117 (1985), Springer-Verlag: Springer-Verlag New York), 1-198 · Zbl 0562.60042
[2] Bustamante, C. D.; Nielsen, R.; Hartl, D. L., Maximum likelihood and Bayesian methods for estimating the distribution of selective effects among classes of mutations using DNA polymorphism data, Theor. Popul. Biol., 63, 91-103 (2003) · Zbl 1104.62118
[3] De Iorio, M.; Griffiths, R. C., Importance sampling on coalescent histories. II: Subdivided population models, Adv. Appl. Probab., 36, 434-454 (2004) · Zbl 1124.62317
[4] De Iorio, M.; Griffiths, R. C.; Leblois, R.; Rousset, F., Stepwise mutation likelihood computation by sequential importance sampling in subdivided population models, Theor. Popul. Biol., 68, 41-53 (2005) · Zbl 1101.62105
[5] Ewens, W. J., The sampling theory of selectively neutral alleles, Theor. Popul. Biol., 3, 87-112 (1972) · Zbl 0245.92009
[6] Felsenstein, J.; Kuhner, M. K.; Yamato, J.; Beerli, P., Likelihoods on coalescents: A Monte Carlo sampling approach to inferring parameters from population samples of molecular data, (Seillier-Moiseiwitsch, F., Statistics in Molecular Biology and Genetics (1999), Institute of Mathematical Statistics and American Mathematical Society: Institute of Mathematical Statistics and American Mathematical Society Hayward, CA), 163-185
[7] Fu, Y.-X., Statistical properties of segregating sites, Theor. Popul. Biol., 48, 172-197 (1995) · Zbl 0854.92014
[8] Ganapathy, G.; Uyenoyama, M. K., Site frequency spectra from genomic SNP surveys, Theor. Popul. Biol., 75, 346-354 (2009) · Zbl 1213.92032
[9] Griffiths, R. C.; Lessard, S., Ewens’ sampling formula and related formulae: Combinatorial proofs, extensions to variable population size and applications to ages of alleles, Theor. Popul. Biol., 68, 167-177 (2005) · Zbl 1085.92027
[10] Griffiths, R. C.; Tavaré, S., Simulating probability distributions in the coalescent, Theor. Popul. Biol., 46, 131-159 (1994) · Zbl 0807.92015
[11] Harpak, A.; Bhaskar, A.; Pritchard, J. K., Mutation rate variation is a primary determinant of the distribution of allele frequencies in humans, PLoS Genet., 12, Article e1006489 pp. (2016)
[12] Harris, K.; Pritchard, J. K., Rapid evolution of the human mutation spectrum, eLife, 6, Article e24284 pp. (2017)
[13] Hoppe, F. M., The sampling theory of neutral alleles and an urn model in population genetics, J. Math. Biol., 25, 123-159 (1987) · Zbl 0636.92007
[14] Hudson, R. R., Gene genealogies and the coalescent process, (Futuyma, D.; Antonovics, J., Oxford Surveys in Evolutionary Biology, volume 7 (1990), Oxford Univ. Press: Oxford Univ. Press New York), 1-44
[15] Hudson, R. R., Generating samples under a Wright-Fisher neutral model of genetic variation, Bioinformatics, 18, 337-338 (2002)
[16] Hudson, R. R., A new proof of the expected frequency spectrum under the standard neutral model, PLoS One, 10, Article e0118087 pp. (2015)
[17] Karlin, S.; McGregor, J., Addendum to the paper of W. Ewens, Theor. Popul. Biol., 3, 113-116 (1972) · Zbl 0245.92010
[18] Kim, S.; Plagnol, V.; Hu, T. T.; Toomajian, C.; Clark, R. M.; Ossowski, S.; Ecker, J. R.; Weigel, D.; Nordborg, M., Recombination and linkage disequilibrium in Arabidopsis thaliana, Nat. Genet., 39, 1151-1155 (2007)
[19] Kingman, J. F. C., Origins of the coalescent: 1974-1982, Genetics, 156, 1461-1463 (2000)
[20] Kumagai, S.; Uyenoyama, M. K., Genealogical histories in structured populations, Theor. Popul. Biol., 102, 3-15 (2015) · Zbl 1342.92129
[21] Lek, M.; Karczewski, K. J.; Minikel, E. V.; Samocha, K. E.; Banks, E.; Fennell, T.; O’Donnell-Luria, A. H.; Ware, J. S.; Hill, A. J.; Cummings, B. B.; Tukiainen, T.; Birnbaum, D. P.; Kosmicki, J. A.; Duncan, L. E.; Estrada, K.; Zhao, F.; Zou, J.; Pierce-Hoffman, E.; Berghout, J.; Cooper, D. N.; Deflaux, N.; DePristo, M.; Do, R.; Flannick, J.; Fromer, M.; Gauthier, L.; Goldstein, J.; Gupta, N.; Howrigan, D.; Kiezun, A.; Kurki, M. I.; Moonshine, A. L.; Natarajan, P.; Orozco, L.; Peloso, G. M.; Poplin, R.; Rivas, M. A.; Ruano-Rubio, V.; Rose, S. A.; Ruderfer, D. M.; Shakir, K.; Stenson, P. D.; Stevens, C.; Thomas, B. P.; Tiao, G.; Tusie-Luna, M. T.; Weisburd, B.; Won, H.-H.; Yu, D.; Altshuler, D. M.; Ardissino, D.; Boehnke, M.; Danesh, J.; Donnelly, S.; Elosua, R.; Florez, J. C.; Gabriel, S. B.; Getz, G.; Glatt, S. J.; Hultman, C. M.; Kathiresan, S.; Laakso, M.; McCarroll, S.; McCarthy, M. I.; McGovern, D.; McPherson, R.; Neale, B. M.; Palotie, A.; Purcell, S. M.; Saleheen, D.; Scharf, J. M.; Sklar, P.; Sullivan, P. F.; Tuomilehto, J.; Tsuang, M. T.; Watkins, H. C.; Wilson, J. G.; Daly, M. J.; MacArthur, D. G.; Exome Aggregation Consortium, Analysis of protein-coding genetic variation in 60,706 humans, Nature, 536, 285-291 (2016)
[22] Leman, S. C.; Chen, Y.; Stajich, J. E.; Noor, M. A. F.; Uyenoyama, M. K., Likelihoods from summary statistics: Recent divergence between species, Genetics, 171, 1419-1436 (2005)
[23] Li, W.-H.; Gojobori, T.; Nei, M., Pseudogenes as a paradigm of neutral evolution, Nature, 292, 237-239 (1981)
[24] Redelings, B. D.; Kumagai, S.; Tatarenkov, A.; Wang, L.; Sakai, A. K.; Weller, S. G.; Culley, T. M.; Avise, J. C.; Uyenoyama, M. K., A Bayesian approach to inferring rates of selfing and locus-specific mutation, Genetics, 201, 1171-1188 (2015)
[25] Stephens, M.; Donnelly, P., Inference in molecular population genetics, J. R. Stat. Soc. B, 62, 605-635 (2000) · Zbl 0962.62107
[26] Tavaré, S.; Ewens, W. J., Multivariate Ewens distribution, (Johnson, N. L.; Kotz, S.; Balakrishnan, N., Discrete Multivariate Distributions, chapter 41 (1997), Wiley: Wiley New York), 232-246 · Zbl 0868.62048
[27] Taylor, H. M.; Karlin, S., An Introduction To Stochastic Modeling (1998), Academic Press: Academic Press New York · Zbl 0946.60002
[28] Wiuf, C.; Donnelly, P., Conditional genealogies and the age of a neutral mutant, Theor. Popul. Biol., 56, 183-201 (1999) · Zbl 0982.92027