Efficiently inferring the demographic history of many populations with allele count data. (English) Zbl 1441.62919

Summary: The sample frequency spectrum (SFS), or histogram of allele counts, is an important summary statistic in evolutionary biology, and is often used to infer the history of population size changes, migrations, and other demographic events affecting a set of populations. The expected multipopulation SFS under a given demographic model can be efficiently computed when the populations in the model are related by a tree, scaling to hundreds of populations. Admixture, back-migration, and introgression are common natural processes that violate the assumption of a tree-like population history, however, and until now the expected SFS could be computed for only a handful of populations when the demographic history is not a tree. In this article, we present a new method for efficiently computing the expected SFS and linear functionals of it, for demographies described by general directed acyclic graphs. This method can scale to more populations than previously possible for complex demographic histories including admixture. We apply our method to an 8-population SFS to estimate the timing and strength of a proposed “basal Eurasian” admixture event in human history. We implement and release our method in a new open-source software package momi2.
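As an illustration of the summary statistic itself (not of the paper's algorithm or of the momi2 API), the following minimal sketch tabulates an observed SFS from per-site derived allele counts. The function names `sample_frequency_spectrum` and `joint_sfs` and the toy data are hypothetical, chosen only for this example; computing the expected SFS under a demographic model is a separate and much harder problem, which is what the article addresses.

```python
from collections import Counter

def sample_frequency_spectrum(derived_counts, n):
    """Single-population SFS for a sample of n chromosomes.

    Entry k of the result is the number of sites at which exactly
    k of the n sampled chromosomes carry the derived allele.
    """
    sfs = [0] * (n + 1)
    for k in derived_counts:
        if not 0 <= k <= n:
            raise ValueError(f"allele count {k} outside 0..{n}")
        sfs[k] += 1
    return sfs

def joint_sfs(sites):
    """Multipopulation SFS.

    Each site is a tuple with one derived allele count per population;
    the result maps each such tuple to the number of sites showing it.
    """
    return Counter(tuple(site) for site in sites)

# Toy data (hypothetical): 7 sites, 4 chromosomes sampled from one population.
counts = [1, 1, 2, 0, 4, 1, 3]
sfs = sample_frequency_spectrum(counts, n=4)
print(sfs)  # [1, 3, 1, 1, 1]

# Toy data (hypothetical): 4 sites observed in two populations.
two_pop_sites = [(1, 0), (1, 0), (0, 2), (1, 1)]
jsfs = joint_sfs(two_pop_sites)
print(jsfs[(1, 0)])  # 2
```

The joint spectrum is stored sparsely as a mapping because the number of possible count configurations grows multiplicatively with the number of populations, so a dense array quickly becomes impractical.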

MSC:

62P25 Applications of statistics to social sciences
92D25 Population dynamics (general)
