
Algorithmic improvements to species delimitation and phylogeny estimation under the multispecies coalescent. (English) Zbl 1357.92057

Summary: The focus of this article is a Bayesian method for inferring both species delimitations and species trees under the multispecies coalescent model using molecular sequences from multiple loci. The species delimitation requires no a priori assignment of individuals to species, and no guide tree. The method is implemented in a package called STACEY for BEAST2, and is an extension of the author’s DISSECT package. Here we demonstrate considerable efficiency improvements by using three new operators for sampling from the posterior using the Markov chain Monte Carlo algorithm, and by using a model for the population size parameters along the branches of the species tree which allows these parameters to be integrated out. The correctness of the moves is demonstrated by tests of the implementation. The practice of using a pipeline approach to species delimitation under the multispecies coalescent has been shown to have major problems on simulated data [M. Olave et al., “Upstream analyses create problems with DNA-based species delimitation”, Syst. Biol. 63, No. 2, 263–271 (2014; doi:10.1093/sysbio/syt106)]. The same simulated data set is used to demonstrate the accuracy and improved convergence of the present method. We also compare performance with *BEAST for a fixed delimitation analysis on a large data set, and again show improved convergence.

MSC:

92D15 Problems related to evolution
92D10 Genetics and epigenetics
62P10 Applications of statistics to biology and medical sciences; meta analysis
62F15 Bayesian inference

Software:

CODA; BEAST; STRUCTURE

References:

[1] Bouckaert R, Heled J, Kühnert D, Vaughan T, Wu CH, Xie D, Suchard MA, Rambaut A, Drummond AJ (2014) BEAST 2: A software platform for Bayesian evolutionary analysis. PLoS Comput Biol 10(4):e1003537 · doi:10.1371/journal.pcbi.1003537
[2] Degnan JH, Rosenberg NA (2009) Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends Ecol Evol 24:332-340 · doi:10.1016/j.tree.2009.01.009
[3] Edwards SV (2009) Is a new and general theory of molecular systematics emerging? Evolution 63:1-19 · doi:10.1111/j.1558-5646.2008.00549.x
[4] Felsenstein J (2003) Inferring phylogenies. Sinauer Associates, Sunderland · Zbl 1058.68529 · doi:10.1016/S0022-0000(02)00003-X
[5] Flot JF (2015) Species delimitation’s coming of age. Syst Biol 64(6):897-899 · doi:10.1093/sysbio/syv071
[6] Giarla T, Esselstyn J (2015) The challenges of resolving a rapid, recent radiation: empirical and simulated phylogenomics of Philippine shrews. Syst Biol 64(5):727-740 · doi:10.1093/sysbio/syv029
[7] Heled J, Drummond A (2010) Bayesian inference of species trees from multilocus data. Mol Biol Evol 27:570-580 · doi:10.1093/molbev/msp274
[8] Hey J, Nielsen R (2007) Integration within the Felsenstein equation for improved Markov chain Monte Carlo methods in population genetics. Proc Natl Acad Sci 104:2785-2790 · doi:10.1073/pnas.0611164104
[9] Höhna S, Defoin-Platel M, Drummond AJ (2008) Clock-constrained tree proposal operators in Bayesian phylogenetic inference. In: 8th IEEE international conference on bioinformatics and bioengineering, Athens, Greece, pp 1-7, 8-10 Oct 2008
[10] Huang H, He Q, Kubatko LS, Knowles LL (2010) Sources of error for species-tree estimation: impact of mutational and coalescent effects on accuracy and implications for choosing among different methods. Syst Biol 59:573-583 · doi:10.1093/sysbio/syq047
[11] Huelsenbeck JP, Andolfatto P (2007) Inference of population structure under a Dirichlet process model. Genetics 175:1787-1802 · doi:10.1534/genetics.106.061317
[12] Jones G, Aydin Z, Oxelman B (2014) DISSECT: an assignment-free Bayesian discovery method for species delimitation under the multispecies coalescent. Bioinformatics · doi:10.1093/bioinformatics/btu770
[13] Liu L, Pearl DK, Brumfield RT, Edwards SV (2008) Estimating species trees using multiple allele DNA sequence data. Evolution 62(8):2080-2091 · doi:10.1111/j.1558-5646.2008.00414.x
[14] Olave M, Solà E, Knowles LL (2014) Upstream analyses create problems with DNA-based species delimitation. Syst Biol 63:263-271 · doi:10.1093/sysbio/syt106
[15] Plummer M, Best N, Cowles K, Vines K (2006) CODA: convergence diagnosis and output analysis for MCMC. R News 6(1):7-11. http://CRAN.R-project.org/doc/Rnews/
[16] Pritchard JK, Stephens M, Donnelly PJ (2000) Inference of population structure using multilocus genotype data. Genetics 155:945-959
[17] Rannala B (2015) The art and science of species delimitation. Curr Zool 61:846-853 · doi:10.1093/czoolo/61.5.846
[18] Rannala B, Yang Z (2003) Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics 164:1645-1656
[19] Rannala B, Yang Z (2013) Improved reversible jump algorithms for Bayesian species delimitation. Genetics 194:245-253 · doi:10.1534/genetics.112.149039
[20] Solís-Lemus C, Knowles LL, Ane C (2015) Bayesian species delimitation combining multiple genes and traits in a unified framework. Evolution 69:492-507 · doi:10.1111/evo.12582
[21] Yang Z (2002) Likelihood and Bayes estimation of ancestral population sizes in hominoids using data from multiple loci. Genetics 162:1811-1823
[22] Yang Z, Rannala B (2010) Bayesian species delimitation using multilocus sequence data. Proc Natl Acad Sci USA 107:9264-9269 · doi:10.1073/pnas.0913022107
[23] Yang Z, Rannala B (2014) Unguided species delimitation using DNA sequence data from multiple loci. Mol Biol Evol 31(12):3125-3135 · doi:10.1093/molbev/msu279
[24] Zhang C, Rannala B, Yang Z (2014) Bayesian species delimitation can be robust to guide-tree inference errors. Syst Biol 63:993-1004 · doi:10.1093/sysbio/syu052
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.