
Anomalous networks under the multispecies coalescent: theory and prevalence. (English) Zbl 1533.92146

Summary: Reticulations in a phylogenetic network represent processes such as gene flow, admixture, recombination and hybrid speciation. Extending definitions from the tree setting, an anomalous network is one in which some unrooted tree topology displayed in the network appears in gene trees with a lower frequency than a tree not displayed in the network. We investigate anomalous networks under the network multispecies coalescent model with possible correlated inheritance at reticulations. Focusing on subsets of 4 taxa, we describe a new algorithm to calculate quartet concordance factors on networks of any level, faster than previous algorithms because of its focus on 4 taxa. We then study topological properties required for a 4-taxon network to be anomalous, uncovering the key role of \(3_2\)-cycles: cycles of 3 edges parent to a sister group of 2 taxa. Under the model of common inheritance, that is, when each gene tree coalesces within a species tree displayed in the network, we prove that 4-taxon networks are never anomalous. Under independent and various levels of correlated inheritance, we use simulations under realistic parameters to quantify the prevalence of anomalous 4-taxon networks, finding that truly anomalous networks are rare. At the same time, however, we find a significant fraction of networks close enough to the anomaly zone to appear anomalous, when considering the quartet concordance factors observed from a few hundred genes. These apparent anomalies may challenge network inference methods.
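The definition of an anomalous network, and the distinction between true and apparent anomalies, can be illustrated with a minimal Python sketch. It is not the algorithm of the paper: the function names, the quartet labels and the concordance factor values 0.60/0.21/0.19 are hypothetical and chosen only for illustration. The only formula used is the standard multispecies coalescent result for a 4-taxon species tree with internal branch length \(t\) in coalescent units, where the displayed quartet has concordance factor \(1 - \tfrac{2}{3}e^{-t}\) and each of the other two has \(\tfrac{1}{3}e^{-t}\). Apparent anomalies are approximated by multinomial sampling of quartet topologies from a finite number of genes, assuming independent genes and no gene tree estimation error.

```python
import math
import random


def tree_quartet_cfs(t):
    """Quartet concordance factors on a 4-taxon species TREE (not a network).

    t is the internal branch length in coalescent units.
    Returns (major, minor, minor), where major is the CF of the quartet
    displayed in the tree.
    """
    minor = math.exp(-t) / 3.0
    return (1.0 - 2.0 * minor, minor, minor)


def is_anomalous(cfs, displayed):
    """Check the anomaly definition on a dict of quartet -> CF.

    Anomalous: some quartet displayed in the network has a strictly lower
    CF than some quartet that is not displayed.
    """
    shown = [cf for q, cf in cfs.items() if q in displayed]
    hidden = [cf for q, cf in cfs.items() if q not in displayed]
    if not shown or not hidden:
        return False
    return min(shown) < max(hidden)


def apparent_anomaly_rate(cfs, displayed, n_genes=300, n_reps=2000, seed=0):
    """Fraction of simulated data sets whose OBSERVED CFs look anomalous.

    Each replicate draws n_genes quartet topologies independently with
    probabilities given by the expected CFs (a multinomial sample).
    """
    rng = random.Random(seed)
    quartets = list(cfs)
    weights = [cfs[q] for q in quartets]
    hits = 0
    for _ in range(n_reps):
        draws = rng.choices(quartets, weights=weights, k=n_genes)
        observed = {q: draws.count(q) / n_genes for q in quartets}
        if is_anomalous(observed, displayed):
            hits += 1
    return hits / n_reps


# Hypothetical expected CFs for a 4-taxon network that displays two of the
# three quartets; these numbers are illustrative, not taken from the paper.
expected = {"ab|cd": 0.60, "ac|bd": 0.21, "ad|bc": 0.19}
displayed = {"ab|cd", "ac|bd"}

truly_anomalous = is_anomalous(expected, displayed)
rate = apparent_anomaly_rate(expected, displayed, n_genes=300)
print("truly anomalous:", truly_anomalous)
print("apparent anomaly rate with 300 genes:", rate)
```

In this hypothetical setting the expected concordance factors are not anomalous, yet the displayed quartet `ac|bd` and the non-displayed quartet `ad|bc` are close enough that a sample of a few hundred genes can rank them in the wrong order, which is the sense in which a network near the anomaly zone may appear anomalous.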

MSC:

92D15 Problems related to evolution
05C90 Applications of graph theory
60J90 Coalescent processes
