Distributions of topological tree metrics between a species tree and a gene tree. (English) Zbl 1400.62296

Summary: In order to conduct a statistical analysis on a given set of phylogenetic gene trees, we often use a distance measure between two trees. In a statistical distance-based method to analyze discordance between gene trees, it is key to choose a “biologically meaningful” and “statistically well-distributed” distance between trees. Thus, in this paper, we study the distributions of three tree distance metrics between two trees: the edge difference, the path difference, and the precise \(K\) interval cospeciation distance. First, we focus on the distributions of the three tree distances between two random unrooted trees with \(n\) leaves (\(n \geq 4\)); then we focus on the distributions of the three tree distances between a fixed rooted species tree with \(n\) leaves and a random gene tree with \(n\) leaves generated under the coalescent process with the given species tree. We present some theoretical results as well as a simulation study on these distributions.
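
As an illustration of one of the metrics named in the summary, the following is a minimal sketch of the path difference in the sense of Steel and Penny (1993): the Euclidean distance between the two vectors of pairwise leaf-to-leaf path lengths, counted in edges. The tree encoding (edge lists with hypothetical internal node names `x`, `y`) and the function names are assumptions made for this example, not the authors' implementation.

```python
from collections import deque
from itertools import combinations
from math import sqrt


def leaf_path_lengths(edges, leaves):
    """Return the number of edges on the path between every pair of leaves."""
    # Build an undirected adjacency list from the edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    # Breadth-first search from each leaf gives edge counts to all nodes.
    dist = {}
    for src in leaves:
        seen = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + 1
                    queue.append(nxt)
        dist[src] = seen

    return {(a, b): dist[a][b] for a, b in combinations(sorted(leaves), 2)}


def path_difference(edges1, edges2, leaves):
    """Euclidean distance between the two vectors of leaf-pair path lengths."""
    d1 = leaf_path_lengths(edges1, leaves)
    d2 = leaf_path_lengths(edges2, leaves)
    return sqrt(sum((d1[pair] - d2[pair]) ** 2 for pair in d1))


# Two unrooted trees on n = 4 leaves; "x" and "y" are internal nodes.
leaves = ["a", "b", "c", "d"]
t1 = [("a", "x"), ("b", "x"), ("x", "y"), ("c", "y"), ("d", "y")]  # ab|cd
t2 = [("a", "x"), ("c", "x"), ("x", "y"), ("b", "y"), ("d", "y")]  # ac|bd

print(path_difference(t1, t2, leaves))
```

For the two quartet trees above, four of the six leaf pairs change their path length by one edge, so the sketch yields a path difference of 2.0; comparing a tree with itself yields 0.0.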

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62H30 Classification and discrimination; cluster analysis (statistical aspects)
05C90 Applications of graph theory

Software:

Mesquite; kdetrees; ape

References:

[1] Allen, B., Steel, M. (2001). Subtree transfer operations and their induced metrics on evolutionary trees. Annals of Combinatorics, 5(1), 1-15. · Zbl 0978.05023
[2] Arnaoudova, E., Haws, D., Huggins, P., Jaromczyk, J. W., Moore, N., Schardl, C., et al. (2010). Statistical phylogenetic tree analysis using differences of means. Frontier Psychiatry, 1(47).
[3] Betancur, R., Li, C., Munroe, T., Ballesteros, J., Ortí, G. (2013). Addressing gene tree discordance and non-stationarity to resolve a multi-locus phylogeny of the flatfishes (Teleostei: Pleuronectiformes). Systematic Biology. doi:10.1093/sysbio/syt039.
[4] Bollback, J., Huelsenbeck, J. (2009). Parallel genetic evolution within and between bacteriophage species of varying degrees of divergence. Genetics, 181(1), 225-234.
[5] Brito, P., Edwards, S. (2009). Multilocus phylogeography and phylogenetics using sequence-based markers. Genetica, 135, 439-455. · Zbl 1040.92032
[6] Brodal, G., Fagerberg, R., Pedersen, C. N. (2001). Computing the quartet distance between evolutionary trees in time \(O(n \log^2 n)\). Algorithmica, 731-742. · Zbl 1077.92513
[7] Carling, M., Brumfield, R. (2008). Integrating phylogenetic and population genetic analyses of multiple loci to test species divergence hypotheses in Passerina buntings. Genetics, 178, 363-377.
[8] Carstens, B. C., Knowles, L. L. (2007). Estimating species phylogeny from gene-tree probabilities despite incomplete lineage sorting: an example from Melanoplus grasshoppers. Systematic Biology, 56, 400-411.
[9] Coons, J., Rusinko, J. (2014). Combinatorics of k-interval cospeciation for cophylogeny. http://arxiv.org/pdf/1407.6605.pdf (preprint)
[10] Dasgupta, B., He, X., Jiang, T., Li, M., Tromp, J., Zhang, L. (1997). On computing the nearest neighbor interchange distance. In Proceedings of DIMACS Workshop on Discrete Problems with Medical Applications (pp. 125-143) (press). · Zbl 1133.92347
[11] Degnan, J., Salter, L. (2005a). Gene tree distributions under the coalescent process. Evolution, 59(1), 24-37.
[12] Degnan, J. H., Salter, L. A. (2005b). Gene tree distributions under the coalescent process. Evolution, 59, 24-37.
[13] Edwards, S. (2009). Is a new and general theory of molecular systematics emerging? Evolution, 63, 1-19. · doi:10.1111/j.1558-5646.2008.00549.x
[14] Edwards, S., Liu, L., Pearl, D. (2007). High-resolution species trees without concatenation. Proceedings of the National Academy of Sciences USA, 104, 5936-5941.
[15] Graham, M., Kennedy, J. (2010). A survey of multiple tree visualisation. Information Visualization, 9, 235-252.
[16] Heled, J., Drummond, A. (2011). Bayesian inference of species trees from multilocus data. Molecular Biology and Evolution, 27(3), 570-580.
[17] Hickey, G., Dehne, F., Rau-Chaplin, A., Blouin, C. (2008). SPR distance computation for unrooted trees. Evolutionary Bioinformatics Online, 4, 17-27.
[18] Hillis, D. M., Heath, T. A., St. John, K. (2005). Analysis and visualization of tree space. Systematic Biology, 54(3), 471-482.
[19] Holmes, S. (2007). Statistical Approach to Tests Involving Phylogenies. New York: Oxford University Press. · Zbl 1090.62123
[20] Huggins, P., Owen, M., Yoshida, R. (2012). First steps toward the geometry of cophylogeny. In The Proceedings of the Second CREST-SBM International Conference “Harmony of Gröbner Bases and the Modern Industrial Society” (pp. 99-116). · Zbl 1338.92078
[21] Maddison, W. P. (1997). Gene trees in species trees. Systematic Biology, 46(3), 523-536. · doi:10.1093/sysbio/46.3.523
[22] Maddison, W. P., Knowles, L. L. (2006). Inferring phylogeny despite incomplete lineage sorting. Systematic Biology, 55, 21-30.
[23] Maddison, W. P., Maddison, D. R. (2011). Mesquite: a modular system for evolutionary analysis. Version 2.75. · Zbl 0222.65016
[24] Mossel, E., Roch, S. (2010). Incomplete lineage sorting: consistent phylogeny estimation from multiple loci. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 7(1), 166-171.
[25] Pamilo, P., Nei, M. (1988). Relationships between gene trees and species trees. Molecular Biology and Evolution, 5, 568-583. · Zbl 0555.92011
[26] Paradis, E., Claude, J., Strimmer, K. (2004). APE: analyses of phylogenetics and evolution in R language. Bioinformatics, 20, 289-290.
[27] Robinson, D. F., Foulds, L. R. (1981). Comparison of phylogenetic trees. Mathematical Biosciences, 53, 131-147. · Zbl 0451.92006
[28] Rosenberg, N. (2002). The probability of topological concordance of gene trees and species trees. Theoretical Population Biology, 61, 225-247. · Zbl 1040.92032 · doi:10.1006/tpbi.2001.1568
[29] Rosenberg, N. A. (2003). The shapes of neutral gene genealogies in two species: probabilities of monophyly, paraphyly, and polyphyly in a coalescent model. Evolution, 57, 1465-1477. · doi:10.1111/j.0014-3820.2003.tb00355.x
[30] RoyChoudhury, A., Felsenstein, J., Thompson, E. A. (2008). A two-stage pruning algorithm for likelihood computation for a population tree. Genetics, 180, 1095-1105.
[31] Semple, C., Steel, M. (2003). Phylogenetics, vol. 24 of Oxford Lecture Series in Mathematics and its Applications. Oxford: Oxford University Press. · Zbl 1043.92026
[32] Steel, M., Penny, D. (1993). Distributions of tree comparison metrics: some new results. Systematic Biology, 42(2), 126-141.
[33] Takahata, N. (1989). Gene genealogy in 3 related populations: consistency probability between gene and population trees. Genetics, 122, 957-966.
[34] Takahata, N., Nei, M. (1990). Allelic genealogy under overdominant and frequency-dependent selection and polymorphism of major histocompatibility complex loci. Genetics, 124, 967-978.
[35] Tavaré, S. (1984). Line-of-descent and genealogical processes, and their applications in population genetics models. Theoretical Population Biology, 26, 119-164. · Zbl 0555.92011 · doi:10.1016/0040-5809(84)90027-3
[36] Thompson, K., Kubatko, L. (2013). Using ancestral information to detect and localize quantitative trait loci in genome-wide association studies. BMC Bioinformatics, 14, 200.
[37] Weyenberg, G., Huggins, P., Schardl, C., Howe, D., Yoshida, R. (2014). kdetrees: non-parametric estimation of phylogenetic tree distributions. Bioinformatics, 30(16), 2280-2287.
[38] Williams, W. T., Clifford, H. T. (1971). On the comparison of two classifications of the same set of elements. Taxon, 20, 519-522.
[39] Yu, Y., Warnow, T., Nakhleh, L. (2011). Algorithms for MDC-based multi-locus phylogeny inference: Beyond rooted binary gene trees on single alleles. Journal of Computational Biology, 18(11), 1543-1559. · Zbl 0555.92011
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases the data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.