
Squaring within the Colless index yields a better balance index. (English) Zbl 1457.92123

Summary: The Colless index for bifurcating phylogenetic trees, introduced by D. Colless [“Review of phylogenetics: the theory and practice of phylogenetic systematics”, Syst. Zool. 31, 100-104 (1982)], is defined as the sum, over all internal nodes \(v\) of the tree, of the absolute value of the difference between the sizes of the clades defined by the children of \(v\). It is one of the most popular phylogenetic balance indices because, in addition to measuring the balance of a tree in a very simple and intuitive way, it turns out to be one of the most powerful and discriminating phylogenetic shape indices. But it has some drawbacks. On the one hand, although its minimum value is reached at the so-called maximally balanced trees, it is almost always also reached at trees that are not maximally balanced. On the other hand, its definition as a sum of absolute values of differences makes it difficult to study its distribution analytically under probabilistic models of bifurcating phylogenetic trees. In this paper we show that if we replace the absolute values of the differences of clade sizes in its definition by the squares of these differences, all these drawbacks are overcome and the resulting index is still more powerful and discriminating than the original Colless index.
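
As an illustration of the two definitions in the summary, the following minimal sketch computes both sums for a bifurcating tree. The nested-tuple encoding of trees and the function name `colless_indices` are assumptions made for this example only and are not taken from the paper or from any particular software package.

```python
def colless_indices(tree):
    """Return (number of leaves, Colless index, squared variant).

    Assumed encoding: a leaf is any non-tuple value, and an internal
    node is a 2-tuple (left_subtree, right_subtree).
    """
    if not isinstance(tree, tuple):
        return 1, 0, 0  # a single leaf contributes nothing to either sum
    left, right = tree
    n_left, c_left, q_left = colless_indices(left)
    n_right, c_right, q_right = colless_indices(right)
    diff = abs(n_left - n_right)  # difference of the two clade sizes at this node
    return (
        n_left + n_right,
        c_left + c_right + diff,         # sum of absolute differences
        q_left + q_right + diff * diff,  # sum of squared differences
    )


# Two trees with 6 leaves. The first splits 3 + 3 at the root and each
# half splits 2 + 1; the second splits 4 + 2 at the root.
balanced_3_3 = ((("a", "b"), "c"), (("d", "e"), "f"))
split_4_2 = ((("a", "b"), ("c", "d")), ("e", "f"))

print(colless_indices(balanced_3_3))  # (6, 2, 2)
print(colless_indices(split_4_2))     # (6, 2, 4)
```

In this small case both trees have the same Colless value, 2, although only the first splits its leaves as evenly as possible at every node, while the squared variant gives 2 and 4 and so separates them. This is consistent with the first drawback described in the summary; it is a single example, not a proof of the general statement.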

MSC:

92D15 Problems related to evolution

Software:

SackinMinimizer

References:

[1] Evolution, Science and Society: Evolutionary Biology and the National Research Agenda (1999), The State University of New Jersey
[2] Kubo, T.; Iwasa, Y., Inferring the rates of branching and extinction from molecular phylogenies, Evolution, 49, 694-704 (1995)
[3] Mooers, A. O.; Heard, S. B., Inferring evolutionary process from phylogenetic tree shape, Q. Rev. Biol., 72, 31-54 (1997)
[4] Stich, M.; Manrubia, S., Topological properties of phylogenetic trees in evolutionary models, Eur. Phys. J. B, 70, 583-592 (2009) · Zbl 1188.05154
[5] Felsenstein, J., Inferring Phylogenies (2004), Sinauer Associates
[6] Drummond, A.; Ho, S. Y.W., Relaxed phylogenetics and dating with confidence, PLoS Biol., 4, Article e88 pp. (2006)
[7] Brower, A.; Rindal, E., Reality check: A reply to Smith, Cladistics, 29, 464-465 (2013)
[8] Hillis, D.; Bull, J.; White, M., Experimental phylogenetics: Generation of a known phylogeny, Science, 255, 589-592 (1992)
[9] Rindal, E.; Brower, A., Do model-based phylogenetic analyses perform better than parsimony? A test with empirical data, Cladistics, 27, 331-334 (2011)
[10] Fusco, G.; Cronk, Q. C., A new method for evaluating the shape of large phylogenies, J. Theoret. Biol., 175, 235-243 (1995)
[11] Shao, K.; Sokal, R., Tree balance, Syst. Zool., 39, 266-276 (1990)
[12] McKenzie, A.; Steel, M., Distributions of cherries for two models of trees, Math. Biosci., 164, 81-92 (2000) · Zbl 0947.92021
[13] Savage, H. M., The shape of evolution: Systematic tree topology, Biol. J. Linnean Soc., 20, 225-244 (1983)
[14] Slowinski, J., Probabilities of \(n\)-trees under two models: A demonstration that asymmetrical interior nodes are not improbable, Syst. Zool., 39, 89-94 (1990)
[15] Wu, T.; Choi, K., On joint subtree distributions under two evolutionary models, Theor. Popul. Biol., 108, 13-23 (2015) · Zbl 1343.92371
[16] Yule, G. U., A mathematical theory of evolution based on the conclusions of Dr J.C. Willis, Philos. Trans. R. Soc. London B, 213, 21-87 (1924)
[17] Nelson, M. I.; Holmes, E. C., The evolution of epidemic influenza, Nature Rev. Genet., 8, 196-205 (2007)
[18] Colless, D., Review of Phylogenetics: the theory and practice of phylogenetic systematics, Syst. Zool., 31, 100-104 (1982)
[19] Coronado, T. M.; Mir, A.; Rosselló, F.; Valiente, G., A balance index for phylogenetic trees based on rooted quartets, J. Math. Biol., 79, 1105-1148 (2019) · Zbl 1416.05069
[20] Fischer, M.; Liebscher, V., On the balance of unrooted trees (2015), arXiv preprint arXiv:1510.07882
[21] Kirkpatrick, M.; Slatkin, M., Searching for evolutionary patterns in the shape of a phylogenetic tree, Evolution, 47, 1171-1181 (1993)
[22] Mir, A.; Rosselló, F.; Rotger, L., A new balance index for phylogenetic trees, Math. Biosci., 241, 125-136 (2013) · Zbl 1303.92084
[23] Mir, A.; Rosselló, F.; Rotger, L., Sound Colless-like balance indices for multifurcating trees, PLoS ONE, 13, Article e0203401 pp. (2018)
[24] Sackin, M., Good and bad phenograms, Syst. Zool., 21, 225-226 (1972)
[25] Aldous, D., Stochastic models and descriptive statistics for phylogenetic trees, from Yule to today, Statist. Sci., 16, 23-34 (2001) · Zbl 1127.60313
[26] Blum, M.; François, O., On statistical tests of phylogenetic tree imbalance: The Sackin and other indices revisited, Math. Biosci., 195, 141-153 (2005) · Zbl 1065.62183
[27] Duchene, S.; Bouckaert, R., Phylodynamic model adequacy using posterior predictive simulations, Syst. Biol., 68, 358-364 (2018)
[28] Purvis, A., Using interspecies phylogenies to test macroevolutionary hypotheses, (New Uses for New Phylogenies (1996), Oxford University Press), 153-168
[29] Verboom, G.; Boucher, F.; Ackerly, D., Species selection regime and phylogenetic tree shape, Syst. Biol., 69, 774-794 (2020)
[30] Colless, D., Relative symmetry of cladograms and phenograms: An experimental study, Syst. Biol., 44, 102-108 (1995)
[31] Farris, J.; Källersjö, M., Asymmetry and explanations, Cladistics, 14, 159-166 (1998)
[32] Holton, T.; Wilkinson, M.; Pisani, D., The shape of modern tree reconstruction methods, Syst. Biol., 63, 436-441 (2014)
[33] Sober, E., Experimental tests of phylogenetic inference methods, Syst. Biol., 42, 85-89 (1993)
[34] Stam, E., Does imbalance in phylogenies reflect only bias?, Evolution, 56, 1292-1295 (2002)
[35] Avino, M.; Garway, T. N., Tree shape-based approaches for the comparative study of cophylogeny, Ecol. Evol., 9, 6756-6771 (2019)
[36] Goloboff, P.; Arias, J.; Szumik, C., Comparing tree shapes: beyond symmetry, Zool. Scripta, 46, 637-648 (2017)
[37] Kayondo, H.; Mwalili, S.; Mango, J., Inferring multi-type birth-death parameters for a structured host population with application to HIV epidemic in Africa, Comput. Mol. Biosci., 9, 108-131 (2019)
[38] Poon, A. F., Phylodynamic inference with kernel ABC and its application to HIV epidemiology, Mol. Biol. Evol., 32, 2483-2495 (2015)
[39] Saulnier, E.; Alizon, S.; Gascuel, O., Inferring epidemiological parameters from phylogenies using regression-ABC: A comparative study, PLoS Comput. Biol., 13, Article e1005416 pp. (2017)
[40] Chalmandrier, L.; Albouy, C., Comparing spatial diversification and meta-population models in the Indo-Australian Archipelago, Royal Soc. Open Sci., 5, Article 171366 pp. (2018)
[41] Cunha, T.; Giribet, G., A congruent topology for deep gastropod relationships, Proc. R. Soc. B, 286, Article 20182776 pp. (2019)
[42] Metzig, C.; Ratmann, O.; Bezemer, D.; Colijn, C., Phylogenies from dynamic networks, PLoS Comput. Biol., 15, Article e1006761 pp. (2019)
[43] Purvis, A.; Fritz, S.; Rodríguez, J., The shape of mammalian phylogeny: Patterns, processes and scales, Phil. Trans. R. Soc. B, 366, 2462-2477 (2011)
[44] Agapow, P.; Purvis, A., Power of eight tree shape statistics to detect nonrandom diversification: A comparison by simulation of two models of cladogenesis, Syst. Biol., 51, 866-872 (2002)
[45] Matsen, F., A geometric approach to tree shape statistics, Syst. Biol., 55, 652-661 (2006)
[46] Hayati, M.; Shadgar, B.; Chindelevitch, L., A new resolution function to evaluate tree shape statistics, PLoS ONE, 14, Article e0224197 pp. (2019)
[47] Blum, M.; François, O.; Janson, S., The mean, variance and limiting distribution of two statistics sensitive to phylogenetic tree balance, Ann. Appl. Probab., 16, 2195-2214 (2006) · Zbl 1124.05025
[48] Cardona, G.; Mir, A.; Rosselló, F., Exact formulas for the variance of several balance indices under the Yule model, J. Math. Biol., 67, 1833-1846 (2013) · Zbl 1281.92051
[49] Ford, D., Probabilities on Cladograms: Introduction to the Alpha Model (2005), Stanford University, arXiv preprint arXiv:math/0511246
[50] Heard, S. B., Patterns in tree balance among cladistic, phenetic, and randomly generated phylogenetic trees, Evolution, 46, 1818-1826 (1992)
[51] Coronado, T. M.; Fischer, M.; Herbst, L.; Rosselló, F.; Wicke, K., On the minimum value of the Colless index and the bifurcating trees that achieve it, J. Math. Biol., 80, 1993-2054 (2020) · Zbl 1443.92126
[52] Harding, E., The probabilities of rooted tree-shapes generated by random bifurcation, Adv. Appl. Probab., 3, 44-77 (1971) · Zbl 0241.92012
[53] Cavalli-Sforza, L. L.; Edwards, A., Phylogenetic analysis: Models and estimation procedures, Evolution, 21, 550-570 (1967)
[54] Rosen, D. E., Vicariant patterns and historical explanation in biogeography, Syst. Biol., 27, 159-188 (1978)
[55] Steel, M., Phylogeny: Discrete and random processes in evolution, SIAM (2016) · Zbl 1361.92001
[56] Rogers, J. S., Response of Colless’s tree imbalance to number of terminal taxa, Syst. Biol., 42, 102-105 (1993)
[57] Coronado, T. M.; Mir, A.; Rosselló, F.; Rotger, L., On Sackin’s original proposal: The variance of the leaves’ depths as a phylogenetic balance index, BMC Bioinformatics, 21, 154 (2020)
[58] Graham, R.; Knuth, D.; Patashnik, O., Concrete Mathematics (1994), Addison-Wesley · Zbl 0836.00001
[59] Aldous, D., Probability distributions on cladograms, (Aldous, D.; Pemantle, R., Random Discrete Structures (1996), Springer-Verlag), 1-18 · Zbl 0841.92015
[60] Bartoszek, K., Limit distribution of the quartet balance index for Aldous’s \(\beta \geqslant 0\)-model, Appl. Math., 47, 29-44 (2020) · Zbl 1447.05181
[61] Bartoszek, K., Exact and approximate limit behaviour of the Yule tree’s cophenetic index, Math. Biosci., 303, 26-45 (2018) · Zbl 1405.92190
[62] Fischer, M., Extremal values of the Sackin balance index for rooted binary trees (2018), arXiv preprint arXiv:1801.10418
[63] Wei, C.; Gong, D.; Wang, Q., Chu-Vandermonde convolution and harmonic number identities, Integral Transforms Spec. Funct., 24, 324-330 (2013) · Zbl 1269.05010
[64] Knuth, D., The Art of Computer Programming, Vol. 1: Fundamental Algorithms (1997), Addison-Wesley · Zbl 0895.68055
[65] Rösler, U., A limit theorem for quicksort, Inform. Theor. Appl., 25, 85-100 (1991) · Zbl 0718.68026