On the consistency of the minimum evolution principle of phylogenetic inference. (English) Zbl 1023.62109

Summary: The goal of phylogenetic inference is the reconstruction of the evolutionary history of various biological entities (taxa) such as genes, proteins, viruses or species. Phylogenetic inference is of major importance in computational biology and has numerous applications ranging from the study of biodiversity to sequence analysis. Given a matrix of pairwise distances between taxa, the minimum evolution (ME) principle consists in selecting the tree whose length is minimal, where the tree length is estimated within the least-squares framework. The ME principle has been shown to be statistically consistent when using the ordinary least-squares criterion (OLS) and inconsistent with the more general weighted least-squares criterion (WLS). Unfortunately, the OLS + ME inference method can provide poor results since the variances of the input data are not taken into account.
We study a model that lies between OLS and WLS, classical in statistics and data analysis, and we prove that the ME principle is statistically consistent within this model. Our proof is inductive and relies on a time-optimal recursive algorithm for estimating edge lengths. As a corollary, we obtain a different and simpler proof of the consistency result for OLS + ME.
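
To make the OLS + ME procedure described in the summary concrete, the following is a minimal sketch for the smallest non-trivial case, four taxa, where there are only three unrooted binary topologies. It is not the recursive algorithm of the paper: it fits edge lengths by solving the ordinary least-squares normal equations directly, and all function names (`design_matrix`, `ols_edge_lengths`, `minimum_evolution`) are hypothetical choices made for this illustration. Tree length is taken here as the plain sum of the fitted edge lengths, negative values included, which is an assumption of the sketch.

```python
from itertools import combinations

# The three unrooted binary topologies on taxa 0..3, written as splits ab|cd.
TOPOLOGIES = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def design_matrix(topology):
    """Rows = taxon pairs, columns = edges (4 pendant edges, then the internal edge)."""
    left, _ = topology
    rows = []
    for i, j in combinations(range(4), 2):
        row = [0.0] * 5
        row[i] = row[j] = 1.0           # pendant edges of i and j always lie on the path
        if (i in left) != (j in left):  # the path crosses the internal edge
            row[4] = 1.0
        rows.append(row)
    return rows


def solve(M, v):
    """Solve the square system M x = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(v)
    aug = [M[r][:] + [v[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def ols_edge_lengths(topology, dist):
    """OLS edge-length estimates: solve the normal equations (A^T A) x = A^T d."""
    A = design_matrix(topology)
    d = [dist[i][j] for i, j in combinations(range(4), 2)]
    AtA = [[sum(row[p] * row[q] for row in A) for q in range(5)] for p in range(5)]
    Atd = [sum(row[p] * x for row, x in zip(A, d)) for p in range(5)]
    return solve(AtA, Atd)


def minimum_evolution(dist):
    """Return (tree length, topology) for the topology with the smallest OLS tree length."""
    return min((sum(ols_edge_lengths(t, dist)), t) for t in TOPOLOGIES)


# Additive distances generated from the tree 01|23 with pendant edge lengths
# 1, 2, 3, 4 and internal edge length 1 (illustrative values, total length 11).
dist = [
    [0, 3, 5, 6],
    [3, 0, 6, 7],
    [5, 6, 0, 7],
    [6, 7, 7, 0],
]
length, topology = minimum_evolution(dist)
print(round(length, 6), topology)
```

On this additive example the generating topology 01|23 fits exactly and scores its true length of 11, while the two alternative topologies each score 11.5, so ME recovers the generating tree. The sketch weights all pairwise distances equally, which is precisely the OLS limitation the summary points out: the variances of the input distances play no role.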

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
92D15 Problems related to evolution
