
Identifying evolutionary trees and substitution parameters for the general Markov model with invariable sites. (English) Zbl 1130.92039

Summary: The general Markov plus invariable sites \((\text{GM} + \text{I})\) model of biological sequence evolution is a two-class model in which an unknown proportion of sites are not allowed to change, while the remainder undergo substitutions according to a Markov process on a tree. For statistical use it is important to know whether the model is identifiable: can both the tree topology and the numerical parameters be determined from a joint distribution describing sequences only at the leaves of the tree?
We establish that for generic parameters both the tree and all numerical parameter values can be recovered, up to clearly understood issues of ‘label swapping’. The method of analysis is algebraic, using phylogenetic invariants to study the variety defined by the model. Simple rational formulas, expressed in terms of determinantal ratios, are found for recovering numerical parameters describing the invariable sites.
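To make the two-class structure described in the summary concrete, the following minimal sketch computes the joint distribution of leaf states under a GM+I mixture. It is an illustration only, not the algebraic method of the paper: it assumes a star tree (all leaves attached directly to the root), and the names `gm_joint`, `gm_plus_i_joint`, `delta` (proportion of invariable sites) and `inv_dist` (state distribution of the invariable sites) are hypothetical choices made for this example.

```python
from itertools import product


def gm_joint(root, mats):
    """Joint leaf distribution under the general Markov model on a star tree.

    root: list of root state probabilities (length k).
    mats: one k x k Markov matrix per leaf; mats[i][r][s] is the probability
          that leaf i is in state s given that the root is in state r.
    Returns a dict mapping each leaf pattern (a tuple of states) to its probability.
    """
    k = len(root)
    joint = {}
    for pattern in product(range(k), repeat=len(mats)):
        total = 0.0
        for r in range(k):  # sum over the unobserved root state
            term = root[r]
            for leaf, state in enumerate(pattern):
                term *= mats[leaf][r][state]
            total += term
        joint[pattern] = total
    return joint


def gm_plus_i_joint(root, mats, delta, inv_dist):
    """Mix the GM distribution with an invariable class of proportion delta.

    Invariable sites show the same state at every leaf, so they add mass
    only to constant patterns, weighted by inv_dist.
    """
    joint = {}
    for pattern, p in gm_joint(root, mats).items():
        q = (1.0 - delta) * p
        if len(set(pattern)) == 1:  # constant pattern such as (0, 0, 0)
            q += delta * inv_dist[pattern[0]]
        joint[pattern] = q
    return joint


# Illustrative two-state parameters (assumed values, not taken from the paper).
root = [0.5, 0.5]
M = [[0.9, 0.1], [0.2, 0.8]]
gm = gm_joint(root, [M, M, M])
dist = gm_plus_i_joint(root, [M, M, M], delta=0.25, inv_dist=[0.6, 0.4])
```

The identifiability question in the summary asks the reverse of this computation: given only a table such as `dist`, can the tree and the values of `root`, the Markov matrices, `delta` and `inv_dist` be recovered?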

MSC:

92D15 Problems related to evolution
93B30 System identification
05C90 Applications of graph theory
68W30 Symbolic computation and algebraic computation
60J99 Markov processes

Software:

SINGULAR

References:

[1] Buneman, P., The recovery of trees from measures of dissimilarity, (Mathematics in the Archeological and Historical Sciences (1971), Edinburgh University Press: Edinburgh University Press Edinburgh), 387
[2] Steel, M., Recovering a tree from the leaf colourations it generates under a Markov model, Appl. Math. Lett., 7, 2, 19 (1994) · Zbl 0794.60071
[3] Chang, J. T., Full reconstruction of Markov models on evolutionary trees: identifiability and consistency, Math. Biosci., 137, 1, 51 (1996) · Zbl 1059.92504
[4] Gu, X.; Li, W.-H., A general additive distance with time-reversibility and rate variation among sites, PNAS, 93, 4671 (1996)
[5] Waddell, P. J.; Steel, M., General time-reversible distances with unequal rates across sites: mixing \(\Gamma\) and inverse Gaussian distributions with invariant sites, Mol. Phylo. Evol., 8, 3, 398 (1997)
[6] Rogers, J. S., Maximum likelihood estimation of phylogenetic trees is consistent when substitution rates vary according to the invariable sites plus gamma distribution, Syst. Biol., 50, 5, 713 (2001)
[7] C. Ané, personal communication (2005).
[8] Allman, E. S.; Rhodes, J. A., The identifiability of tree topology for phylogenetic models, including covarion and mixture models, J. Comput. Biol., 13, 5, 1101 (2006)
[9] Tuffley, C.; Steel, M., Modeling the covarion hypothesis of nucleotide substitution, Math. Biosci., 147, 1, 63 (1998) · Zbl 0897.92025
[10] Baake, E., What can and what cannot be inferred from pairwise sequence comparisons?, Math. Biosci., 154, 1, 1 (1998) · Zbl 0940.92017
[11] Steel, M.; Huson, D.; Lockhart, P. J., Invariable sites models and their uses in phylogeny reconstruction, Syst. Biol., 49, 2, 225 (2000)
[12] Huson, D. H.; Bryant, D., Application of phylogenetic networks in evolutionary studies, Mol. Biol. Evol., 23, 2, 254 (2006)
[13] Jayaswal, V.; Robinson, J.; Jermiin, L., Estimation of phylogeny and invariant sites under the general Markov model of nucleotide sequence evolution, Syst. Biol., 56, 2, 155 (2007)
[14] Steel, M.; Székely, L.; Hendy, M., Reconstructing trees from sequences whose sites evolve at variable rates, J. Comput. Biol., 1, 2, 153 (1994)
[15] Allman, E. S.; Rhodes, J. A., Phylogenetic invariants for the general Markov model of sequence mutation, Math. Biosci., 186, 113 (2003) · Zbl 1031.92019
[16] E.S. Allman, J.A. Rhodes, Phylogenetic ideals and varieties for the general Markov model, Adv. Appl. Math. To appear, <arXiv:math.AG/0410604> · Zbl 1131.92046
[17] Cavender, J. A.; Felsenstein, J., Invariants of phylogenies in a simple case with discrete states, J. Class., 4, 57 (1987) · Zbl 0612.62142
[18] Lake, J., A rate independent technique for analysis of nucleic acid sequences: evolutionary parsimony, Mol. Bio. Evol., 4, 2, 167 (1987)
[19] Allman, E. S.; Rhodes, J. A., Phylogenetic invariants, (Gascuel, O.; Steel, M., New Mathematical Models of Evolution (2007), Oxford University), 108-147
[20] Hagedorn, T. R., Determining the number and structure of phylogenetic invariants, Adv. Appl. Math., 24, 1, 1 (2000) · Zbl 1033.92023
[21] G.-M. Greuel, G. Pfister, H. Schönemann, SINGULAR, <http://www.singular.uni-kl.de>
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.