Full reconstruction of Markov models on evolutionary trees: identifiability and consistency. (English) Zbl 1059.92504

Summary: A Markov model of evolution of characters on a phylogenetic tree consists of a tree topology together with a specification of probability transition matrices on the edges of the tree. Previous work has shown that, under mild conditions, the tree topology may be reconstructed, in the sense that the topology is identifiable from knowledge of the joint distribution of character states at pairs of terminal nodes of the tree. Also, the method of maximum likelihood is statistically consistent for inferring the tree topology. In this article we answer the analogous questions for reconstructing the full model, including the edge transition matrices. Under mild conditions, such full reconstruction is achievable, not by using pairs of terminal nodes, but rather by using triples of terminal nodes. The identifiability result generalizes previous results that were restricted either to characters having two states or to transition matrices having special structure. The proof develops matrix relationships that may be exploited to identify the model. We also use the identifiability result to prove that the method of maximum likelihood is consistent for reconstructing the full model.
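
As an illustration of why triples of terminal nodes can carry enough information, the following minimal Python sketch works through one eigen-decomposition relationship on a star tree with a hidden root and three leaves a, b, c. It is not taken from the paper and is not claimed to be the paper's proof. Everything in it is an assumption made for illustration: the names (diag_dominant_transition, recover_leaf_matrix, P_a, P_b, P_c, pi), the numerical values, the requirement that the chosen column of P_c has distinct entries, and the use of diagonally dominant matrices to resolve the labeling of the hidden root states.

```python
import numpy as np

def diag_dominant_transition(rng, r):
    """Random row-stochastic matrix whose largest row entry lies on the diagonal
    (an illustrative assumption used to fix the labeling of hidden states)."""
    M = rng.random((r, r)) + 3.0 * np.eye(r)
    return M / M.sum(axis=1, keepdims=True)

def recover_leaf_matrix(J, k=0):
    """Recover the root-to-b transition matrix from the joint distribution
    J[i, j, l] = P(a = i, b = j, c = l) of three leaves of a star tree."""
    J_ab = J.sum(axis=2)      # equals P_a^T diag(pi) P_b
    J_ab_k = J[:, :, k]       # equals P_a^T diag(pi) diag(P_c[:, k]) P_b
    # The unknown P_a and pi cancel: M = P_b^{-1} diag(P_c[:, k]) P_b.
    M = np.linalg.solve(J_ab, J_ab_k)
    eigvals, V = np.linalg.eig(M)
    # Rows of V^{-1} are rows of P_b, up to scaling and ordering.
    rows = np.real(np.linalg.inv(V)).copy()
    rows /= rows.sum(axis=1, keepdims=True)   # rows of P_b sum to one
    order = np.argmax(rows, axis=1)           # diagonal dominance gives the state label
    P_b_hat = np.empty_like(rows)
    P_b_hat[order] = rows
    return P_b_hat, np.real(eigvals)

rng = np.random.default_rng(0)
r = 4                                   # four states, e.g. nucleotides
pi = np.array([0.1, 0.2, 0.3, 0.4])     # hypothetical root distribution
P_a = diag_dominant_transition(rng, r)
P_b = diag_dominant_transition(rng, r)
P_c = np.array([                        # column 0 has distinct entries
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.70, 0.10, 0.10],
    [0.05, 0.05, 0.80, 0.10],
    [0.15, 0.05, 0.05, 0.75],
])

# Joint distribution of the three observed leaves, summing over the hidden root state s.
J = np.einsum("s,si,sj,sk->ijk", pi, P_a, P_b, P_c)

P_b_hat, eigvals = recover_leaf_matrix(J, k=0)
print(np.allclose(P_b_hat, P_b))
```

In this sketch the pairwise table J_ab alone only gives the product P_a^T diag(pi) P_b. Conditioning on a state of the third leaf supplies the extra diagonal factor whose eigenvalues are the entries of a column of P_c and whose eigenvectors determine the rows of P_b.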

MSC:

92D15 Problems related to evolution
60J20 Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.)
Full Text: DOI

References:

[1] Cavender, J. A., Taxonomy with Confidence, Math. Biosci., 40, 271-280 (1978) · Zbl 0391.92015
[2] Jukes, T. H.; Cantor, C. R., Evolution of protein molecules, (Mammalian Protein Metabolism (1969), Academic Press: Academic Press New York), 21-132
[3] Kimura, M., A simple method for estimating evolutionary rate of base substitution through comparative studies of nucleotide sequences, J. Mol. Evol., 16, 111-120 (1980)
[4] Tajima, F.; Nei, M., Estimation of evolutionary distance between nucleotide sequences, Mol. Biol. Evol., 1, 269-285 (1984)
[5] Barry, D.; Hartigan, J. A., Asynchronous distance between homologous DNA sequences, Biometrics, 43, 261-276 (1987) · Zbl 0622.92012
[6] Rzhetsky, A.; Nei, M., Tests of applicability of several substitution models for DNA sequence data, Mol. Biol. Evol., 12, 131-151 (1995)
[7] Chang, J. T.; Hartigan, J. A., Reconstruction of evolutionary trees from pairwise distributions on current species, (Keramidas, E. M., Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface (1991), Interface Foundation: Interface Foundation Fairfax Station, VA), 254-257
[8] Steel, M.; Hendy, M. D.; Penny, D., Invertible models of sequence evolution, (Mathematical and Information Science report 93/02 (1993), Massey University: Massey University Palmerston North, New Zealand)
[9] Steel, M.; Hendy, M. D.; Penny, D., Reconstructing evolutionary trees from nucleotide pattern probabilities, (Forschungsschwerpunkt Mathematisierung-Strukturbildungsprozesse (1995), Bielefeld University), Preprint XCIV · Zbl 0940.92018
[10] Pearl, J.; Tarsi, M., Structuring causal trees, J. Complex., 2, 60-77 (1986) · Zbl 0589.68060
[11] Hendy, M. D., The relationship between simple evolutionary tree models and observable sequence data, Syst. Zool., 38, 310-321 (1989)
[12] Lazarsfeld, P.; Henry, N., Latent Structure Analysis (1968), Houghton Mifflin: Houghton Mifflin Boston · Zbl 0182.52201
[13] Felsenstein, J., Cases in which parsimony and compatibility methods will be positively misleading, Syst. Zool., 27, 401-410 (1978)
[14] Felsenstein, J., Maximum likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters, Syst. Zool., 22, 240-249 (1973)
[15] Sober, E., Reconstructing the Past: Parsimony, Evolution, and Inference (1988), MIT Press: MIT Press Cambridge
[16] Buneman, P., The recovery of trees from measures of dissimilarity, (Mathematics in the Archaeological and Historical Sciences (1971), Edinburgh University Press: Edinburgh University Press Edinburgh), 387-395
[17] Cavender, J. A.; Felsenstein, J., Invariants of phylogenies in a simple case with discrete states, J. Class., 4, 57-71 (1987) · Zbl 0612.62142
[18] Felsenstein, J., Statistical inference of phylogenies, J. Roy. Statist. Soc. Ser. A, 146, 246-272 (1983) · Zbl 0528.62090
[19] Barry, D.; Hartigan, J. A., Statistical analysis of hominoid molecular evolution, Statist. Sci., 2, 191-210 (1987) · Zbl 0639.92010
[20] Bandelt, H.; Dress, A., Reconstructing the shape of a tree from observed dissimilarity data, Adv. Appl. Math., 7, 309-343 (1986) · Zbl 0613.62083
[21] Wald, A., Note on the consistency of the maximum likelihood estimate, Ann. Math. Statist., 20, 595-601 (1949) · Zbl 0034.22902