Summary
Most algorithms for the alignment of biological sequences are not derived from an evolutionary model. Consequently, these alignment algorithms lack a strong statistical basis. A maximum likelihood method for the alignment of two DNA sequences is presented. This method is based upon a statistical model of DNA sequence evolution for which we have obtained explicit transition probabilities. The evolutionary model can also be used as the basis of procedures that estimate the evolutionary parameters relevant to a pair of unaligned DNA sequences. A parameter-estimation approach which takes into account all possible alignments between two sequences is introduced; the danger of estimating evolutionary parameters from a single alignment is discussed.
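The difference between scoring a single best alignment and accounting for all possible alignments can be illustrated with a small dynamic-programming sketch. This is not the evolutionary model of the paper: the column weights below (`gap`, `match`, and the uniform base frequency 0.25) are hypothetical toy values chosen only for illustration, and the function name `alignment_scores` is likewise an assumption. The sketch fills two tables over the same recursion, one summing the weights of every alignment and one keeping only the maximum.

```python
def column_weights(gap=0.05, match=0.9):
    """Toy column weights (assumed values, not the paper's transition probabilities)."""
    pair = 1.0 - 2.0 * gap                      # weight given to an aligned column
    same = pair * 0.25 * match                  # column with two identical bases
    diff = pair * 0.25 * (1.0 - match) / 3.0    # column with two different bases
    indel = gap * 0.25                          # column with one base opposite a gap
    return same, diff, indel


def alignment_scores(a, b, gap=0.05, match=0.9):
    """Return (sum over all alignments, weight of the single best alignment)."""
    same, diff, indel = column_weights(gap, match)
    n, m = len(a), len(b)
    total = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    total[0][0] = best[0][0] = 1.0              # empty alignment of two empty prefixes

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            summed, candidates = 0.0, []
            if i > 0 and j > 0:                 # last column aligns a[i-1] with b[j-1]
                w = same if a[i - 1] == b[j - 1] else diff
                summed += total[i - 1][j - 1] * w
                candidates.append(best[i - 1][j - 1] * w)
            if i > 0:                           # last column is a[i-1] opposite a gap
                summed += total[i - 1][j] * indel
                candidates.append(best[i - 1][j] * indel)
            if j > 0:                           # last column is b[j-1] opposite a gap
                summed += total[i][j - 1] * indel
                candidates.append(best[i][j - 1] * indel)
            total[i][j] = summed
            best[i][j] = max(candidates)
    return total[n][m], best[n][m]


if __name__ == "__main__":
    total, best = alignment_scores("ACGTTA", "AGTTCA")
    # Share of the summed weight carried by the single best alignment.
    print(best / total)
```

Under these toy weights the summed quantity is never smaller than the best-alignment weight, and the ratio printed above shows how much of the total the best alignment accounts for. When that share is small, parameters fitted to the single best alignment ignore most of the alignments that contribute to the sum.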
Cite this article
Thorne, J.L., Kishino, H. & Felsenstein, J. An evolutionary model for maximum likelihood alignment of DNA sequences. J Mol Evol 33, 114–124 (1991). https://doi.org/10.1007/BF02193625