×

Global alignment of molecular sequences via ancestral state reconstruction. (English) Zbl 1250.92034

Summary: We consider the trace reconstruction problem on a tree (TRPT): a binary sequence is broadcast through a tree channel where we allow substitutions, deletions, and insertions; we seek to reconstruct the original sequence from the sequences received at the leaves. The TRPT is motivated by the multiple sequence alignment problem in computational biology. We give a simple recursive procedure giving strong reconstruction guarantees at low mutation rates. To our knowledge, this is the first rigorous trace reconstruction result on a tree in the presence of indels.

MSC:

92D15 Problems related to evolution
92C42 Systems biology, networks
60J85 Applications of branching processes
92-08 Computational methods for problems pertaining to biology

References:

[1] A. Andoni, M. Braverman, A. Hassidim, Phylogenetic reconstruction with insertions and deletions, Preprint, 2010.; A. Andoni, M. Braverman, A. Hassidim, Phylogenetic reconstruction with insertions and deletions, Preprint, 2010.
[2] A. Andoni, C. Daskalakis, A. Hassidim, S. Roch, Global alignment of molecular sequences via ancestral state reconstruction, in: ICS, 2010.; A. Andoni, C. Daskalakis, A. Hassidim, S. Roch, Global alignment of molecular sequences via ancestral state reconstruction, in: ICS, 2010. · Zbl 1250.92034
[3] Andoni, A.; Krauthgamer, R., The smoothed complexity of edit distance, Lecture Notes in Comput. Sci., 5125, 357-369 (2008) · Zbl 1153.68563
[4] Batu, T.; Kannan, S.; Khanna, S.; McGregor, A., Reconstructing strings from random traces, (SODA’04: Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms (2004), Society for Industrial and Applied Mathematics: Society for Industrial and Applied Mathematics Philadelphia, PA, USA), 910-918 · Zbl 1318.68206
[5] Berger, N.; Kenyon, C.; Mossel, E.; Peres, Y., Glauber dynamics on trees and hyperbolic graphs, Probab. Theory Related Fields, 131, 3, 311-340 (2005), Extended abstract by Kenyon, Mossel and Peres appeared in Proceedings of 42nd IEEE Symposium on Foundations of Computer Science, FOCS, 2001, 568-578 · Zbl 1075.60003
[6] Bhatnagar, N.; Vera, J. C.; Vigoda, E.; Weitz, D., Reconstruction for colorings on trees, SIAM J. Discrete Math., 25, 2, 809-826 (2011) · Zbl 1298.05060
[7] Bleher, P. M.; Ruiz, J.; Zagrebnov, V. A., On the purity of the limiting Gibbs state for the Ising model on the Bethe lattice, J. Stat. Phys., 79, 1-2, 473-482 (1995) · Zbl 1081.82515
[8] C. Borgs, J.T. Chayes, E. Mossel, S. Roch, The Kesten-Stigum reconstruction bound is tight for roughly symmetric binary channels., in: FOCS, 2006, pp. 518-530.; C. Borgs, J.T. Chayes, E. Mossel, S. Roch, The Kesten-Stigum reconstruction bound is tight for roughly symmetric binary channels., in: FOCS, 2006, pp. 518-530.
[9] Cavender, J. A., Taxonomy with confidence, Math. Biosci., 40, 3-4 (1978) · Zbl 0391.92015
[10] Daskalakis, C.; Mossel, E.; Roch, S., Optimal phylogenetic reconstruction, (STOC’06: Proceedings of the 38th Annual ACM Symposium on Theory of Computing (2006), ACM: ACM New York), 159-168 · Zbl 1301.92054
[11] C. Daskalakis, S. Roch, Alignment-free phylogenetic reconstruction, in: RECOMB, 2010, pp. 123-137.; C. Daskalakis, S. Roch, Alignment-free phylogenetic reconstruction, in: RECOMB, 2010, pp. 123-137.
[12] Edgar, R. C., MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Res., 32, 5, 1792-1797 (2004), URL: http://nar.oxfordjournals.org/cgi/content/abstract/32/5/1792
[13] Elias, I., Settling the intractability of multiple alignment, J. Comput. Biol., 13, 7, 1323-1339 (2006), pMID: 17037961. URL: http://www.liebertonline.com/doi/abs/10.1089/cmb.2006.13.1323
[14] Evans, W. S.; Kenyon, C.; Peres, Y.; Schulman, L. J., Broadcasting on trees and the Ising model, Ann. Appl. Probab., 10, 2, 410-433 (2000) · Zbl 1052.60076
[15] Farris, J. S., A probability model for inferring evolutionary trees, Syst. Zool., 22, 4, 250-256 (1973)
[16] Gerschenfeld, A.; Montanari, A., Reconstruction for models on random graphs, (FOCS’07: Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science (2007), IEEE Computer Society: IEEE Computer Society Washington, DC, USA), 194-204
[17] Higgins, D. G.; Sharp, P. M., Clustal: a package for performing multiple sequence alignment on a microcomputer, Gene, 73, 1, 237-244 (1988)
[18] Higuchi, Y., Remarks on the limiting Gibbs states on a \((d + 1)\)-tree, Publ. Res. Inst. Math. Sci., 13, 2, 335-348 (1977) · Zbl 0376.60096
[19] Holenstein, T.; Mitzenmacher, M.; Panigrahy, R.; Wieder, U., Trace reconstruction with constant deletion probability and related results, (SODA’08: Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms (2008), Society for Industrial and Applied Mathematics: Society for Industrial and Applied Mathematics Philadelphia, PA, USA), 389-398 · Zbl 1192.94064
[20] Ioffe, D., On the extremality of the disordered state for the Ising model on the Bethe lattice, Lett. Math. Phys., 37, 2, 137-143 (1996) · Zbl 0848.60090
[21] Janson, S.; Mossel, E., Robust reconstruction on trees is determined by the second eigenvalue, Ann. Probab., 32, 2630-2649 (2004), URL: http://www.stat.berkeley.edu/mossel/publications/robust.pdf · Zbl 1061.60105
[22] S. Kannan, A. McGregor, More on reconstructing strings from random traces: insertions and deletions, in: Proceedings of ISIT, 2005, pp. 297-301.; S. Kannan, A. McGregor, More on reconstructing strings from random traces: insertions and deletions, in: Proceedings of ISIT, 2005, pp. 297-301.
[23] Katoh, K.; Misawa, K.; Kuma, K.-i.; Miyata, T., MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform, Nucleic Acids Res., 30, 14, 3059-3066 (2002), URL: http://nar.oxfordjournals.org/cgi/content/abstract/30/14/3059
[24] Kesten, H.; Stigum, B. P., Additional limit theorems for indecomposable multidimensional Galton-Watson processes, Ann. Math. Statist., 37, 1463-1481 (1966) · Zbl 0203.17402
[25] Levenshtein, V. I., Efficient reconstruction of sequences, IEEE Trans. Inform. Theory, 47, 1, 2-22 (2001) · Zbl 1029.94019
[26] Levenshtein, V. I., Efficient reconstruction of sequences from their subsequences or supersequences, J. Comb. Theory Ser. A, 93, 2, 310-332 (2001) · Zbl 0992.68155
[27] Liu, K.; Raghavan, S.; Nelesen, S.; Linder, C. R.; Warnow, T., Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees, Science, 324, 5934, 1561-1564 (2009), URL: http://www.sciencemag.org/cgi/content/abstract/324/5934/1561
[28] Loytynoja, A.; Goldman, N., Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis, Science, 320, 5883, 1632-1635 (2008), URL: http://www.sciencemag.org/cgi/content/abstract/320/5883/1632
[29] Martinelli, F.; Sinclair, A.; Weitz, D., Glauber dynamics on trees: boundary conditions and mixing time, Comm. Math. Phys., 250, 2, 301-334 (2004) · Zbl 1076.82010
[30] Metzler, D., Statistical alignment based on fragment insertion and deletion models, Bioinformatics, 19, 4, 490-499 (2003), URL: http://bioinformatics.oxfordjournals.org/cgi/content/abstract/19/4/490
[31] R. Mihaescu, C. Hill, S. Rao, Fast phylogeny reconstruction through learning of ancestral sequences, CoRR abs/0812.1587.; R. Mihaescu, C. Hill, S. Rao, Fast phylogeny reconstruction through learning of ancestral sequences, CoRR abs/0812.1587. · Zbl 1266.68237
[32] Miklos, I.; Lunter, G. A.; Holmes, I., A “Long Indel” model for evolutionary sequence alignment, Mol. Biol. Evol., 21, 3, 529-540 (2004), URL: http://mbe.oxfordjournals.org/cgi/content/abstract/21/3/529
[33] Mossel, E., Recursive reconstruction on periodic trees, Random Structures Algorithms, 13, 1, 81-97 (1998), URL: http://www.stat.berkeley.edu/mossel/publications/recursive.ps · Zbl 0959.05112
[34] Mossel, E., Reconstruction on trees: beating the second eigenvalue, Ann. Appl. Probab., 11, 1, 285-300 (2001), URL: http://www.stat.berkeley.edu/mossel/publications/second.ps · Zbl 1021.90008
[35] Mossel, E., On the impossibility of reconstructing ancestral data and phylogenies, J. Comput. Biol., 10, 5, 669-678 (2003)
[36] Mossel, E., Phase transitions in phylogeny, Trans. Amer. Math. Soc., 356, 6, 2379-2404 (2004) · Zbl 1041.92018
[37] Mossel, E.; Peres, Y., Information flow on trees, Ann. Appl. Probab., 13, 3, 817-844 (2003) · Zbl 1050.60082
[38] Navarro, G., A guided tour to approximate string matching, ACM Comput. Surv. (CSUR), 33, 1, 31-88 (2001)
[39] G. Navarro, R. Baeza-Yates, E. Sutinen, J. Tarhio, Indexing methods for approximate string matching, Bulletin of the Technical Committee on 19.; G. Navarro, R. Baeza-Yates, E. Sutinen, J. Tarhio, Indexing methods for approximate string matching, Bulletin of the Technical Committee on 19.
[40] Neyman, J., Molecular studies of evolution: a source of novel statistical problems, (Gupta, S. S.; Yackel, J., Statistical Desicion Theory and Related Topics (1971), Academic Press: Academic Press New York), 1-27 · Zbl 0231.62010
[41] C. Notredame, D. Higgins, J. Heringa, T-coffee: a novel method for fast and accurate multiple sequence alignment.; C. Notredame, D. Higgins, J. Heringa, T-coffee: a novel method for fast and accurate multiple sequence alignment.
[42] Peres, Y.; Roch, S., Reconstruction on trees: exponential moment bounds for linear estimators, Electron. Comm. Probab., 16, 251-261 (2011), (electronic) · Zbl 1237.60068
[43] Rivas, E.; Eddy, S. R., Probabilistic phylogenetic inference with insertions and deletions, PLoS Comput. Biol., 4, 9, e1000172 (2008)
[44] S. Roch, Markov models on trees: reconstruction and applications, Ph.D. Thesis, UC Berkeley, 2007.; S. Roch, Markov models on trees: reconstruction and applications, Ph.D. Thesis, UC Berkeley, 2007.
[45] S. Roch, Sequence-length requirement for distance-based phylogeny reconstruction: Breaking the polynomial barrier, in: FOCS, 2008, pp. 729-738.; S. Roch, Sequence-length requirement for distance-based phylogeny reconstruction: Breaking the polynomial barrier, in: FOCS, 2008, pp. 729-738.
[46] Roch, S., Toward extracting all phylogenetic information from matrices of evolutionary distances, Science, 327, 5971, 1376-1379 (2010), URL: http://dx.doi.org/10.1126/science.1182300 · Zbl 1226.92058
[47] Sankoff, D., Minimal mutation trees of sequences, SIAM J. Appl. Math., 28, 1, 35-42 (1975), URL: http://link.aip.org/link/?SMM/28/35/1 · Zbl 0315.05101
[48] Semple, C.; Steel, M., (Phylogenetics. Phylogenetics, Mathematics and its Applications Series, vol. 22 (2003), Oxford University Press) · Zbl 1043.92026
[49] Sly, A., Reconstruction for the Potts model, (STOC’09: Proceedings of the 41st Annual ACM Symposium on Theory of Computing (2009), ACM: ACM New York, NY, USA), 581-590 · Zbl 1304.68137
[50] Sly, A., Reconstruction of random colourings, Comm. Math. Phys., 288, 3, 943-961 (2009) · Zbl 1274.60163
[51] A. Sly, Spatial and temporal mixing of Gibbs measures, Ph.D. Thesis, UC Berkeley, 2009.; A. Sly, Spatial and temporal mixing of Gibbs measures, Ph.D. Thesis, UC Berkeley, 2009.
[52] Suchard, M. A.; Redelings, B. D., BAli-Phy: simultaneous Bayesian inference of alignment and phylogeny, Bioinformatics, 22, 16, 2047-2048 (2006), URL: http://bioinformatics.oxfordjournals.org/cgi/content/abstract/22/16/2047
[53] Thorne, J. L.; Kishino, H.; Felsenstein, J., An evolutionary model for maximum likelihood alignment of DNA sequences, J. Mol. Evol., 33, 2, 114-124 (1991)
[54] Thorne, J. L.; Kishino, H.; Felsenstein, J., Inching toward reality: an improved likelihood model of sequence evolution, J. Mol. Evol., 34, 1, 3-16 (1992)
[55] Thornton, J. W., Resurrecting ancient genes: experimental analysis of extinct molecules, Nat. Rev. Genet., 5, 5, 366-375 (2004)
[56] Viswanathan, K.; Swaminathan, R., Improved string reconstruction over insertion-deletion channels, (SODA’08: Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms (2008), Society for Industrial and Applied Mathematics: Society for Industrial and Applied Mathematics Philadelphia, PA, USA), 399-408 · Zbl 1192.94072
[57] Wang, L.; Jiang, T., On the complexity of multiple sequence alignment, J. Comput. Biol., 1, 4, 337-348 (1994)
[58] Wang, L.; Jiang, T.; Lawler, E. L., Approximation algorithms for tree alignment with a given phylogeny, Algorithmica, 16, 3, 302-315 (1996) · Zbl 0862.68119
[59] Wong, K. M.; Suchard, M. A.; Huelsenbeck, J. P., Alignment uncertainty and genomic analysis, Science, 319, 5862, 473-476 (2008), URL: http://www.sciencemag.org/cgi/content/abstract/319/5862/473 · Zbl 1226.92028
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.