×

Forest alignment with affine gaps and anchors, applied in RNA structure comparison. (English) Zbl 1292.68184

Summary: We present two enhancements to Jiang’s tree alignment algorithm, motivated by experience with its use for RNA structure alignment. One enhancement is the introduction of an affine gap model, which can be accommodated with a runtime increase by a constant factor. The second enhancement is a speed-up of the alignment algorithm when certain nodes in the trees are pre-aligned by a so-called anchoring. Both enhancements are included in a new implementation of the tool RNAforester. We evaluate the new algorithm with two applications related to RNA secondary structure analysis. Based on our experience, we suggest a new formulation of the tree alignment model, based on regular tree languages and rewrite rules.

MSC:

68W32 Algorithms on strings
68Q42 Grammars and rewriting systems
68Q45 Formal languages and automata
68R10 Graph theory (including graph drawing) in computer science
92D20 Protein sequences, DNA sequences

Software:

RNAforester; MUMMER
Full Text: DOI

References:

[1] Jiang, T.; Wang, L.; Zhang, K., Alignment of trees — an alternative to tree edit, Theoretical Computer Science, 143, 1, 137-148 (1995) · Zbl 0873.68150
[2] Höchsmann, M.; Töller, T.; Giegerich, R.; Kurtz, S., Local similarity in RNA secondary structures, CSB 2003. CSB 2003, Proceedings of the IEEE Computational Systems Bioinformatics Conference, 2, 159-168 (2003)
[3] Höchsmann, M.; Voss, B.; Giegerich, R., Pure multiple RNA secondary structure alignments: a progressive profile approach, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 1, 53-62 (2004)
[4] Hofacker, I.; Fontana, W.; Stadler, P.; Bonhoeffer, L.; Tacker, M.; Schuster, P., Fast folding and comparison of RNA secondary structures, Monatshefte für Chemie, 125, 167-188 (1994)
[5] Ritchie, W.; Legendre, M.; Gautheret, D., RNA stem loops: to be or not to be cleaved by RNAse III, RNA, 13, 4, 457-462 (2007)
[6] Zhang, K.; Shasha, D., Simple fast algorithms for the editing distance between trees and related problems, SIAM Journal of Computing, 18, 6, 1245-1262 (1989) · Zbl 0692.68047
[7] Bremges, A.; Schirmer, S.; Giegerich, R., Fine-tuning structural RNA alignments in the twilight zone, BMC Bioinformatics, 11, 1, 222 (2010)
[9] Rosselló, F.; Valiente, G., An algebraic view of the relation between largest common subtrees and smallest common supertrees, Theoretical Computer Science, 362, 1, 33-53 (2006) · Zbl 1111.68091
[10] Touzet, H., Tree edit distance with gaps, Information Processing Letters, 85, 3, 123-129 (2003) · Zbl 1011.68851
[12] Lozano, A.; Pinter, R.; Rokhlenko, O.; Valiente, G.; Ziv-Ukelson, M., Seeded tree alignment, IEEE Transactions on Computational Biology and Bioinformatics, 503-513 (2008)
[13] Jiang, T.; Lin, G.; Ma, B.; Zhang, K., A general edit distance between RNA structures, Journal of computational biology, 9, 2, 371-388 (2002)
[15] Möhl, M.; Will, S.; Backofen, R., Fixed parameter tractable alignment of RNA structures including arbitrary pseudoknots, (Combinatorial Pattern Matching (2008), Springer), 69-81 · Zbl 1143.92312
[16] Backofen, R.; Landau, G.; Möhl, M.; Tsur, D.; Weimann, O., Fast RNA structure alignment for crossing input structures, (Combinatorial Pattern Matching (2009), Springer), 236-248 · Zbl 1247.68105
[17] Sankoff, D., Simultaneous solution of the RNA folding, alignment and protosequence problems, SIAM Journal of Applied Mathematics, 45, 5, 810-825 (1985) · Zbl 0581.92012
[18] Hoechsmann, M.; Toeller, T.; Giegerich, R.; Kurtz, S., Local similarity in RNA secondary structures, CSB 2003. CSB 2003, Proceedings of the IEEE Computational Systems Bioinformatics Conference, 2, 159-168 (2003)
[20] Gotoh, O., An improved algorithm for matching biological sequences, Journal of Molecular Biology, 162, 3, 705-708 (1982)
[21] Delcher, A.; Kasif, S.; Fleischmann, R.; Peterson, J.; White, O.; Salzberg, S., Alignment of whole genomes, Nucleic Acids Research, 27, 11, 2369-2376 (1999)
[22] Abouelhoda, M.; Ohlebusch, E., Multiple genome alignment: chaining algorithms revisited, (Combinatorial Pattern Matching (2003), Springer), 1-16 · Zbl 1279.92062
[23] Giegerich, R.; Voss, B.; Rehmsmeier, M., Abstract shapes of RNA, Nucleic Acids Research, 32, 16, 4843-4851 (2004)
[24] Unger, S., A global parser for context-free phrase structure grammars, Communications of the ACM, 11, 4, 240-247 (1968) · Zbl 0159.45509
[25] Younger, D., Recognition and parsing of context-free languages in time \(n^3\), Information and Control, 10, 2, 189-208 (1967) · Zbl 0149.24803
[26] Reeder, J.; Giegerich, R., Consensus shapes: an alternative to the Sankoff algorithm for RNA consensus structure prediction, Bioinformatics, 21, 17, 3516-3523 (2005)
[27] Gardner, P.; Giegerich, R., A comprehensive comparison of comparative RNA structure prediction approaches, BMC Bioinformatics, 5, 1, 140 (2004)
[28] Gardner, P. P.; Wilm, A.; Washietl, S., A benchmark of multiple sequence alignment programs upon structural RNAs, Nucleic Acids Research, 33, 8, 2433-2439 (2005)
[29] Washietl, S.; Hofacker, I., Consensus folding of aligned sequences as a new measure for the detection of functional RNAs by comparative genomics, Journal of Molecular Biology, 342, 19-30 (2004)
[30] Giegerich, R.; Meyer, C.; Steffen, P., A discipline of dynamic programming over sequence data, Science of Computer Programming, 51, 3, 215-263 (2004) · Zbl 1067.90157
[31] Sauthoff, G.; Janssen, S.; Giegerich, R., Bellman’s GAP: a declarative language for dynamic programming, (Proceedings of the 13th International ACM SIGPLAN Symposium on Principles and Practice of Declarative Programming (2011), ACM), 29-40
[32] Giegerich, R.; Höner zu Siederdissen, C., Semantics and ambiguity of stochastic RNA family models, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 499-516 (2010)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.