×

The mathematics of xenology: di-cographs, symbolic ultrametrics, 2-structures and tree-representable systems of binary relations. (English) Zbl 1368.05023

Summary: The concepts of orthology, paralogy, and xenology play a key role in molecular evolution. Orthology and paralogy distinguish whether a pair of genes originated by speciation or duplication. The corresponding binary relations on a set of genes form complementary cographs. Allowing more than two types of ancestral event types leads to symmetric symbolic ultrametrics. Horizontal gene transfer, which leads to xenologous gene pairs, however, is inherent asymmetric since one offspring copy “jumps” into another genome, while the other continues to be inherited vertically. We therefore explore here the mathematical structure of the non-symmetric generalization of symbolic ultrametrics. Our main results tie non-symmetric ultrametrics together with di-cographs (the directed generalization of cographs), so-called uniformly non-prime (unp) 2-structures, and hierarchical structures on the set of strong modules. This yields a characterization of relation structures that can be explained in terms of trees and types of ancestral events. This framework accommodates a horizontal-transfer relation in terms of an ancestral event and thus, is slightly different from the the most commonly used definition of xenology. As a first step towards a practical use, we present a simple polynomial-time recognition algorithm of unp 2-structures and investigate the computational complexity of several types of editing problems for unp 2-structures. We show, finally that these NP-complete problems can be solved exactly as integer linear programs.

MSC:

05C05 Trees
05C90 Applications of graph theory
92D15 Problems related to evolution
68Q17 Computational difficulty of problems (lower bounds, completeness, difficulty of approximation, etc.)

Software:

Proteinortho

References:

[1] Altenhoff AM, Boeckmann B, Capella-Gutierrez S, Dalquen DA, DeLuca T, Forslund K, Huerta-Cepas J, Linard B, Pereira C, Pryszcz LP, Schreiber F, da Silva AS, Szklarczyk D, Train CM, Bork P, Lecompte O, von Mering C, Xenarios I, Sjölander K, Jensen LJ, Martin MJ, Muffato M, Gabaldón T, Lewis SE, Thomas PD, Sonnhammer E, Dessimoz C (2016) Standardized benchmarking in the quest for orthologs. Nat Methods 13(5):425-430 · doi:10.1038/nmeth.3830
[2] Böcker S, Dress AWM (1998) Recovering symbolically dated, rooted trees from symbolic ultrametrics. Adv Math 138:105-125 · Zbl 0912.05031 · doi:10.1006/aima.1998.1743
[3] Brandstädt A, Le VB, Spinrad JP (1999) Graph classes: a survey. Society for Industrial and Applied Mathematics, Philadelphia · Zbl 0919.05001 · doi:10.1137/1.9780898719796
[4] Corneil DG, Lerchs H, Burlingham Steward L (1981) Complement reducible graphs. Discr. Appl. Math. 3:163-174 · Zbl 0463.05057 · doi:10.1016/0166-218X(81)90013-5
[5] Crespelle C, Paul C (2006) Fully dynamic recognition algorithm and certificate for directed cographs. Discr. Appl. Math. 154:1722-1741 · Zbl 1110.68096 · doi:10.1016/j.dam.2006.03.005
[6] Dondi R, El-Mabrouk N, Lafond M (2016) Correction of weighted orthology and paralogy relations-complexity and algorithmic results. In: International workshop on algorithms in bioinformatics. Springer, pp 121-136 · Zbl 1383.92054
[7] Ehrenfeucht A, Gabow HN, Mcconnell RM, Sullivan SJ (1994) An \[O(n^2\] n2) divide-and-conquer algorithm for the prime tree decomposition of two-structures and modular decomposition of graphs. J Algorithms 16(2):283-294 · Zbl 0797.68079 · doi:10.1006/jagm.1994.1013
[8] Ehrenfeucht, A.; Harju, T.; Rozenberg, G.; Fülöp, Z. (ed.); Gécseg, F. (ed.), Theory of 2-structures, 1-14 (1995), Berlin · Zbl 1412.68168 · doi:10.1007/3-540-60084-1_58
[9] Ehrenfeucht A, Harju T, Rozenberg G (1999) The theory of 2-structures: a framework for decomposition and transformation of graphs. World Scientific, Singapore · Zbl 0981.05002 · doi:10.1142/4197
[10] Ehrenfeucht A, Rozenberg G (1990) Primitivity is hereditary for 2-structures. Theor Comput Sci 70(3):343-358 · Zbl 0701.05053 · doi:10.1016/0304-3975(90)90131-Z
[11] Ehrenfeucht A, Rozenberg G (1990) Theory of 2-structures, part I: clans, basic subclasses, and morphisms. Theor Comput Sci 70:277-303 · Zbl 0701.05051 · doi:10.1016/0304-3975(90)90129-6
[12] Ehrenfeucht A, Rozenberg G (1990) Theory of 2-structures, part II: representation through labeled tree families. Theor Comput Sci 70:305-342 · Zbl 0701.05052 · doi:10.1016/0304-3975(90)90130-A
[13] Engelfriet J, Harju T, Proskurowski A, Rozenberg G (1996) Characterization and complexity of uniformly nonprimitive labeled 2-structures. Theor Comput Sci 154:247-282 · Zbl 0873.68161 · doi:10.1016/0304-3975(94)00272-X
[14] Fitch WM (1970) Distinguishing homologous from analogous proteins. Syst Zool 19:99-113 · doi:10.2307/2412448
[15] Fitch WM (2000) Homology a personal view on some of the problems. Trends Genet 16:227-231 · doi:10.1016/S0168-9525(00)02005-9
[16] Gray GS, Fitch WM (1983) Evolution of antibiotic resistance genes: the DNA sequence of a kanamycin resistance gene from Staphylococcus aureus. Mol Biol Evol 1:57-66
[17] Hellmuth M, Hernandez-Rosales M, Huber KT, Moulton V, Stadler PF, Wieseke N (2013) Orthology relations, symbolic ultrametrics, and cographs. J Math Biol 66:399-420 · Zbl 1408.05038 · doi:10.1007/s00285-012-0525-x
[18] Hellmuth, M.; Wieseke, N.; Xu, D. (ed.), On symbolic ultrametrics, cotree representations, and cograph edge decompositions and partitions, No. 9198, 609-623 (2015), Cham · Zbl 1468.05227
[19] Hellmuth M, Wieseke N (2016) From sequence data including orthologs, paralogs, and xenologs to gene and species trees. Springer International Publishing, Cham · doi:10.1007/978-3-319-41324-2_21
[20] Hellmuth M, Wieseke N (2016) On tree representations of relations and graphs: Symbolic ultrametrics and cograph edge decompositions. arXiv:1509.05069 (preprint ) · Zbl 1468.05227
[21] Hellmuth M, Wieseke N, Lechner M, Lenhof HP, Middendorf M, Stadler PF (2015) Phylogenomics with paralogs. Proc Natl Acad Sci USA 112:2058-2063 · doi:10.1073/pnas.1412770112
[22] Hernandez-Rosales M, Hellmuth M, Wieseke N, Huber KT, Moulton V, Stadler PF (2012) From event-labeled gene trees to species trees. BMC Bioinf 13(Suppl. 19):S6
[23] Jensen RA (2001) Orthologs and paralogs—we need to get it right. Genome Biol 2:8 · doi:10.1186/gb-2001-2-8-interactions1002
[24] Keeling PJ, Palmer JD (2008) Horizontal gene transfer in eukaryotic evolution. Nat Rev Genet 9:605-618 · doi:10.1038/nrg2386
[25] Koonin E (2005) Orthologs, paralogs, and evolutionary genomics. Annu Rev Genet 39:309-338 · doi:10.1146/annurev.genet.39.073003.114725
[26] Koonin EV, Makarova KS, Aravind L (2001) Horizontal gene transfer in prokaryotes: quantification and classification. Annu Rev Microbiol 55:709-742 · doi:10.1146/annurev.micro.55.1.709
[27] Lafond M, Dondi R, El-Mabrouk N (2016) The link between orthology relations and gene trees: a correction perspective. Algorithms Mol Biol 11(1):1 · doi:10.1186/s13015-016-0067-7
[28] Lafond M, El-Mabrouk N (2015) Orthology relation and gene tree correction: complexity results. In: International workshop on algorithms in bioinformatics. Springer, pp 66-79 · Zbl 1367.92086
[29] Lechner M, Findeiß S, Steiner L, Marz M, Stadler PF, Prohaska SJ (2011) Proteinortho: detection of (co-)orthologs in large-scale analysis. BMC Bioinf 12:124 · doi:10.1186/1471-2105-12-124
[30] Lechner M, Hernandez-Rosales M, Doerr D, Wiesecke N, Thevenin A, Stoye J, Hartmann RK, Prohaska SJ, Stadler PF (2014) Orthology detection combining clustering and synteny for very large datasets. PLoS One 9(8):e105,015 · doi:10.1371/journal.pone.0105015
[31] McConnell RM (1995) An \[o(n^2)\] o(n2) incremental algorithm for modular decomposition of graphs and 2-structures. Algorithmica 14(3):229-248 · Zbl 0827.05056 · doi:10.1007/BF01206330
[32] McConnell RM, de Montgolfier F (2005) Linear-time modular decomposition of directed graphs. Discr Appl Math 145(2):198-209 · Zbl 1055.05123 · doi:10.1016/j.dam.2004.02.017
[33] Möhring RH (1985) Algorithmic aspects of the substitution decomposition in optimization over relations, set systems and boolean functions. Ann Oper Res 4(1):195-225 · doi:10.1007/BF02022041
[34] Möhring RH, Radermacher FJ (1984) Substitution decomposition for discrete structures and connections with combinatorial optimization. Ann Discr Math 19:257-356 · Zbl 0567.90073
[35] Schmerl JH, Trotter WT (1993) Critically indecomposable partially ordered sets, graphs, tournaments and other binary relational structures. Discr Math 113(1):191-205 · Zbl 0776.06002 · doi:10.1016/0012-365X(93)90516-V
[36] Semple C, Steel M (2003) Phylogenetics, Oxford lecture series in mathematics and its applications, vol 24. Oxford University Press, Oxford · Zbl 1043.92026
[37] Sennblad B, Lagergren J (2009) Probabilistic orthology analysis. Syst Biol 58:411-424 · doi:10.1093/sysbio/syp046
[38] Soucy SM, Huang J, Gogarten JP (2015) Horizontal gene transfer: building the web of life. Nat Rev Genet 16:472-482 · doi:10.1038/nrg3962
[39] Valdes J, Tarjan RE, Lawler EL (1982) The recognition of series parallel digraphs. SIAM J Comput 11:298-313 · Zbl 0478.68065 · doi:10.1137/0211023
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.