×

On bubble generators in directed graphs. (English) Zbl 1435.68224

Summary: Bubbles are pairs of internally vertex-disjoint \((s, t)\)-paths in a directed graph, which have many applications in the processing of DNA and RNA data. Listing and analysing all bubbles in a given graph is usually unfeasible in practice, due to the exponential number of bubbles present in real data graphs. In this paper, we propose a notion of bubble generator set, i.e., a polynomial-sized subset of bubbles from which all the other bubbles can be obtained through a suitable application of a specific symmetric difference operator. This set provides a compact representation of the bubble space of a graph. A bubble generator can be useful in practice, since some pertinent information about all the bubbles can be more conveniently extracted from this compact set. We provide a polynomial-time algorithm to decompose any bubble of a graph into the bubbles of such a generator in a tree-like fashion. Finally, we present two applications of the bubble generator on a real RNA-seq dataset.

MSC:

68R10 Graph theory (including graph drawing) in computer science
05C20 Directed graphs (digraphs), tournaments
05C38 Paths and cycles
05C85 Graph algorithms (graph-theoretic aspects)
68T09 Computational aspects of data analysis and big data
92D20 Protein sequences, DNA sequences

Software:

STAR; ABySS; Velvet

References:

[1] Acuña, V., Grossi, R., Italiano, G.F., Lima, L., Rizzi, R., Sacomoto, G., Sagot, M., Sinaimeri, B.: On bubble generators in directed graphs. In: 43rd International Workshop on Graph-Theoretic Concepts in Computer Science, WG 2017, Eindhoven, The Netherlands, June 21-23 Lecture Notes in Computer Science, vol. 10520, pp. 18-31. Springer (2017) · Zbl 1435.68225
[2] Benoit-Pilven, C.; Marchet, C.; Chautard, E.; Lima, L.; Lambert, Mp; Sacomoto, G.; Rey, A.; Cologne, A.; Terrone, S.; Dulaurier, L.; Claude, Jb; Bourgeois, C.; Auboeuf, D.; Lacroix, V., Complementarity of assembly-first and mapping-first approaches for alternative splicing annotation and differential analysis from RNAseq data, Sci. Rep., 8, 1, 4307 (2018) · doi:10.1038/s41598-018-21770-7
[3] Birmelé, E., Crescenzi, P., Ferreira, R., Grossi, R., Lacroix, V., Marino, A., Pisanti, N., Sacomoto, G., Sagot, M.F.: Efficient bubble enumeration in directed graphs. In: SPIRE, pp. 118-129 (2012)
[4] Bollobás, B., Modern Graph Theory (1998), Berlin: Springer, Berlin · Zbl 0902.05016
[5] Brankovic, L.; Iliopoulos, Cs; Kundu, R.; Mohamed, M.; Pissis, Sp; Vayani, F., Linear-time superbubble identification algorithm for genome assembly, Theor. Comput. Sci., 609, 374-383 (2016) · Zbl 1331.92091 · doi:10.1016/j.tcs.2015.10.021
[6] Cormen, Th; Leiserson, Ce; Rivest, Rl, Introduction to Algorithms (1991), Cambridge: MIT Press, Cambridge
[7] Deo, N., Graph Theory with Applications to Engineering and Computer Science (1974), Englewood Cliffs: Prentice-Hall, Englewood Cliffs · Zbl 0285.05102
[8] Dobin, A.; Davis, Ca; Schlesinger, F.; Drenkow, J.; Zaleski, C.; Jha, S.; Batut, P.; Chaisson, M.; Gingeras, Tr, Star: ultrafast universal rna-seq aligner, Bioinformatics, 29, 1, 15-21 (2013) · doi:10.1093/bioinformatics/bts635
[9] Gleiss, Pm; Leydold, J.; Stadler, Pf, Circuit bases of strongly connected digraphs, Discuss. Math. Gr. Theory, 23, 2, 241-260 (2003) · Zbl 1055.05068 · doi:10.7151/dmgt.1200
[10] Iqbal, Z.; Caccamo, M.; Turner, I.; Flicek, P.; Mcvean, G., De novo assembly and genotyping of variants using colored de bruijn graphs, Nat. Genet., 44, 2, 226-232 (2012) · doi:10.1038/ng.1028
[11] Kavitha, T.; Liebchen, C.; Mehlhorn, K.; Michail, D.; Rizzi, R.; Ueckerdt, T.; Zweig, Ka, Cycle bases in graphs characterization, algorithms, complexity, and applications, Comput. Sci. Rev., 3, 4, 199-243 (2009) · Zbl 1301.05195 · doi:10.1016/j.cosrev.2009.08.001
[12] Kavitha, T.; Mehlhorn, K., Algorithms to compute minimum cycle bases in directed graphs, Theory Comput. Syst., 40, 4, 485-505 (2007) · Zbl 1121.68087 · doi:10.1007/s00224-006-1319-6
[13] Lima, L.; Sinaimeri, B.; Sacomoto, G.; Lopez-Maestre, H.; Marchet, C.; Miele, V.; Sagot, Mf; Lacroix, V., Playing hide and seek with repeats in local and global de novo transcriptome assembly of short RNA-seq reads, Algorithms Mol. Biol., 12, 2-2 (2017) · doi:10.1186/s13015-017-0091-2
[14] Maclane, S., A combinatorial condition for planar graphs, Fundamenta Mathematicae, 28, 22-32 (1937) · Zbl 0015.37501
[15] Miller, Jr; Koren, S.; Sutton, G., Assembly algorithms for next-generation sequencing data, Genomics, 95, 6, 315-327 (2010) · doi:10.1016/j.ygeno.2010.03.001
[16] Onodera, Taku; Sadakane, Kunihiko; Shibuya, Tetsuo, Detecting Superbubbles in Assembly Graphs, Lecture Notes in Computer Science, 338-348 (2013), Berlin, Heidelberg: Springer Berlin Heidelberg, Berlin, Heidelberg
[17] Pevzner, Pa; Tang, H.; Tesler, G., De novo repeat classification and fragment assembly, Genome Res., 14, 9, 1786-1796 (2004) · doi:10.1101/gr.2395204
[18] Sacomoto, G.; Kielbassa, J.; Chikhi, R.; Uricaru, R.; Antoniou, P.; Sagot, Mf; Peterlongo, P.; Lacroix, V., Kissplice: de-novo calling alternative splicing events from rna-seq data, BMC Bioinf., 13, S-6, S5 (2012) · doi:10.1186/1471-2105-13-S6-S5
[19] Sacomoto, Gustavo; Lacroix, Vincent; Sagot, Marie-France, A Polynomial Delay Algorithm for the Enumeration of Bubbles with Length Constraints in Directed Graphs and Its Application to the Detection of Alternative Splicing in RNA-seq Data, Lecture Notes in Computer Science, 99-111 (2013), Berlin, Heidelberg: Springer Berlin Heidelberg, Berlin, Heidelberg
[20] Sammeth, M., Complete alternative splicing events are bubbles in splicing graphs, J. Comput. Biol., 16, 8, 1117-1140 (2009) · doi:10.1089/cmb.2009.0108
[21] Shilov, Ge, Linear Algebra (1977), New York: Dover Publications, New York
[22] Simpson, Jt; Wong, K.; Jackman, Sd; Schein, Je; Jones, Sjm; Birol, I., ABySS: A parallel assembler for short read sequence data, Genome Res., 19, 6, 1117-1123 (2009) · doi:10.1101/gr.089532.108
[23] Sung, Wk; Sadakane, K.; Shibuya, T.; Belorkar, A.; Pyrogova, I., An \(O(m \log m)\)-time algorithm for detecting superbubbles, IEEE/ACM Trans. Comput. Biol. Bioinf., 12, 4, 770-777 (2015) · doi:10.1109/TCBB.2014.2385696
[24] Uricaru, R.; Rizk, G.; Lacroix, V.; Quillery, E.; Plantard, O.; Chikhi, R.; Lemaitre, C.; Peterlongo, P., Reference-free detection of isolated SNPs, Nucl. Acids Res., 43, 2, e11 (2015) · doi:10.1093/nar/gku1187
[25] Younsi, R.; Maclean, D., Using \(2k+2\) bubble searches to find single nucleotide polymorphisms in k-mer graphs, Bioinformatics, 31, 5, 642-646 (2015) · doi:10.1093/bioinformatics/btu706
[26] Zerbino, D.; Birney, E., Velvet: algorithms for de novo short read assembly using de Bruijn graphs, Genome Res., 18, 821-829 (2008) · doi:10.1101/gr.074492.107
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.