×

Fast and efficient Rmap assembly using the bi-labelled de Bruijn graph. (English) Zbl 1518.92112

Kingsford, Carl (ed.) et al., 20th international workshop on algorithms in bioinformatics. WABI 2020, September 7–9, 2020, Pisa, Italy, virtual conference. Proceedings. Wadern: Schloss Dagstuhl – Leibniz Zentrum für Informatik. LIPIcs – Leibniz Int. Proc. Inform. 172, Article 9, 16 p. (2020).
Summary: Genome wide optical maps are high resolution restriction maps that give a unique numeric representation to a genome. They are produced by assembling hundreds of thousands of single molecule optical maps, which are called Rmaps. Unfortunately, there exists very few choices for assembling Rmap data. There exists only one publicly-available non proprietary method for assembly and one proprietary method that is available via an executable. Furthermore, the publicly-available method, by A. Valouev et al. [Lect. Notes Comput. Sci. 3500, 489–504 (2005; Zbl 1119.92336)], follows the overlap-layout-consensus (OLC) paradigm, and therefore, is unable to scale for relatively large genomes. The algorithm behind the proprietary method, Bionano Genomics’ Solve, is largely unknown. in this paper, we extend the definition of bi-labels in the paired de Bruijn graph to the context of optical mapping data, and present the first de Bruijn graph based method for Rmap assembly. We implement our approach, which we refer to as rmapper, and compare its performance against the assembler of Valouev et al. [loc. cit.] and Solve by Bionano Genomics on data from three genomes: E. coli, human, and climbing perch fish (Anabas testudineus). Our method was the only one able to successfully run on all three genomes. The method of Valouev et al. [loc. cit.] only successfully ran on E. coli and Bionano Solve successfully ran on E. coli and human but not on the fish genome. Moreover, on the human genome rmapper was at least 130 times faster than Bionano Solve, used five times less memory and produced the highest genome fraction with zero mis-assemblies.
For the entire collection see [Zbl 1445.68019].

MSC:

92D20 Protein sequences, DNA sequences
92D10 Genetics and epigenetics
92-08 Computational methods for problems pertaining to biology

Citations:

Zbl 1119.92336
Full Text: DOI

References:

[1] Thomas S. Anantharaman, Bud Mishra, and David C. Schwartz. Genomics via optical mapping iii: Contiging genomic DNA and variations (extended abstract). In ISMB-99, pages 18-27. AAAI Press, 1997.
[2] Anton Bankevich et al. SPAdes: A New Genome Assembly Algorithm and its Applications to Single-Cell Sequencing. J. Comput. Biol., 19(5):455-477, 2012.
[3] Jon Louis Bentley. Multidimensional binary search trees used for associative searching. Commun. ACM, 18(9):509-517, 1975. · Zbl 0306.68061
[4] Srikar Chamala et al. Assembly and validation of the genome of the nonmodel basal angiosperm amborella. Science, 342(6165):1516-1517, 2013.
[5] Ping Chen, Xinyun Jing, Jian Ren, Han Cao, Pei Hao, and Xuan Li. Modelling BioNano optical data and simulation study of genome map assembly. Bioinformatics, 34(23):3966-3974, 2018. 9:15
[6] Deanna M. Church et al. Lineage-specific biology revealed by a finished genome assembly of the mouse. PLoS Biol., 7(5):e1000112+, 2009.
[7] Yang Dong et al. Sequencing and automated whole-genome optical mapping of the genome of a domestic goat (Capra hircus). Nat. Biotechnol., 31:135-141, 2013.
[8] Xian Fan, Jie Xu, and Luay Nakhleh. Detecting large indels using optical map data. In Proc. of RECOMB-CG, pages 108-127, 2018.
[9] Ganeshkumar Ganapathy et al. De novo high-coverage sequencing and annotated assemblies of the budgerigar genome. GigaScience, 3:11, 2014.
[10] Alexey Gurevich, Vladislav Saveliev, Nikolay Vyahhi, and Glenn Tesler. QUAST: quality assessment tool for genome assemblies. Bioinformatics, 29(8):1072-1075, 2013.
[11] Ramana M. Idury and Michael S. Waterman. A new algorithm forDNA sequence assembly. J. Comput. Biol, 2(2):291-306, 1995.
[12] Le Li et al. OMSV enables accurate and comprehensive identification of large structural variations from nanochannel-based single-molecule optical maps. Genome Biol., 18(1):230, 2017.
[13] Menglu Li et al. Towards a more accurate error model for BioNano optical maps. In Proc of ISBRA, pages 67-79, 2016.
[14] Paul Medvedev, Son Pham, Mark Chaisson, Glenn Tesler, and Pavel Pevzner. Paired de Bruijn graphs: a novel approach for incorporating mate pair information into genome assemblers. J. Comput. Biol, 18, 2011.
[15] Giles Miclotte, Stéphane Plaisance, Stephane Rombauts, Yves Van de Peer, Pieter Audenaert, et al. OMSim: a simulator for optical map data. Bioinformatics, pages 2740-2742, 2017.
[16] Martin D. Muggli, Simon J. Puglisi, Roy Ronen, and Christina Boucher. Misassembly detection using paired-end sequence reads and optical mapping data. Bioinformatics, 31(12):i80-i88, 2015.
[17] Kingshuk Mukherjee, Bahar Alipanahi, Tamer Kahveci, Leena Salmela, and Christina Boucher. Aligning optical maps to de Bruijn graphs. Bioinformatics, 35(18):3250-3256, 2019.
[18] Kingshuk Mukherjee, Darshan Washimkar, Martin D. Muggli, Leena Salmela, and Christina Boucher. Error correcting optical mapping data. GigaScience, 7, 2018.
[19] Weihua Pan, Tao Jiang, and Stefano Lonardi. OMGS: optical map-based genome scaffolding. J. Comput. Biol, 27(4):519-533, 2020. · Zbl 1412.92208
[20] Weihua Pan and Stefano Lonardi. Accurate detection of chimeric contigs via BioNano optical maps. Bioinformatics, 35(10):1760-1762, 2018.
[21] Yu Peng, Henry CM Leung, Siu-Ming Yiu, and Francis YL Chin. IDBA-UD: A de novo assem-bler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics, 28(11):1420-1428, 2012.
[22] Pavel A. Pevzner, Haixu Tang, and Michael S. Waterman. An Eulerian path approach to DNA fragment assembly. Proc. Natl. Acad. Sci., 98(17):9748-9753, 2001. · Zbl 0993.92018
[23] Susan Reslewic et al. Whole-genome shotgun optical mapping of Rhodospirillum Rubrum. Appl. Environ. Microbiol., 71(9):5511-5522, 2005.
[24] David C. Schwartz, Xiaojun Li, Luis I Hernandez, Satyadarshan P. Ramnarain, Edward J. Huff, and Yu-Ker Wang. Ordered restriction maps of saccharomyces cerevisiae chromosomes constructed by optical mapping. Science, 262:110-114, 1993.
[25] Jennifer M. Shelton, Michelle C. Coleman, Nic Herndon, Nanyan Lu, Ernest T. Lam, Thomas Anantharaman, Palak Sheth, and Susan J. Brown. Tools and pipelines for BioNano data: molecule assembly pipeline and fasta super scaffolding tool. BMC Genomics, 16(1):734, 2015.
[26] Jared T. Simpson, Kim Wong, Shaun D. Jackman, Jacqueline E. Schein, Steven J. M. Jones, and Inanç Birol. ABySS: a parallel assembler for short read sequence data. Genome Res., 19(6):1117-1123, 2009.
[27] Brian Teague et al. High-resolution human genome structure by single-molecule analysis. Proc. Natl. Acad. Sci., 107(24):10848-10853, 2010.
[28] Anton Valouev, David C. Schwartz, Shiguo Zhou, and Michael S. Waterman. An algorithm for assembly of ordered restriction maps from single dna molecules. Proc. Natl. Acad. Sci., 103(43):15770-15775, 2006.
[29] Daniel R. Zerbino and Ewan Birney. Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res., 18(5):821-829, 2008.
[30] Shiguo Zhou et al. A whole-genome shotgun optical map of Yersinia pestis strain KIM. Appl. Environ. Microbiol., 68(12):6321-6331, 2002.
[31] Shiguo Zhou et al. Shotgun optical mapping of the entire leishmania major Friedlin genome. Mol. Biochem. Parasitol., 138(1):97-106, 2004.
[32] Shiguo Zhou et al. Validation of rice genome sequence by optical mapping. BMC Genom., 8(1):278, 2007.
[33] Shiguo Zhou et al. A Single Molecule Scaffold for the Maize Genome. PLoS Genetics, 5:e1000711, 2009.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.