×

Aligning two fragmented sequences. (English) Zbl 1014.92012

Summary: Upon completion of the human and mouse genome sequences, world-wide sequencing capacity will turn to other complex organisms. Current strategies call for many of these genomes to be incompletely sequenced. That is, holes will remain in the known sequences, and the relative order and orientation of the known sequence fragments may not be determined. Sequence comparison between two genomes of this sort may allow some of the fragments to be oriented and ordered relative to each other by computational means. We formalize this as an optimization problem, show that the problem is MAX-SNP hard, and develop a polynomial time algorithm that is guaranteed to produce a solution whose score is within a factor 3 of optimal.

MSC:

92C40 Biochemistry, molecular biology
65Y20 Complexity and performance of numerical algorithms
92-08 Computational methods for problems pertaining to biology
Full Text: DOI

References:

[1] A. Bar-Noy, S. Guha, J. Naor, B. Schieber, Approximating the throughput of multiple machines in real-time scheduling, Proceedings of the 31st ACM STOC, Atlanta, GA, USA, 1999, pp. 622-631.; A. Bar-Noy, S. Guha, J. Naor, B. Schieber, Approximating the throughput of multiple machines in real-time scheduling, Proceedings of the 31st ACM STOC, Atlanta, GA, USA, 1999, pp. 622-631. · Zbl 1345.68026
[2] Berman, P.; DasGupta, B., Multi-phase algorithms for throughput maximization for real-time scheduling, J. Combin. Optim., 4, 3, 307-323 (2000) · Zbl 0991.90061
[3] Berman, P.; Karpinski, M., On some tighter inapproximability results, (Wiedermann, J.; van Emde Boas, P.; Nielen, M., Automata, Languages and Programming, ICALP’99. Automata, Languages and Programming, ICALP’99, Lecture Notes in Computer Science, Vol. 1644 (1999), Springer: Springer Berlin)
[4] Bouck, J.; Miller, W.; Gorrell, J.; Muzny, D.; Gibbs, R. A., Analysis of the quality and utility of random shotgun sequencing at low redundancies, Genome Res., 8, 1074-1084 (1998)
[5] B. Chandra, M.M. Halldórsson, Greedy local improvement and weighted packing approximation, SODA, 1999.; B. Chandra, M.M. Halldórsson, Greedy local improvement and weighted packing approximation, SODA, 1999.
[6] Dirac, G. A., Some theorems on abstract graphs, Proc. London Math. Soc., 2, 69-81 (1952) · Zbl 0047.17001
[7] McClelland, M.; Florea, L.; Sanderson, K.; Clifton, S.; Parkhill, J.; Churcher, C.; Dougan, G.; Wilson, R.; Miller, W., Comparison of the Escherichia coli K12 genome with sampled genomes of Klebsiella pneumoniae and three Salmonella enterica serovars, Typhimurium, Typhi and Paratyphi, Nucleic Acids Res., 28, 4974-4986 (2000)
[8] Onyango, P.; Miller, W.; Lehoczky, J.; Leung, C.; Birren, B.; Wheelan, S.; Dewar, K.; Feinberg, A. P., Sequence and comparative analysis of the mouse 1 megabase region orthologous to the human 11p15 imprinted domain, Genome Res., 10, 1697-1710 (2000)
[9] Sanger, F.; Nicklen, S.; Coulson, A. R., DNA sequencing with chain-terminating inhibitors, Proc. Natl. Acad. Sci. USA, 74, 12, 5463-5468 (1977)
[10] The Sanger Center & The Genome Sequencing Center, Towards a complete human genome sequence, Genome Res. 8 (1998) 1097-1108.; The Sanger Center & The Genome Sequencing Center, Towards a complete human genome sequence, Genome Res. 8 (1998) 1097-1108.
[11] Weber, J.; Myers, G., Whole genome shotgun sequencing, Genome Res., 7, 401-409 (1997)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.