×

Self-organizing map approaches for the haplotype assembly problem. (English) Zbl 1172.92016

Summary: Haplotype assembly is to reconstruct a pair of haplotypes from single nucleotide polymorphism (SNP) values observed in a set of individual DNA fragments. We focus on studying a minimum error correction (MEC) model for the haplotype assembly problem and explore self-organizing map (SOM) methods for this problem. Specifically, haplotype assembly by MEC is formulated as an integer linear programming model. Since the MEC problem is NP-hard and thus cannot be solved exactly within acceptable running time for large-scale instances, we investigate the ability of classical SOMs to solve the haplotype assembly problem with the MEC model. Then, aiming to overcome the limits of classical SOMs, a novel SOM approach is proposed for this problem. Extensive computational experiments on both synthesized and real data sets show that the new SOM-based algorithm can efficiently reconstruct haplotype pairs in a very high accuracy under realistic parameter settings. Comparison with previous methods also confirms the superior performance of the new SOM approach.

MSC:

92C40 Biochemistry, molecular biology
90C90 Applications of mathematical programming
90C10 Integer programming
90C05 Linear programming
Full Text: DOI

References:

[1] Clark, A., Inference of haplotypes from pre-amplified samples of diploid populations, Mol. Biol. Evol., 7, 2, 111-122 (1990)
[2] Daly, M., High-resolution haplotype structure in the human genome, Nat. Genet., 29, 229-232 (2001)
[3] Douglas, J.; Boehnke, M.; Gillanders, E.; Trent, J.; Gruber, S., Experimentally-derived haplotypes substantially increase the eciency of linkage disequilibrium studies, Nat. Genet., 28, 361-364 (2001)
[4] Favata, F.; Walker, R., A study of the application of Kohonen-type neural networks to the travelling salesman problem, Biol. Cybern., 64, 463-468 (1991) · Zbl 0722.92002
[5] Greenberg, H. J.; Hart, W. E.; Lancia, G., Opportunities for combinatorial op- timization in computational biology, Informs J. Comput., 14, 211-231 (2004) · Zbl 1239.90003
[6] Gusfield, D., Inference of haplotypes from samples of diploid populations: complexity and algorithms, J. Comput. Biol., 8, 3, 305-323 (2001)
[7] Halldórsson, B.; Istrail, S.; De La Vega, F., Optimal selection of SNP markers for disease association studies, Hum. Hered., 58, 190-202 (2004)
[8] Halperin, E.; Eskin, E.; Karp, R. M., Large scale reconstruction of haplotypes from genotype data, (Proceedings of the 7th Annual Conference on Research in Computational Molecular Biology (RECOMB) (2003), Springer), 104-113
[9] Helmuth, L., Genome research: map of human genome 3.0, Science, 5530, 293, 583-585 (2001)
[10] Hoehe, M.; Kopke, K.; Wendel, B., Sequence variability and candidate gene analysis in complex disease: association of \(\mu\) opoid receptor gene variation with substance dependence, Hum. Mol. Genet., 9, 2895-2908 (2000)
[11] Lancia, G.; Bafna, V.; Istrail, S.; Lippert, R.; Schwartz, R., SNPs problems, complexity and algorithms, (Proceedings of the Annual European Symposium on Algorithms (ESA) (2001), Springer), 182-193 · Zbl 1016.92023
[12] Levy, S.; Sutton, G.; Ng, P. C.; Feuk, L.; Halpern, A. L., The diploid genome sequence of an individual human, PLoS Biol., 5, 10, e254 (2007)
[13] Li, Z.; Zhou, W.; Zhang, X.; Chen, L., A parsimonious tree-grow method for haplotype inference, Bioinformatics, 21, 3475-3481 (2005)
[14] Lippert, R.; Schwartza, R.; Lancia, G.; Istrail, S., Algorithmic strategies for the SNPs haplotype assembly problem, Brief. Bioinform., 3, 1, 23-31 (2002)
[15] Rizzi, R.; Bafna, V.; Istrail, S.; Lancia, G., Practical algorithms and fixed-parameter tractability for the single individual SNP haplotyping problem, (Proceedings of the 2nd Annual Workshop on Algorithms in Bioinformatics (WABI) (2002), Springer), 29-43 · Zbl 1016.68685
[16] Venter, J., The sequence of the human genome, Science, 291, 1304-1351 (2001)
[17] Wang, R. S.; Wu, L. Y.; Zhang, J. H.; Zhang, X. S., Algorithms for the SNP haplotype assembly problem, Appl. Math. A: J. Chin. Univ. (Series A), 19, Suppl., 515-528 (2004)
[18] Wang, R. S.; Wu, L. Y.; Li, Z.; Zhang, X. S., Haplotype reconstruction from SNP fragments by minimum error correction, Bioinformatics, 21, 2456-2462 (2005)
[19] Wjst, M., Target SNP selection in complex disease association studies, BMC Bioinformatics, 5, 92 (2004)
[20] L.Y. Wu, Application of neural networks in combinatorial optimization and DNA Sequencing. Ph.D. Thesis, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, 2002.; L.Y. Wu, Application of neural networks in combinatorial optimization and DNA Sequencing. Ph.D. Thesis, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, 2002.
[21] Yan, H.; Papadopoulos, N.; Marra, G., Conversion of diploidy to haploidy, Nature, 403, 723-724 (2000)
[22] Zhang, X. S., Neural Networks in Optimization (2000), Kluwer Academic Publishers · Zbl 1017.90109
[23] Zhang, X. S.; Wang, R. S.; Wu, L. Y.; Chen, L., Models and algorithms for haplotyping problem, Curr. Bioinform., 1, 105-114 (2006)
[24] Zhao, Y.; Wu, L. Y.; Zhang, J. H.; Wang, R. S.; Zhang, X. S., Haplotype assembly from aligned weighted SNP fragments, Comput. Biol. Chem., 29, 281-287 (2005) · Zbl 1102.92031
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.