×

PULLPRU: a practical approach to estimate phylogenies from single nucleotide polymorphism haplotypes under the maximum parsimony criterion. (English) Zbl 1413.90187

Summary: Phylogeny estimation has been the subject of several researches due to its significant importance and numerous applications. The aim of this research is to study the phylogeny estimation from single nucleotide polymorphism (SNP) haplotypes under the maximum parsimony criterion (MPPEP-SNP). Previous exact methods have modeled the mentioned problem as a mixed integer programming (MIP) problem. Since the problem, in general, proved to be NP-hard which causes MIP models to take long runtime for solving large-scale instances, the need for non-exact methods is obvious. In this paper, the authors propose a heuristic algorithm that attempts to find the MPPEP-SNP solution in several stages by solving a specific MIP model in each stage. Created based on network flows formulation, MIP models appearing in each stage are very small; therefore, their exact solutions could be found practically very fast. In order to evaluate the performance of the proposed algorithm, it has been tested on both simulated and real instances and compared with Pars and Flow-RM as two of the best known methods. Our numerical experiments show that the proposed method can compete with the previous methods in terms of accuracy, runtime, and specially the largeness of the solved instances.

MSC:

90C11 Mixed integer programming
90C35 Programming involving graphs or networks
90C90 Applications of mathematical programming

Software:

PHYLIP
Full Text: DOI

References:

[1] Agarwala R, Fernández-Baca D (1994) A polynomial-time algorithm for the perfect phylogeny problem when the number of character states is fixed. SIAM J Comput 23:1216-1224 · Zbl 0835.68052 · doi:10.1137/S0097539793244587
[2] Bonizzoni P (2007) A linear-time algorithm for the perfect phylogeny haplotype problem. Algorithmica 48:267-285 · Zbl 1121.92050 · doi:10.1007/s00453-007-0094-3
[3] Brooks DR, Bilewitch J, Condy C, Evans DC, Folinsbee KE, Fröbisch J (2007) Quantitative phylogenetic analysis in the 21st century. Rev Mex Biodivers 78:225-252
[4] Bruni R (2010) Mathematical approaches to polymer sequence analysis and related problems. Springer, Berlin
[5] Camin JH, Sokal RR (1965) A method for deducing branching sequences in phylogeny. Evolution 19:311-326 · doi:10.1111/j.1558-5646.1965.tb01722.x
[6] Catanzaro D (2009) The minimum evolution problem: overview and classification. Networks 53:112-125 · Zbl 1168.92036 · doi:10.1002/net.20280
[7] Catanzaro D (2011) Estimating phylogenies from molecular data. In: Mathematical approaches to polymer sequence analysis and related problems. Springer, New York, pp 149-176
[8] Catanzaro D, Ravi R, Schwartz R (2013) A mixed integer linear programming model to reconstruct phylogenies from single nucleotide polymorphism haplotypes under the maximum parsimony criterion. Algorithms Mol Biol 8:1 · doi:10.1186/1748-7188-8-3
[9] Ding Z, Filkov V, Gusfield D (2006) A linear-time algorithm for the perfect phylogeny haplotyping (PPH) problem. J Comput Biol 13:522-553 · doi:10.1089/cmb.2006.13.522
[10] Felsenstein J (2004) Inferring phylogenies, vol 2. Sinauer Associates, Sunderland
[11] Felsenstein J (2005) PHYLIP (phylogeny inference package) version 3.6. Department of Genome Sciences, University of Washington, Seattle
[12] Foulds LR, Graham RL (1982) The Steiner problem in phylogeny is NP-complete. Adv Appl Math 3:43-49 · Zbl 0489.92002 · doi:10.1016/S0196-8858(82)80004-3
[13] Garey MR, Johnson DS (1979) A guide to the theory of NP-completeness. WH Freemann, New York · Zbl 0411.68039
[14] Gascuel O (2005) Mathematics of evolution and phylogeny. Oxford University Press, Oxford · Zbl 1104.92332
[15] Gatesy J, DeSalle R, Wahlberg N (2007) How many genes should a systematist sample? Conflicting insights from a phylogenomic matrix characterized by replicated incongruence. Syst Biol 56:355-363 · doi:10.1080/10635150701294733
[16] Graham R, Foulds L (1982) Unlikelihood that minimal phylogenies for a realistic biological study can be constructed in reasonable computational time. Math Biosci 60:133-142 · Zbl 0489.92003 · doi:10.1016/0025-5564(82)90125-0
[17] Gusfield D (1991) Efficient algorithms for inferring evolutionary trees. Networks 21:19-28 · Zbl 0719.92015 · doi:10.1002/net.3230210104
[18] Gusfield, D.; Baeza-Yates, R. (ed.); Chávez, E. (ed.); Crochemore, M. (ed.), Haplotype inference by pure parsimony, No. 2676, 144-155 (2003), Berlin, Heidelberg · doi:10.1007/3-540-44888-8_11
[19] Heath TA, Hedtke SM, Hillis DM (2008) Taxon sampling and the accuracy of phylogenetic analyses. J Syst Evol 46:239-257
[20] Hedtke SM, Townsend TM, Hillis DM (2006) Resolution of phylogenetic conflict in large data sets by increased taxon sampling. Syst Biol 55:522-529 · doi:10.1080/10635150600697358
[21] Hein J (1990) Reconstructing evolution of sequences subject to recombination using parsimony. Math Biosci 98:185-200 · Zbl 0692.92014 · doi:10.1016/0025-5564(90)90123-G
[22] Hein J (1993) A heuristic method to reconstruct the history of sequences subject to recombination. J Mol Evol 36:396-405 · doi:10.1007/BF00182187
[23] Hillis DM, Pollock DD, McGuire JA, Zwickl DJ (2003) Is sparse taxon sampling a problem for phylogenetic inference? Syst Biol 52:124 · doi:10.1080/10635150390132911
[24] Jin G, Nakhleh L, Snir S, Tuller T (2007) Efficient parsimony-based methods for phylogenetic network reconstruction. Bioinformatics 23:e123-e128 · doi:10.1093/bioinformatics/btl313
[25] Kannan S, Warnow T (1997) A fast algorithm for the computation and enumeration of perfect phylogenies. SIAM J Comput 26:1749-1763 · Zbl 0885.68073 · doi:10.1137/S0097539794279067
[26] Misra N, Blelloch G, Ravi R, Schwartz R (2011) Generalized Buneman pruning for inferring the most parsimonious multi-state phylogeny. J Comput Biol 18:445-457 · doi:10.1089/cmb.2010.0254
[27] Pollock DD, Zwickl DJ, McGuire JA, Hillis DM (2002) Increased taxon sampling is advantageous for phylogenetic inference. Syst Biol 51:664 · doi:10.1080/10635150290102357
[28] Rosenberg MS, Kumar S (2003) Taxon sampling, bioinformatics, and phylogenomics. Syst Biol 52:119 · doi:10.1080/10635150390132894
[29] Semple C, Steel MA (2003) Phylogenetics, vol 24. Oxford University Press, Oxford · Zbl 1043.92026
[30] Sforza CL, Edwards AWF (1964) Analysis of human evolution. Genet. Today 3:923-933
[31] Sridhar, S.; Lam, F.; Blelloch, GE; Ravi, R.; Schwartz, R.; Măndoiu, I. (ed.); Zelikovsky, A. (ed.), Efficiently finding the most parsimonious phylogenetic tree via linear programming, No. 4463, 37-48 (2007), Berlin, Heidelberg · doi:10.1007/978-3-540-72031-7_4
[32] Sridhar S, Lam F, Blelloch GE, Ravi R, Schwartz R (2008) Mixed integer linear programming for maximum-parsimony phylogeny inference. IEEE/ACM Trans Comput Biol Bioinf 5:323-331 · doi:10.1109/TCBB.2008.26
[33] Zheng W, Zheng WM(2015) A brief review to phylogenetic reconstruction by maximum parsimony. In: 2015 Fifth International Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC) Qinhuangdao, China (pp 830-835). IEEE. https://doi.org/10.1109/IMCCC.2015.181
[34] Zwickl DJ, Hillis DM (2002) Increased taxon sampling greatly reduces phylogenetic error. Syst Biol 51:588-598 · doi:10.1080/10635150290102339
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.