
Phylogenetic analysis of DNA sequences with a novel characteristic vector. (English) Zbl 1303.92081

Summary: In basic biological research, one of the major tasks is to compare biological sequences in order to infer the evolutionary relationships among them. In this paper, a novel characteristic vector of a DNA sequence is proposed for genetic sequence comparison and phylogenetic analysis; it takes into account both the positions and the numbers of occurrences of each \(k\)-word, as well as the random background. The elements of the vector characterize the relative difference of a DNA sequence from a sequence generated by a \((k-2)\)th-order Markov process. Finally, we reconstruct the phylogenetic trees for 48 HEV (Hepatitis E virus) sequences and 20 Eutherian mammals. The results show that the new method provides more information about \(k\)-words and improves the efficiency of sequence comparison.
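
To make the "relative difference from a \((k-2)\)th-order Markov process" concrete, here is a minimal Python sketch of the widely used composition-vector form of that idea: the background frequency of a \(k\)-word is estimated from its \((k-1)\)- and \((k-2)\)-word frequencies, and each vector element is (observed - expected) / expected. This is an illustration under stated assumptions and not the paper's characteristic vector, which additionally uses the positions of \(k\)-words and whose exact formula is not given in this summary. The function names, the cosine-based distance, and the convention of setting an element to 0 when the expected frequency is 0 are all assumptions made for the example.

```python
from collections import Counter
from itertools import product
import math


def kmer_freqs(seq, k):
    """Relative frequencies of the overlapping k-words in seq."""
    total = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(total))
    return {word: c / total for word, c in counts.items()}


def relative_difference_vector(seq, k, alphabet="ACGT"):
    """Vector of (f - f0) / f0 over all k-words of the alphabet.

    f0 is the frequency predicted by a (k-2)th-order Markov estimate:
    f0(a1..ak) = f(a1..a_{k-1}) * f(a2..ak) / f(a2..a_{k-1}).
    Assumption: elements with f0 == 0 are set to 0.
    """
    if k < 3:
        raise ValueError("k must be at least 3 for a (k-2)th-order estimate")
    fk = kmer_freqs(seq, k)
    fk1 = kmer_freqs(seq, k - 1)
    fk2 = kmer_freqs(seq, k - 2)
    vec = []
    for word in map("".join, product(alphabet, repeat=k)):
        middle = fk2.get(word[1:-1], 0.0)
        if middle > 0:
            expected = fk1.get(word[:-1], 0.0) * fk1.get(word[1:], 0.0) / middle
        else:
            expected = 0.0
        observed = fk.get(word, 0.0)
        vec.append((observed - expected) / expected if expected > 0 else 0.0)
    return vec


def cosine_distance(u, v):
    """Distance (1 - cos(u, v)) / 2, which lies in [0, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.5  # assumption: treat an all-zero vector as uncorrelated
    return (1.0 - dot / (norm_u * norm_v)) / 2.0
```

A pairwise distance matrix built this way could then be passed to a distance-based tree-building program such as those in PHYLIP, the software listed for this entry; the specific tree-building procedure used in the paper is not stated here.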

MSC:

92D15 Problems related to evolution
92D10 Genetics and epigenetics

Software:

PHYLIP
Full Text: DOI

References:

[1] Waterman M. S.: Introduction to computational biology: maps, sequences, and genomes. Chapman & Hall, New York (1995) · Zbl 0831.92011
[2] Durbin R., Eddy S.R., Krogh A., Mitchison G.: Biological sequence analysis. Cambridge University Press, Cambridge (1998) · Zbl 0929.92010
[3] Randic M., Vracko M.: J. Chem. Inf. Comput. Sci. 40, 599 (2000) · doi:10.1021/ci9901082
[4] Randic M., Vracko M., Lers N., Plavsic D.: Chem. Phys. Lett. 368, 1 (2003) · doi:10.1016/S0009-2614(02)01784-0
[5] Randic M., Balaban A.T.: J. Chem. Inf. Comput. Sci. 43, 532 (2003) · doi:10.1021/ci020051a
[6] Liao B., Wang T.M.: Chem. Phys. Lett. 388(1-3), 195 (2004) · doi:10.1016/j.cplett.2004.02.089
[7] Huang G.H., Liao B., Li Y.F., Yu Y.G.: Biophys. Chem. 143, 55 (2009) · doi:10.1016/j.bpc.2009.03.013
[8] Liao B.: Chem. Phys. Lett. 401, 196 (2005) · doi:10.1016/j.cplett.2004.11.059
[9] Berger J., Mitra S., Carli M., Neri A.: J. Franklin Inst. 341, 37 (2004) · Zbl 1094.92025 · doi:10.1016/j.jfranklin.2003.12.002
[10] Yao Y.H., Wang T.M.: Chem. Phys. Lett. 398, 318 (2004) · doi:10.1016/j.cplett.2004.09.087
[11] Yao Y.H., Nan X.Y., Wang T.M.: Chem. Phys. Lett. 411, 248 (2005) · doi:10.1016/j.cplett.2005.06.040
[12] Vinga S., Almeida J.: Bioinformatics 19, 513 (2003) · doi:10.1093/bioinformatics/btg005
[13] Reinert G., Schbath S., Waterman M.S.: J. Comput. Biol. 7, 1 (2000) · doi:10.1089/10665270050081360
[14] Dai Q., Wang T.M.: Bioinformatics 24, 2296 (2008) · doi:10.1093/bioinformatics/btn436
[15] Huang Y.J., Yang L.P., Wang T.M.: J. Theor. Biol. 269(1), 217 (2011) · Zbl 1307.92286 · doi:10.1016/j.jtbi.2010.10.027
[16] Blaisdell B.E.: Proc. Natl. Acad. Sci. USA. 83, 5155 (1986) · Zbl 0592.92011 · doi:10.1073/pnas.83.14.5155
[17] Wu T.J., Burke J.P., Davison D.B.: Biometrics 53, 1431 (1997) · Zbl 0931.62100 · doi:10.2307/2533509
[18] Wu T.J., Hsieh Y.C., Li L.A.: Biometrics 57, 441 (2001) · Zbl 1209.62339 · doi:10.1111/j.0006-341X.2001.00441.x
[19] Stuart G.W., Moffett K., Baker S.: Bioinformatics 18, 100 (2002) · doi:10.1093/bioinformatics/18.1.100
[20] Hao B.L., Qi J.: J. Bioinf. Comput. Biol. 2, 1 (2004) · doi:10.1142/S0219720004000442
[21] Gao L., Qi J., Hao B.L.: AAPPS Bull. 6, 3 (2006)
[22] Qi J., Wang B., Hao B.L.: J. Mol. Evol. 58, 1 (2004)
[23] Wang H., Xu Z., Gao L., Hao B.L.: BMC Evol. Biol. 9, 195 (2009) · doi:10.1186/1471-2148-9-195
[24] Lu L., Li C., Hagedorn C.H.: Rev. Med. Virol. 16, 5 (2006) · doi:10.1002/rmv.482
[25] Liu Z.H., Meng J.H., Sun X.: Biochem. Biophys. Res. Commun. 368, 223 (2008) · doi:10.1016/j.bbrc.2008.01.070
[26] Chatterjee R., Tsarev S., Pillot J., Coursaget P., Emerson S.U., Purcell R.H.: J. Med. Virol. 53, 139 (1997) · doi:10.1002/(SICI)1096-9071(199710)53:2<139::AID-JMV5>3.0.CO;2-A
[27] van Cuyck-Gandre H., Zhang H.Y., Tsarev S.A., Clements N.J., Cohen S.J., Caudill J.D., Buisson Y., Coursaget P., Warren R.L., Longer C.F.: J. Med. Virol. 53, 340 (1997) · doi:10.1002/(SICI)1096-9071(199712)53:4<340::AID-JMV5>3.0.CO;2-7
[28] Felsenstein J.: PHYLIP (Phylogenetic Inference Package) ver. 3.57. Department of Genetics, University of Washington, Seattle, WA (1995)
[29] Arnason U., Adegoke J.A., Bodin K., Born E.W., Esa Y.B., Gullberg A., Nilsson M., Short R.V., Xu X.f., Janke A.: Proc. Natl. Acad. Sci. USA. 99(12), 8151 (2002) · doi:10.1073/pnas.102164299
[30] Reyes A., Gissi C., Catzeflis F., Nevo E., Pesole G., Saccone C.: Mol. Biol. Evol. 21(2), 397 (2004) · doi:10.1093/molbev/msh033
[31] Prasad A.B., Allard M.W., Green E.D.: Mol. Biol. Evol. 25(9), 1795 (2008) · doi:10.1093/molbev/msn104
[32] Zheng X.Q., Qin Y.F., Wang J.: Math. Biosci. 217, 159 (2009) · Zbl 1157.92310 · doi:10.1016/j.mbs.2008.11.006
[33] Otu H.H., Sayood K.: Bioinformatics 19, 2122 (2003) · doi:10.1093/bioinformatics/btg295
[34] Fletcher W., Yang Z.H.: Mol. Biol. Evol. 26(8), 1879 (2009) · doi:10.1093/molbev/msp098
[35] Robinson D., Foulds L.: Math. Biosci. 53, 131 (1981) · Zbl 0451.92006 · doi:10.1016/0025-5564(81)90043-2
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.