Similarity analysis for DNA sequences based on chaos game representation. Case study: the albumin. (English) Zbl 1414.92203

Summary: Using chaos game representation (CGR), we introduce a novel and straightforward method for identifying similarities and dissimilarities between DNA sequences of the same type from different organisms. A matrix is associated with each CGR pattern, and the similarities are obtained by comparing the matrices of the sequences of interest. Three methods of analysing the resulting difference matrix are considered: a 3-dimensional representation giving both local and global information, a numerical characterization based on an n-letter word similarity measure, and a statistical evaluation. The method is illustrated by applying it to albumin nucleotide sequences from eight mammal species, taking human albumin as the reference.
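The pipeline outlined in the summary (CGR pattern, associated matrix, difference matrix) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the corner assignment for A, C, G, T, the grid resolution 2^n, the normalisation to relative frequencies, and the helper names `cgr_points`, `cgr_matrix`, `difference_matrix` and `l1_distance` are all hypothetical choices. In particular, `l1_distance` is a generic summary of the difference matrix and should not be read as the paper's n-letter word similarity measure.

```python
import numpy as np

# Assumed corner assignment (one common convention); the paper's may differ.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Chaos game: start at the centre and move halfway toward each base's corner."""
    x, y = 0.5, 0.5
    pts = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        pts.append((x, y))
    return np.array(pts, dtype=float).reshape(-1, 2)


def cgr_matrix(seq, n=3):
    """Relative frequency of CGR points in a 2^n x 2^n grid (rows index y, columns index x)."""
    size = 2 ** n
    # The first n-1 points still depend on the starting point, so they are
    # dropped; each remaining cell is fixed by the last n bases (an n-letter word).
    pts = cgr_points(seq)[n - 1:]
    m = np.zeros((size, size))
    for x, y in pts:
        col = min(int(x * size), size - 1)  # clamp guards against rounding to 1.0
        row = min(int(y * size), size - 1)
        m[row, col] += 1.0
    total = m.sum()
    return m / total if total > 0 else m


def difference_matrix(seq_a, seq_b, n=3):
    """Element-wise difference between the CGR matrices of two sequences."""
    return cgr_matrix(seq_a, n) - cgr_matrix(seq_b, n)


def l1_distance(seq_a, seq_b, n=3):
    """Assumed scalar summary: sum of absolute entries of the difference matrix."""
    return float(np.abs(difference_matrix(seq_a, seq_b, n)).sum())
```

Under these assumptions, calling `difference_matrix(reference, other, n)` with a reference sequence and each comparison sequence in turn yields one matrix per species, which could then be plotted as a surface or reduced to a scalar.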

MSC:

92D20 Protein sequences, DNA sequences
62P10 Applications of statistics to biology and medical sciences; meta analysis

References:

[1] Nair, A. S.; Nair, V. V.; Arun, K. S.; Kant, K.; Dey, A., Bio-sequence signatures using chaos game representation, (Fulekar, M. H., Bioinformatics: Applications in Life and Environmental Sciences (2009), Springer), 62-76, Chapter 6
[2] Almeida, J. S.; Carriço, J. A.; Maretzek, A.; Noble, P. A.; Fletcher, M., Analysis of genomic sequences by chaos game representation, Bioinformatics, 17, 429-437 (2001)
[3] Cristescu, C. P.; Stan, C.; Scarlat, I. E., Modeling with the chaos game (i). Simulating some features of real time series, U.P.B. Sci. Bull. Ser. A, 71, 95-100 (2009)
[4] Cristescu, C. P.; Stan, C.; Scarlat, E., The dynamics of exchange rate time series and the chaos game, Physica A, 388, 4845-4855 (2009)
[5] Deschavanne, P. J.; Giron, A.; Vilain, J.; Fagot, G.; Fertil, B., Genomic signature: characterization and classification of species assessed by chaos game representation of sequences, Mol. Biol. Evol., 16, 1391-1399 (1999)
[6] Gates, M. A., A simple way to look at DNA, J. Theor. Biol., 119, 319-328 (1986)
[7] González-Díaz, H.; Pérez-Montoto, L. G.; Duardo-Sanchez, A.; Paniagua, E.; Vázquez-Prieto, S.; Vilas, R., Generalized lattice graphs for 2D-visualization of biological information, J. Theor. Biol., 261, 136-147 (2009) · Zbl 1403.92091
[8] Jeffrey, H. J., Chaos game representation of gene structure, Nucleic Acids Res., 18, 2163-2170 (1990)
[9] Li, C.; Tang, N.; Wang, J., Directed graphs of DNA sequences and their numerical characterization, J. Theor. Biol., 241, 173-177 (2006) · Zbl 1447.92306
[10] Liao, B.; Wang, T. M., Analysis of similarity/dissimilarity of DNA sequences based on 3-D graphical representation, Chem. Phys. Lett., 388, 195-200 (2004)
[11] Peak, D.; Frame, M., Chaos Under Control: The Art and Science of Complexity (1994), Freeman: New York · Zbl 0857.58031
[12] Randić, M., Another look at the chaos-game representation of DNA, Chem. Phys. Lett., 456, 84-88 (2008)
[13] Randić, M.; Vračko, M.; Lerš, N.; Plavšić, D., Analysis of similarity/dissimilarity of DNA sequences based on novel 2-D graphical representation, Chem. Phys. Lett., 371, 202-207 (2003)
[14] Randić, M.; Zupan, J.; Vikic-Topic, D., On representation of proteins by star-like graphs, J. Mol. Graph. Model., 26, 290-305 (2007)
[15] Rasouli, M.; Rasouli, G.; Lenz, F. A.; Borrett, D.; Verhagen, L.; Kwan, H. C., Chaos game representation of human pallidal spike trains, J. Biol. Phys., 36, 197-205 (2010)
[16] Roy, A.; Raychaudhury, C.; Nandy, A., Novel techniques of graphical representation and analysis of DNA sequences—a review, J. Biosci., 23, 55-71 (1998)
[17] Wang, S.; Tian, F.; Qiu, Y.; Liu, X., J. Theor. Biol. (2010)
[18] Yang, J. Y.; Peng, Z. L.; Yu, Z. G.; Zhang, R. J.; Anh, V.; Wang, D., Prediction of protein structural classes by recurrence quantification analysis based on chaos game representation, J. Theor. Biol., 257, 618-626 (2009) · Zbl 1400.92417
[19] Yu, Z. G.; Anh, V.; Lau, K. S., Chaos game representation of protein sequences based on the detailed HP model and their multifractal and correlation analyses, J. Theor. Biol., 226, 341-348 (2004) · Zbl 1439.92148
[20] Zbilut, J. P.; Sirabella, P.; Giuliani, A.; Manetti, C.; Colosimo, A.; Webber, C. L., Review of nonlinear analysis of proteins through recurrence quantification, Cell Biochem. Biophys., 36, 67-87 (2002)