
Self-similarity analysis of eubacteria genome based on weighted graph. (English) Zbl 1397.92227

Summary: We introduce a weighted graph model to investigate the self-similarity characteristics of eubacteria genomes. Similarity comparisons of genomes usually aim to determine the evolutionary distance between different genomes. Little attention has been paid to the overall statistical characteristics of each gene compared with the other genes in the same genome. In our model, each genome is assigned a weighted graph whose topology describes the similarity relationships among the genes of that genome. Using weighted graph theory, we extract several quantitative statistical variables from the topology and give the distributions of some variables derived from the largest social structure in the topology. The 23 eubacteria recently studied by Sorimachi and Okayasu fall clearly into two groups according to their double-logarithmic point plots, which describe the similarity relationships among genes of the largest social structure in each genome. The results show that the proposed model may provide new insights into the structures and evolution patterns determined from complete genomes.
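The summary does not specify how genes are linked or which quantity appears on the double-logarithmic axes. The Python sketch below shows one plausible reading and rests entirely on assumptions: pairwise gene similarity scores (for instance from BLAST or Needleman-Wunsch alignments, cited as [1] and [19]) are already available; an edge is kept when its score reaches a hypothetical cutoff `threshold`; the "largest social structure" is taken to be the largest connected component; and the plotted quantity is taken to be the degree distribution of that component. All function names are illustrative and are not the authors' code.

```python
import math
from collections import Counter, defaultdict


def build_similarity_graph(scores, threshold):
    """Build an undirected weighted graph {gene: {neighbour: weight}}.

    Assumption: an edge between genes a and b is kept when their pairwise
    similarity score is at least the hypothetical cutoff `threshold`.
    Genes with no edge above the cutoff do not appear in the graph.
    """
    graph = defaultdict(dict)
    for (a, b), w in scores.items():
        if a != b and w >= threshold:
            graph[a][b] = w
            graph[b][a] = w
    return dict(graph)


def largest_component(graph):
    """Return the node set of the largest connected component (iterative DFS)."""
    seen, best = set(), set()
    for start in graph:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    comp.add(nbr)
                    stack.append(nbr)
        if len(comp) > len(best):
            best = comp
    return best


def node_strength(graph, node):
    """Weighted degree: sum of similarity weights on the node's edges."""
    return sum(graph[node].values())


def degree_distribution(graph, nodes):
    """Count how many nodes in `nodes` have each degree k (edges inside `nodes`)."""
    return Counter(sum(1 for n in graph[v] if n in nodes) for v in nodes)


def loglog_points(dist):
    """Turn {k: count} into sorted (log10 k, log10 P(k)) pairs, skipping k = 0."""
    total = sum(dist.values())
    return sorted((math.log10(k), math.log10(c / total))
                  for k, c in dist.items() if k > 0)


def fit_slope(points):
    """Ordinary least-squares slope of y on x for the log-log points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if n < 2 or sxx == 0:
        raise ValueError("need at least two distinct degrees to fit a slope")
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy / sxx


if __name__ == "__main__":
    # Toy, made-up similarity scores between hypothetical genes g1..g8.
    scores = {("g1", "g2"): 0.9, ("g1", "g3"): 0.8, ("g1", "g4"): 0.7,
              ("g2", "g3"): 0.6, ("g4", "g5"): 0.5, ("g6", "g7"): 0.9,
              ("g5", "g8"): 0.1}
    g = build_similarity_graph(scores, threshold=0.3)
    core = largest_component(g)
    pts = loglog_points(degree_distribution(g, core))
    print(sorted(core), pts, fit_slope(pts))
```

Comparing the fitted slopes, or the shapes of the point plots, across genomes would be one way to look for the kind of two-group separation the summary reports; the cutoff and the choice of degree rather than strength are design decisions the sketch leaves open.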

MSC:

92C40 Biochemistry, molecular biology
05C90 Applications of graph theory

References:

[1] Altschul, S.F.; Madden, T.L.; Schäffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic acids research, 25, 3389-3402, (1997)
[2] Andraos, J., Kinetic plasticity and the determination of product ratios for kinetic schemes leading to multiple products without rate laws: new methods based on directed graphs, Canadian journal of chemistry, 86, 342-357, (2008)
[3] Barabási, A.L.; Albert, R., Emergence of scaling in random networks, Science, 286, 5439, 509-512, (1999) · Zbl 1226.05223
[4] Chou, K.C., A new schematic method in enzyme kinetics, European journal of biochemistry, 113, 195-198, (1980)
[5] Chou, K.C., Graphical rules in steady and non-steady enzyme kinetics, Journal of biological chemistry, 264, 12074-12079, (1989)
[6] Chou, K.C., Review: applications of graph theory to enzyme kinetics and protein folding kinetics. steady and non-steady state systems, Biophysical chemistry, 35, 1-24, (1990)
[7] Chou, K.C., Pseudo amino acid composition and its applications in bioinformatics, proteomics and system biology, Current proteomics, 6, 262-274, (2009)
[8] Chou, K.C., Graphic rule for drug metabolism systems, Current drug metabolism, 11, 4, 369-378, (2010)
[9] Chou, K.C., Some remarks on protein attribute prediction and pseudo amino acid composition (50th anniversary year review), Journal of theoretical biology, 273, 1, 236-247, (2011) · Zbl 1405.92212
[10] Chou, K.C.; Elrod, D.W., Protein subcellular location prediction, Protein engineering, 12, 107-118, (1999)
[11] Chou, K.C.; Forsen, S., Graphical rules for enzyme-catalyzed rate laws, Biochemical journal, 187, 829-835, (1980)
[12] Chou, K.C.; Forsen, S., Graphical rules of steady-state reaction systems, Canadian journal of chemistry, 59, 737-755, (1981)
[13] Chou, K.C.; Liu, W.M., Graphical rules for non-steady state enzyme kinetics, Journal of theoretical biology, 91, 637-654, (1981)
[14] Chou, K.C.; Shen, H.B., Foldrate: a web-server for predicting protein folding rates from primary sequence, The open bioinformatics journal, 3, 31-50, (2009), Accessible at: 〈http://www.bentham.org/open/tobioij/〉
[15] Chou, K.C.; Zhang, C.T.; Elrod, D.W., Do antisense proteins exist?, Journal of protein chemistry, 15, 59-61, (1996)
[16] Diao, Y.; Li, M.; Feng, Z.; Yin, J.; Pan, Y., The community structure of human cellular signaling network, Journal of theoretical biology, 247, 608-615, (2007) · Zbl 1455.92057
[17] Gao, L.; Ding, Y.S.; Dai, H.; Shao, S.H.; Huang, Z.D.; Chou, K.C., A novel fingerprint map for detecting SARS-CoV, Journal of pharmaceutical and biomedical analysis, 41, 246-250, (2006)
[18] Lin, W.Z.; Xiao, X.; Chou, K.C., GPCR-GIA: a web-server for identifying G-protein coupled receptors and their families with grey incidence analysis, Protein engineering, design and selection, 22, 699-705, (2009)
[19] Needleman, S.B.; Wunsch, C.D., A general method applicable to the search for similarities in the amino acid sequence of two proteins, Journal of molecular biology, 48, 443-453, (1970)
[20] Needleman, S.B.; Wunsch, C.D., EMBOSS - Needle, 〈http://www.ebi.ac.uk/Tools/emboss/align/index.html〉
[21] Okayasu, T.; Sorimachi, K., Organisms can essentially be classified according to two codon patterns, Amino acids, 36, 2, 261-271, (2009)
[22] Batagelj, V., Pajek, 〈http://vlado.fmf.uni-lj.si/pub/networks/pajek/〉, (2010)
[23] Qi, X.Q.; Wen, J.; Qi, Z.H., New 3D graphical representation of DNA sequence based on dual nucleotides, Journal of theoretical biology, 249, 681-690, (2007) · Zbl 1453.92233
[24] Qi, Z.H.; Qi, X.Q., Novel 2D graphical representation of DNA sequence based on dual nucleotides, Chemical physics letters, 440, 139-144, (2007)
[25] Qi, Z.H.; Fan, T.R., PN-curve: a 3D graphical representation of DNA sequences and their numerical characterization, Chemical physics letters, 442, 434-440, (2007)
[26] Qi, Z.H.; Qi, X.Q., Numerical characterization of DNA sequences based on digital signal method, Computers in biology and medicine, 39, 388-391, (2009)
[27] Qi, Z.H.; Wang, J.M.; Qi, X.Q., Classification analysis of dual nucleotides using dimension reduction, Journal of theoretical biology, 260, 104-109, (2009)
[28] Qi, Z.H.; Wei, R.Y., A combination dimensionality reduction approach to codon position patterns of eubacteria based on their complete genomes, Journal of theoretical biology, 272, 26-34, (2011) · Zbl 1405.92182
[29] Randić, M.; Vracko, M.; Lers, N.; Plavsic, D., Analysis of similarity/dissimilarity of DNA sequences based on novel 2-D graphical representation, Chemical physics letters, 371, 202-207, (2003)
[30] Randić, M.; Vracko, M.; Lers, N.; Plavsic, D., Novel 2-D graphical representation of DNA sequences and their numerical characterization, Chemical physics letters, 368, 1-6, (2003)
[31] Randić, M., Spectrum-like graphical representation of DNA based on codons, Acta chimica slovenica, 53, 477-485, (2006)
[32] Shen, H.B.; Song, J.N.; Chou, K.C., Prediction of protein folding rates from primary sequence by fusing multiple sequential features, Journal of biomedical science and engineering (JBiSE), 2, 136-143, (2009), Accessible at: 〈http://www.srpublishing.org/journal/jbise/〉
[33] Shikata, N.; Maki, Y.; Noguchi, Y., Multi-layered network structure of amino acid (AA) metabolism characterized by each essential AA-deficient condition, Amino acids, 33, 113-121, (2007)
[34] Sorimachi, K.; Okayasu, T., Classification of eubacteria based on their complete genome: where does mycoplasmataceae belong?, Biology letters, 271, S127-S130, (2004)
[35] Sorimachi, K.; Okayasu, T., Universal rules governing genome evolution expressed by linear formulas, Open genomics journal, 1, 33-43, (2008)
[36] Sorimachi, K.; Okayasu, T., Codon evolution is governed by linear formulas, Amino acids, 34, 4, 661-668, (2008)
[37] Syvanen, M.; Kado, C.I., Horizontal gene transfer, (2002), Academic Press New York
[38] Tatiana, A.T.; Thomas, L.M., BLAST (bl2seq), 〈http://blast.ncbi.nlm.nih.gov/bl2seq/wblast2.cgi〉
[39] Wang, M.; Yao, J.S.; Huang, Z.D.; Xu, Z.J.; Liu, G.P.; Zhao, H.Y.; Wang, X.Y.; Yang, J.; Zhu, Y.S.; Chou, K.C., A new nucleotide-composition based fingerprint of SARS-CoV with visualization analysis, Medicinal chemistry, 1, 39-47, (2005)
[40] Wu, Z.C.; Xiao, X.; Chou, K.C., 2D-MH: a web-server for generating graphic representation of protein sequences based on the physicochemical properties of their constituent amino acids, Journal of theoretical biology, 267, 29-34, (2010) · Zbl 1410.92089
[41] Watts, D.J.; Strogatz, S.H., Collective dynamics of ‘small-world’ networks, Nature, 393, 6684, 440-442, (1998) · Zbl 1368.05139
[42] Xiao, X.; Lin, W.Z.; Chou, K.C., Using grey dynamic modeling and pseudo amino acid composition to predict protein structural classes, Journal of computational chemistry, 29, 2018-2024, (2008)
[43] Xiao, X.; Lin, W.Z., Application of protein grey incidence degree measure to predict protein quaternary structural types, Amino acids, 37, 4, 741-749, (2009)
[44] Xiao, X.; Shao, S.H.; Ding, Y.S.; Huang, Z.D.; Chou, K.C., Using cellular automata images and pseudo amino acid composition to predict protein subcellular location, Amino acids, 30, 49-54, (2006)
[45] Xiao, X.; Shao, S.H.; Chou, K.C., A probability cellular automaton model for hepatitis B viral infections, Biochemical and biophysical research communications, 342, 605-610, (2006)
[46] Xiao, X.; Shao, S.; Ding, Y.; Huang, Z.; Chen, X.; Chou, K.C., An application of gene comparative image for predicting the effect on replication ratio by HBV virus gene missense mutation, Journal of theoretical biology, 235, 555-565, (2005) · Zbl 1445.92184
[47] Xiao, X.; Wang, P.; Chou, K.C., Predicting protein structural classes with pseudo amino acid composition: an approach using geometric moments of cellular automaton image, Journal of theoretical biology, 254, 691-696, (2008) · Zbl 1400.92416
[48] Xiao, X.; Wang, P.; Chou, K.C., GPCR-CA: a cellular automaton image approach for predicting G-protein-coupled receptor functional classes, Journal of computational chemistry, 30, 1414-1423, (2009)
[49] Xiao, X.; Wang, P.; Chou, K.C., Predicting protein quaternary structural attribute by hybridizing functional domain composition and pseudo amino acid composition, Journal of applied crystallography, 42, 169-173, (2009)
[50] Xiao, X.; Wang, P.; Chou, K.C., GPCR-2L: predicting G protein-coupled receptors and their types by hybridizing two different modes of pseudo amino acid compositions, Molecular biosystems, 7, 3, 911-919, (2011)
[51] Xiao, X.; Wang, P.; Chou, K.C., Quat-2L: a web-server for predicting protein quaternary structural attributes, Molecular diversity, (2010), 10.1007/s11030-010-9227-8
[52] Yao, Y.H.; Nan, X.Y.; Wang, T.M., A new 2D graphical representation classification curve and the analysis of similarity/dissimilarity of DNA sequences, Journal of molecular structure: THEOCHEM, 764, 101-108, (2006)
[53] Yao, Y.H.; Dai, Q.; Nan, X.Y., Analysis of similarity/dissimilarity of DNA sequences based on a class of 2D graphical representation, Journal of computational chemistry, 29, 1632-1639, (2008)
[54] Yao, Y.H.; Dai, Q.; Li, L., Similarity/dissimilarity studies of protein sequences based on a new 2D graphical representation, Journal of computational chemistry, 31, 1045-1052, (2010)
[55] Zhang, C.T.; Chou, K.C., Graphic analysis of codon usage strategy in 1490 human proteins, Journal of protein chemistry, 12, 329-335, (1993)
[56] Zhang, C.T.; Chou, K.C., Analysis of codon usage in 1562 E. coli protein coding sequences, Journal of molecular biology, 238, 1-8, (1994)