
Gneg-mPLoc: a top-down strategy to enhance the quality of predicting subcellular localization of Gram-negative bacterial proteins. (English) Zbl 1406.92211

Summary: By incorporating gene ontology, functional domain, and sequential evolution information, a new predictor called Gneg-mPLoc was developed. It can be used to assign Gram-negative bacterial proteins to the following eight locations: (1) cytoplasm, (2) extracellular, (3) fimbrium, (4) flagellum, (5) inner membrane, (6) nucleoid, (7) outer membrane, and (8) periplasm. It can also handle the case in which a query protein exists in more than one location simultaneously. Compared with the original predictor, Gneg-PLoc, the new predictor is much more powerful and flexible. On a newly constructed stringent benchmark dataset, in which no protein has \(\geq 25\%\) pairwise sequence identity to any other protein in the same subset (location), the overall jackknife success rate achieved by Gneg-mPLoc was 85.5%, more than 14% higher than the corresponding rate achieved by Gneg-PLoc. As a user-friendly web server, Gneg-mPLoc is freely accessible at http://www.csbio.sjtu.edu.cn/bioinf/Gneg-multi/.
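
The jackknife success rate quoted in the summary is a leave-one-out measure: each protein is removed from the benchmark in turn, predicted from the remaining ones, and the fraction of correct predictions is reported. The following is a minimal sketch of that procedure only. It assumes a plain 1-nearest-neighbor rule, single-location labels, and made-up two-dimensional feature vectors; the function names and the toy data are hypothetical and are not the Gneg-mPLoc classifier, its features, or its benchmark.

```python
from math import dist

def nearest_neighbor_label(query, samples):
    """Return the label of the sample whose feature vector is closest to `query`."""
    closest = min(samples, key=lambda s: dist(query, s[0]))
    return closest[1]

def jackknife_success_rate(samples):
    """Leave each sample out in turn, predict it from the rest,
    and return the fraction of samples predicted correctly."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]  # benchmark without sample i
        if nearest_neighbor_label(features, rest) == label:
            correct += 1
    return correct / len(samples)

# Hypothetical toy data: (feature vector, location label).
toy = [
    ((0.0, 0.0), "cytoplasm"),
    ((0.1, 0.2), "cytoplasm"),
    ((0.2, 0.1), "cytoplasm"),
    ((1.0, 1.0), "periplasm"),
    ((0.9, 1.1), "periplasm"),
    ((0.45, 0.45), "periplasm"),  # lies nearer the cytoplasm cluster
]

rate = jackknife_success_rate(toy)
print(f"jackknife success rate: {rate:.3f}")
```

Because every sample is used once as the test case and never appears in its own training set, the result does not depend on a random split. A predictor that allows multiple locations per protein would need a different correctness criterion than the single-label comparison used here.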

MSC:

92C40 Biochemistry, molecular biology
92C37 Cell biology
62P10 Applications of statistics to biology and medical sciences; meta analysis

References:

[1] Ashburner, M.; Ball, C. A.; Blake, J. A.; Botstein, D.; Butler, H.; Cherry, J. M.; Davis, A. P.; Dolinski, K.; Dwight, S. S.; Eppig, J. T.; Harris, M. A.; Hill, D. P.; Issel-Tarver, L.; Kasarskis, A.; Lewis, S.; Matese, J. C.; Richardson, J. E.; Ringwald, M.; Rubin, G. M.; Sherlock, G., Gene ontology: tool for the unification of biology, Nat. Genet., 25, 25-29 (2000)
[2] Camon, E.; Magrane, M.; Barrell, D.; Binns, D.; Fleischmann, W.; Kersey, P.; Mulder, N.; Oinn, T.; Maslen, J.; Cox, A.; Apweiler, R., The Gene Ontology Annotation (GOA) project: implementation of GO in SWISS-PROT, TrEMBL, and InterPro, Genome Res., 13, 662-672 (2003)
[3] Chen, C.; Chen, L.; Zou, X.; Cai, P., Prediction of protein secondary structure content by using the concept of Chou’s pseudo-amino acid composition and support vector machine, Protein Pept. Lett., 16, 27-31 (2009)
[4] Chou, K. C., Prediction of protein cellular attributes using pseudo amino acid composition, Proteins: Struct. Funct. Genet., 43, 246-255 (2001), (Erratum: ibid., 44, 60 (2001))
[5] Chou, K. C., Review: structural bioinformatics and its impact to biomedical science, Curr. Med. Chem., 11, 2105-2134 (2004)
[6] Chou, K. C., Pseudo-amino acid composition and its applications in bioinformatics, proteomics and system biology, Curr. Proteomics, 6, 262-274 (2009)
[7] Chou, K. C.; Zhang, C. T., Review: Prediction of protein structural classes, Crit. Rev. Biochem. Mol. Biol., 30, 275-349 (1995)
[8] Chou, K. C.; Shen, H. B., Large-scale predictions of Gram-negative bacterial protein subcellular locations, J. Proteome Res., 5, 3420-3428 (2006)
[9] Chou, K. C.; Shen, H. B., Large-scale plant protein subcellular location prediction, J. Cell. Biochem., 100, 665-678 (2007)
[10] Chou, K. C.; Shen, H. B., Euk-mPLoc: a fusion classifier for large-scale eukaryotic protein subcellular location prediction by incorporating multiple sites, J. Proteome Res., 6, 1728-1734 (2007)
[11] Chou, K. C.; Shen, H. B., Review: recent progresses in protein subcellular location prediction, Anal. Biochem., 370, 1-16 (2007)
[12] Chou, K. C.; Shen, H. B., Cell-PLoc: a package of web-servers for predicting subcellular localization of proteins in various organisms, Nat. Protoc., 3, 153-162 (2008)
[13] Cover, T. M.; Hart, P. E., Nearest neighbour pattern classification, IEEE Trans. Inf. Theory IT, 13, 21-27 (1967) · Zbl 0154.44505
[14] Denoeux, T., A k-nearest neighbor classification rule based on Dempster-Shafer theory, IEEE Trans. Syst. Man Cybern., 25, 804-813 (1995)
[15] Ding, H.; Luo, L.; Lin, H., Prediction of cell wall lytic enzymes using Chou’s amphiphilic pseudo-amino acid composition, Protein Pept. Lett., 16, 351-355 (2009)
[16] Favre, D.; Ngai, P. K.; Timmis, K. N., Relatedness of a periplasmic, broad-specificity RNase from Aeromonas hydrophila to RNase I of Escherichia coli and to a family of eukaryotic RNases, J. Bacteriol., 175, 3710-3722 (1993)
[17] Finn, R. D.; Mistry, J.; Schuster-Bockler, B.; Griffiths-Jones, S.; Hollich, V.; Lassmann, T.; Moxon, S.; Marshall, M.; Khanna, A.; Durbin, R.; Eddy, S. R.; Sonnhammer, E. L.; Bateman, A., Pfam: clans, web tools and services, Nucleic Acids Res., 34, D247-D251 (2006)
[18] Gardy, J. L.; Laird, M. R.; Chen, F.; Rey, S.; Walsh, C. J.; Ester, M.; Brinkman, F. S., PSORTb v.2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis, Bioinformatics, 21, 617-623 (2005)
[19] Gardy, J. L.; Spencer, C.; Wang, K.; Ester, M.; Tusnady, G. E.; Simon, I.; Hua, S.; deFays, K.; Lambert, C.; Nakai, K.; Brinkman, F. S., PSORT-B: improving protein subcellular localization prediction for Gram-negative bacteria, Nucleic Acids Res., 31, 3613-3617 (2003)
[20] Gerstein, M.; Thornton, J. M., Sequences and topology, Curr. Opin. Struct. Biol., 13, 341-343 (2003)
[21] Glory, E.; Murphy, R. F., Automated subcellular location determination and high-throughput microscopy, Dev. Cell, 12, 7-16 (2007)
[22] Gonzalez-Diaz, H.; Gonzalez-Diaz, Y.; Santana, L.; Ubeira, F. M.; Uriarte, E., Proteomics, networks, and connectivity indices, Proteomics, 8, 750-778 (2008)
[23] Kedarisetti, K. D.; Kurgan, L. A.; Dick, S., Classifier ensembles for protein structural class prediction with varying homology, Biochem. Biophys. Res. Commun., 348, 981-988 (2006)
[24] Letunic, I.; Copley, R. R.; Pils, B.; Pinkert, S.; Schultz, J.; Bork, P., SMART 5: domains in the context of genomes and networks, Nucleic Acids Res., 34, D257-D260 (2006)
[25] Lin, H., The modified Mahalanobis discriminant for predicting outer membrane proteins by using Chou’s pseudo-amino acid composition, J. Theor. Biol., 252, 350-356 (2008) · Zbl 1398.92076
[26] Loewenstein, Y.; Raimondo, D.; Redfern, O. C.; Watson, J.; Frishman, D.; Linial, M.; Orengo, C.; Thornton, J.; Tramontano, A., Protein function annotation by homology-based inference, Genome Biol., 10, 207 (2009)
[27] Marchler-Bauer, A.; Anderson, J. B.; Derbyshire, M. K.; DeWeese-Scott, C.; Gonzales, N. R.; Gwadz, M.; Hao, L.; He, S.; Hurwitz, D. I.; Jackson, J. D.; Ke, Z.; Krylov, D.; Lanczycki, C. J.; Liebert, C. A.; Liu, C.; Lu, F.; Lu, S.; Marchler, G. H.; Mullokandov, M.; Song, J. S.; Thanki, N.; Yamashita, R. A.; Yin, J. J.; Zhang, D.; Bryant, S. H., CDD: a conserved domain database for interactive domain family analysis, Nucleic Acids Res., 35, D237-D240 (2007)
[28] Nakai, K., Protein sorting signals and prediction of subcellular localization, Adv. Protein Chem., 54, 277-344 (2000)
[29] Nakai, K.; Kanehisa, M., Expert system for predicting protein localization sites in Gram-negative bacteria, Proteins: Struct. Funct. Genet., 11, 95-110 (1991)
[30] Nakai, K.; Horton, P., PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization, Trends Biochem. Sci., 24, 34-36 (1999)
[31] Schaffer, A. A.; Aravind, L.; Madden, T. L.; Shavirin, S.; Spouge, J. L.; Wolf, Y. I.; Koonin, E. V.; Altschul, S. F., Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements, Nucleic Acids Res., 29, 2994-3005 (2001)
[32] Shafer, G., A Mathematical Theory of Evidence (1976), Princeton University Press: Princeton, NJ · Zbl 0359.62002
[33] Shen, H. B.; Chou, K. C., Hum-mPLoc: an ensemble classifier for large-scale human protein subcellular location prediction by incorporating samples with multiple sites, Biochem. Biophys. Res. Commun., 355, 1006-1011 (2007)
[34] Smith, C., Subcellular targeting of proteins and drugs (2008), http://www.biocompare.com/Articles/TechnologySpotlight/976/Subcellular-Targeting-Of-Proteins-And-Drugs.html
[35] Tatusov, R. L.; Fedorova, N. D.; Jackson, J. D.; Jacobs, A. R.; Kiryutin, B.; Koonin, E. V.; Krylov, D. M.; Mazumder, R.; Mekhedov, S. L.; Nikolskaya, A. N.; Rao, B. S.; Smirnov, S.; Sverdlov, A. V.; Vasudevan, S.; Wolf, Y. I.; Yin, J. J.; Natale, D. A., The COG database: an updated version includes eukaryotes, BMC Bioinform., 4, 41 (2003)
[36] Zeng, Y. H.; Guo, Y. Z.; Xiao, R. Q.; Yang, L.; Yu, L. Z.; Li, M. L., Using the augmented Chou’s pseudo-amino acid composition for predicting protein submitochondria locations based on auto-covariance approach, J. Theor. Biol., 259, 366-372 (2009) · Zbl 1402.92193
[37] Zhou, G. P., An intriguing controversy over protein structural class prediction, J. Protein Chem., 17, 729-738 (1998)
[38] Zouhal, L. M.; Denoeux, T., An evidence-theoretic K-NN rule with parameter optimization, IEEE Trans. Syst. Man Cybern., 28, 263-271 (1998)