
iLoc-Virus: a multi-label learning classifier for identifying the subcellular localization of virus proteins with both single and multiple sites. (English) Zbl 1397.92238

Summary: Although many computational methods have been developed over the last two decades or so for predicting the subcellular locations of proteins from their sequence information, the problem remains challenging, particularly when the system concerned contains both single- and multiple-location proteins. Moreover, very few of the existing methods were developed specifically for viral proteins, i.e., those generated by viruses. Knowledge of the subcellular localization of viral proteins in a host cell or virus-infected cell is very important because it is closely related to their destructive tendencies and consequences. In this paper, by introducing the “multi-label scale” and by hybridizing the gene ontology information with the sequential evolution information, a predictor called iLoc-Virus is developed. It can be used to identify the localization of viral proteins among the following six locations: (1) viral capsid, (2) host cell membrane, (3) host endoplasmic reticulum, (4) host cytoplasm, (5) host nucleus, and (6) secreted. The iLoc-Virus predictor not only predicts the location sites of viral proteins in a host cell more accurately, but also has the capacity to deal with viral proteins having more than one location. As a user-friendly web-server, iLoc-Virus is freely accessible to the public at http://icpr.jci.edu.cn/bioinfo/iLoc-Virus. A step-by-step guide is also provided on how to use the web-server to get the desired results. Furthermore, for the user’s convenience, the iLoc-Virus web-server also accepts batch job submissions. It is anticipated that iLoc-Virus may become a useful high-throughput tool for both basic research and drug development.
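
To illustrate what predicting "both single and multiple sites" means in practice, the following is a minimal sketch of a multi-label nearest-neighbour vote over the six locations named in the summary. It is not the authors' algorithm: the function `predict_locations`, the parameters `k` and `min_votes`, and the two-dimensional toy feature vectors are all assumptions made for illustration, standing in for the gene ontology and sequential evolution descriptors that the paper actually uses.

```python
from collections import Counter
from math import dist

# The six locations listed in the summary.
LOCATIONS = (
    "viral capsid",
    "host cell membrane",
    "host endoplasmic reticulum",
    "host cytoplasm",
    "host nucleus",
    "secreted",
)

def predict_locations(query, training, k=3, min_votes=2):
    """Return every location voted for by at least `min_votes` of the
    k training proteins nearest to `query` (hypothetical helper).

    `training` is a list of (feature_vector, set_of_locations) pairs.
    The feature vectors are invented stand-ins, not the paper's
    gene ontology / sequential evolution descriptors.
    """
    neighbours = sorted(training, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(loc for _, locs in neighbours for loc in locs)
    predicted = {loc for loc, n in votes.items() if n >= min_votes}
    if not predicted:
        # Fall back to the nearest neighbour's labels so that every
        # query receives at least one location.
        predicted = set(neighbours[0][1])
    return predicted

# Toy data: invented two-dimensional features, for illustration only.
TRAINING = [
    ((0.0, 0.0), {"host nucleus"}),
    ((0.1, 0.0), {"host nucleus", "host cytoplasm"}),
    ((0.0, 0.1), {"host cytoplasm", "host nucleus"}),
    ((5.0, 5.0), {"secreted"}),
    ((5.1, 5.0), {"secreted"}),
]

multi = predict_locations((0.05, 0.05), TRAINING)   # two locations
single = predict_locations((5.0, 5.1), TRAINING)    # one location
print(sorted(multi), sorted(single))
```

The point of the sketch is only that the output is a set of locations rather than a single class, so a protein found in both the host nucleus and the host cytoplasm can be reported at both sites.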

MSC:

92C40 Biochemistry, molecular biology
92C37 Cell biology
68T05 Learning and adaptive systems in artificial intelligence
92-08 Computational methods for problems pertaining to biology
Full Text: DOI

References:

[1] Altschul, S.F., Evaluating the statistical significance of multiple distinct local alignments, (), 1-14
[2] Ashburner, M.; Ball, C.A.; Blake, J.A.; Botstein, D.; Butler, H.; Cherry, J.M.; Davis, A.P.; Dolinski, K.; Dwight, S.S.; Eppig, J.T.; Harris, M.A.; Hill, D.P.; Issel-Tarver, L.; Kasarskis, A.; Lewis, S.; Matese, J.C.; Richardson, J.E.; Ringwald, M.; Rubin, G.M.; Sherlock, G., Gene ontology: tool for the unification of biology, Nat. genet., 25, 25-29, (2000)
[3] Camon, E.; Magrane, M.; Barrell, D.; Lee, V.; Dimmer, E.; Maslen, J.; Binns, D.; Harte, N.; Lopez, R.; Apweiler, R., The gene ontology annotation (GOA) database: sharing knowledge in uniprot with gene ontology, Nucleic acids res., 32, D262-6, (2004)
[4] Camon, E.; Magrane, M.; Barrell, D.; Binns, D.; Fleischmann, W.; Kersey, P.; Mulder, N.; Oinn, T.; Maslen, J.; Cox, A.; Apweiler, R., The gene ontology annotation (GOA) project: implementation of GO in SWISS-PROT, trembl, and interpro, Genome res., 13, 662-672, (2003)
[5] Cedano, J.; Aloy, P.; Pérez-Pons, J.A.; Querol, E., Relation between amino acid composition and cellular location of proteins, J. mol. biol., 266, 594-600, (1997)
[6] Chen, C.; Chen, L.; Zou, X.; Cai, P., Prediction of protein secondary structure content by using the concept of Chou’s pseudo amino acid composition and support vector machine, Protein pept. lett., 16, 27-31, (2009)
[7] Chen, Y.L.; Li, Q.Z., Prediction of apoptosis protein subcellular location using improved hybrid approach and pseudo amino acid composition, J. theor. biol., 248, 377-381, (2007) · Zbl 1451.92113
[8] Chou, K.C., Graphic rules in steady and non-steady enzyme kinetics, J. biol. chem., 264, 12074-12079, (1989)
[9] Chou, K.C., The convergence-divergence duality in lectin domains of the selectin family and its implications, FEBS lett., 363, 123-126, (1995)
[10] Chou, K.C., A novel approach to predicting protein structural classes in a (20-1)-D amino acid composition space, Proteins: struct. funct. genet., 21, 319-344, (1995)
[11] Chou, K.C., Prediction of protein cellular attributes using pseudo amino acid composition, Proteins: struct. funct. genet., 43, 246-255, (2001), (Erratum: ibid., 2001, vol. 44, 60)
[12] Chou, K.C., Review: structural bioinformatics and its impact to biomedical science, Curr. med. chem., 11, 2105-2134, (2004)
[13] Chou, K.C., Some remarks on protein attribute prediction and pseudo amino acid composition (50th anniversary year review), J. theor. biol., 273, 236-247, (2011) · Zbl 1405.92212
[14] Chou, K.C.; Zhang, C.T., Predicting protein folding types by distance functions that make allowances for amino acid interactions, J. biol. chem., 269, 22014-22020, (1994)
[15] Chou, K.C.; Zhang, C.T., Review: prediction of protein structural classes, Crit. rev. biochem. mol. biol., 30, 275-349, (1995)
[16] Chou, K.C.; Elrod, D.W., Protein subcellular location prediction, Protein eng., 12, 107-118, (1999)
[17] Chou, K.C.; Shen, H.B., Review: recent progresses in protein subcellular location prediction, Anal. biochem., 370, 1-16, (2007)
[18] Chou, K.C.; Shen, H.B., Cell-ploc: a package of web servers for predicting subcellular localization of proteins in various organisms, Nat. protocols, 3, 153-162, (2008)
[19] Chou, K.C.; Shen, H.B., Protident: a web server for identifying proteases and their types by fusing functional domain and sequential evolution information, Biochem. biophys. res. commun., 376, 321-325, (2008)
[20] Chou, K.C.; Shen, H.B., Cell-ploc 2.0: an improved package of web-servers for predicting subcellular localization of proteins in various organisms, Nat. sci., 2, 1090-1103, (2010), (openly accessible at 〈http://www.scirp.org/journal/NS/〉)
[21] Chou, K.C.; Shen, H.B., Plant-mploc: a top-down strategy to augment the power for predicting plant protein subcellular localization, Plos one, 5, e11335, (2010)
[22] Chou, K.C.; Shen, H.B., A new method for predicting the subcellular localization of eukaryotic proteins with both single and multiple sites: euk-mploc 2.0, Plos one, 5, e9931, (2010)
[23] Ding, H.; Luo, L.; Lin, H., Prediction of cell wall lytic enzymes using Chou’s amphiphilic pseudo amino acid composition, Protein pept. lett., 16, 351-355, (2009)
[24] Ding, Y.S.; Zhang, T.L., Using Chou’s pseudo amino acid composition to predict subcellular localization of apoptosis proteins: an approach with immune genetic algorithm-based ensemble classifier, Pattern recognition lett., 29, 1887-1892, (2008)
[25] Du, P.; Cao, S.; Li, Y., Subchlo: predicting protein subchloroplast locations with pseudo-amino acid composition and the evidence-theoretic K-nearest neighbor (ET-KNN) algorithm, J. theor. biol., 261, 330-335, (2009) · Zbl 1403.92063
[26] Emanuelsson, O.; Nielsen, H.; Brunak, S.; von Heijne, G., Predicting subcellular localization of proteins based on their N-terminal amino acid sequence, J. mol. biol., 300, 1005-1016, (2000)
[27] Esmaeili, M.; Mohabatkar, H.; Mohsenzadeh, S., Using the concept of Chou’s pseudo amino acid composition for risk type prediction of human papillomaviruses, J. theor. biol., 263, 203-209, (2010) · Zbl 1406.92455
[28] Gao, Q.B.; Jin, Z.C.; Ye, X.F.; Wu, C.; He, J., Prediction of nuclear receptors with optimal pseudo amino acid composition, Anal. biochem., 387, 54-59, (2009)
[29] Gardy, J.L.; Laird, M.R.; Chen, F.; Rey, S.; Walsh, C.J.; Ester, M.; Brinkman, F.S., Psortb v.2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis, Bioinformatics, 21, 617-623, (2005)
[30] Georgiou, D.N.; Karakasidis, T.E.; Nieto, J.J.; Torres, A., Use of fuzzy clustering technique and matrices to classify amino acids and its impact to Chou’s pseudo amino acid composition, J. theor. biol., 257, 17-26, (2009) · Zbl 1400.92393
[31] Gerstein, M.; Thornton, J.M., Sequences and topology, Curr. opin. struct. biol., 13, 341-343, (2003)
[32] Glory, E.; Murphy, R.F., Automated subcellular location determination and high-throughput microscopy, Dev. cell, 12, 7-16, (2007)
[33] Guo, J.; Rao, N.; Liu, G.; Yang, Y.; Wang, G., Predicting protein folding rates using the concept of Chou’s pseudo amino acid composition, J. comput. chem., 32, 1612-1617, (2011)
[34] Hayat, M.; Khan, A., Predicting membrane protein types by fusing composite protein sequence features into pseudo amino acid composition, J. theor. biol., 271, 10-17, (2011) · Zbl 1405.92217
[35] Jahandideh, S.; Hoseini, S.; Jahandideh, M.; Hoseini, A.; Disfani, F.M., Gamma-turn types prediction in proteins using the two-stage hybrid neural discriminant model, J. theor. biol., 259, 517-522, (2009) · Zbl 1402.92326
[36] Kandaswamy, K.K.; Pugalenthi, G.; Moller, S.; Hartmann, E.; Kalies, K.U.; Suganthan, P.N.; Martinetz, T., Prediction of apoptosis protein locations with genetic algorithms and support vector machines through a new mode of pseudo amino acid composition, Protein pept. lett., 17, 1473-1479, (2010)
[37] Kannan, S.; Hauth, A.M.; Burger, G., Function prediction of hypothetical proteins without sequence similarity to proteins of known function, Protein pept. lett., 15, 1107-1116, (2008)
[38] Li, F.M.; Li, Q.Z., Predicting protein subcellular location using Chou’s pseudo amino acid composition and improved hybrid approach, Protein pept. lett., 15, 612-616, (2008)
[39] Lin, H., The modified Mahalanobis discriminant for predicting outer membrane proteins by using Chou’s pseudo amino acid composition, J. theor. biol., 252, 350-356, (2008) · Zbl 1398.92076
[40] Lin, H.; Ding, H., Predicting ion channels and their types by the dipeptide mode of pseudo amino acid composition, J. theor. biol., 269, 64-69, (2011) · Zbl 1307.92080
[41] Liu, T.; Zheng, X.; Wang, C.; Wang, J., Prediction of subcellular location of apoptosis proteins using pseudo amino acid composition: an approach from auto covariance transformation, Protein pept. lett., 17, 1263-1269, (2010)
[42] Loewenstein, Y.; Raimondo, D.; Redfern, O.C.; Watson, J.; Frishman, D.; Linial, M.; Orengo, C.; Thornton, J.; Tramontano, A., Protein function annotation by homology-based inference, Genome biol., 10, 207, (2009)
[43] Mahalanobis, P.C., On the generalized distance in statistics, Proc. natl. inst. sci. India, 2, 49-55, (1936) · Zbl 0015.03302
[44] Mardia, K.V.; Kent, J.T.; Bibby, J.M., Multivariate analysis: chapter 11 discriminant analysis; chapter 12 multivariate analysis of variance; chapter 13 cluster analysis, (1979), Academic Press London, (pp. 322-381)
[45] Mohabatkar, H., Prediction of cyclin proteins using Chou’s pseudo amino acid composition, Protein pept. lett., 17, 1207-1214, (2010)
[46] Mondal, S.; Bhavna, R.; Mohan Babu, R.; Ramakumar, S., Pseudo amino acid composition and multi-class support vector machines approach for conotoxin superfamily classification, J. theor. biol., 243, 252-260, (2006) · Zbl 1447.92309
[47] Nakai, K., Protein sorting signals and prediction of subcellular localization, Adv. protein chem., 54, 277-344, (2000)
[48] Nakai, K.; Kanehisa, M., Expert system for predicting protein localization sites in Gram-negative bacteria, Proteins: struct. funct. genet., 11, 95-110, (1991)
[49] Nakashima, H.; Nishikawa, K., Discrimination of intracellular and extracellular proteins using amino acid composition and residue-pair frequencies, J. mol. biol., 238, 54-61, (1994)
[50] Nanni, L.; Lumini, A., Genetic programming for creating Chou’s pseudo amino acid based features for submitochondria localization, Amino acids, 34, 653-660, (2008)
[51] Pan, Y.X.; Zhang, Z.Z.; Guo, Z.M.; Feng, G.Y.; Huang, Z.D.; He, L., Application of pseudo amino acid composition for predicting protein subcellular location: stochastic signal processing approach, J. protein chem., 22, 395-402, (2003)
[52] Park, K.J.; Kanehisa, M., Prediction of protein subcellular locations by support vector machines using compositions of amino acid and amino acid pairs, Bioinformatics, 19, 1656-1663, (2003)
[53] Pillai, K.C.S., Mahalanobis \(D^2\), (), 176-181, (This reference also presents a brief biography of Mahalanobis who was a man of great originality and who made considerable contributions to statistics)
[54] Qiu, J.D.; Huang, J.H.; Shi, S.P.; Liang, R.P., Using the concept of Chou’s pseudo amino acid composition to predict enzyme family classes: an approach with support vector machine based on discrete wavelet transform, Protein pept. lett., 17, 715-722, (2010)
[55] Reinhardt, A.; Hubbard, T., Using neural networks for prediction of the subcellular location of proteins, Nucleic acids res., 26, 2230-2236, (1998)
[56] Sahu, S.S.; Panda, G., A novel feature representation method based on Chou’s pseudo amino acid composition for protein structural class prediction, Comput. biol. chem., 34, 320-327, (2010) · Zbl 1403.92221
[57] Schaffer, A.A.; Aravind, L.; Madden, T.L.; Shavirin, S.; Spouge, J.L.; Wolf, Y.I.; Koonin, E.V.; Altschul, S.F., Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements, Nucleic acids res., 29, 2994-3005, (2001)
[58] Shen, H.B.; Chou, K.C., Virus-ploc: a fusion classifier for predicting the subcellular localization of viral proteins within host and virus-infected cells, Biopolymers, 85, 233-240, (2007)
[59] Shen, H.B.; Chou, K.C., Predicting protein fold pattern with functional domain and sequential evolution information, J. theor. biol., 256, 441-446, (2009) · Zbl 1400.92413
[60] Shen, H.B.; Chou, K.C., Quatident: a web server for identifying protein quaternary structural attribute by fusing functional domain and sequential evolution information, J. proteome res., 8, 1577-1584, (2009)
[61] Shen, H.B.; Chou, K.C., Gneg-mploc: a top-down strategy to enhance the quality of predicting subcellular localization of Gram-negative bacterial proteins, J. theor. biol., 264, 326-333, (2010) · Zbl 1406.92211
[62] Shen, H.B.; Chou, K.C., Virus-mploc: a fusion classifier for viral protein subcellular location prediction by incorporating multiple sites, J. biomol. struct. dyn., 28, 175-186, (2010)
[63] Small, I.; Peeters, N.; Legeai, F.; Lurin, C., Predotar: a tool for rapidly screening proteomes for N-terminal targeting sequences, Proteomics, 4, 1581-1590, (2004)
[64] Smith, C., 2008. Subcellular targeting of proteins and drugs. 〈http://www.biocompare.com/Articles/TechnologySpotlight/976/Subcellular-Targeting-Of-Proteins-And-Drugs.html〉
[65] Wong, J.H.; Ng, T.B., Studies on an antifungal protein and a chromatographically and structurally related protein isolated from the culture broth of bacillus amyloliquefaciens, Protein pept. lett., 16, 1399-1406, (2009)
[66] Wootton, J.C.; Federhen, S., Statistics of local complexity in amino acid sequences and sequence databases, Comput. chem., 17, 149-163, (1993) · Zbl 0825.92102
[67] Xiao, X.; Wang, P.; Chou, K.C., Quat-2L: a web-server for predicting protein quaternary structural attributes, Mol. diversity, 15, 149-155, (2011)
[68] Xiao, X.; Wang, P.; Chou, K.C., GPCR-2L: predicting G protein-coupled receptors and their types by hybridizing two different modes of pseudo amino acid compositions, Mol. biosyst., 7, 911-919, (2011)
[69] Xiao, X.; Shao, S.H.; Huang, Z.D.; Chou, K.C., Using pseudo amino acid composition to predict protein structural classes: approached with complexity measure factor, J. comput. chem., 27, 478-482, (2006)
[70] Xiao, X.; Shao, S.H.; Ding, Y.S.; Huang, Z.D.; Chou, K.C., Using cellular automata images and pseudo amino acid composition to predict protein subcellular location, Amino acids, 30, 49-54, (2006)
[71] Yu, L.; Guo, Y.; Li, Y.; Li, G.; Li, M.; Luo, J.; Xiong, W.; Qin, W., Secretp: identifying bacterial secreted proteins by fusing new features into Chou’s pseudo-amino acid composition, J. theor. biol., 267, 1-6, (2010) · Zbl 1410.92040
[72] Zeng, Y.H.; Guo, Y.Z.; Xiao, R.Q.; Yang, L.; Yu, L.Z.; Li, M.L., Using the augmented Chou’s pseudo amino acid composition for predicting protein submitochondria locations based on auto covariance approach, J. theor. biol., 259, 366-372, (2009) · Zbl 1402.92193
[73] Zhang, G.Y.; Fang, B.S., Predicting the cofactors of oxidoreductases based on amino acid composition distribution and Chou’s amphiphilic pseudo amino acid composition, J. theor. biol., 253, 310-315, (2008)
[74] Zhang, G.Y.; Li, H.C.; Gao, J.Q.; Fang, B.S., Predicting lipase types by improved Chou’s pseudo-amino acid composition, Protein pept. lett., 15, 1132-1137, (2008)
[75] Zhou, G.P.; Doctor, K., Subcellular location prediction of apoptosis proteins, Proteins: struct. funct. genet., 50, 44-48, (2003)
[76] Zhou, X.B.; Chen, C.; Li, Z.C.; Zou, X.Y., Using Chou’s amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes, J. theor. biol., 248, 546-551, (2007) · Zbl 1451.92245
[77] Zou, D.; He, Z.; He, J.; Xia, Y., Supersecondary structure prediction using Chou’s pseudo amino acid composition, J. comput. chem., 32, 271-278, (2011)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented/enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible without claiming completeness or a perfect matching.