
Identification of protein subcellular localization via integrating evolutionary and physicochemical information into Chou’s general PseAAC. (English) Zbl 1406.92212

Summary: Identifying the location of proteins in a cell plays an important role in understanding their functions and supports applications such as drug design, therapeutic target discovery and biological research. However, traditional subcellular localization experiments are time-consuming, laborious and small in scale. With the development of next-generation sequencing technology, the number of known proteins has grown exponentially, which lays the foundation for computational methods that identify protein subcellular localization. Although many methods for predicting the subcellular localization of proteins have been proposed, most of them are limited to single-location proteins. In this paper, we propose a multi-kernel support vector machine (SVM) to predict the subcellular localization of both multi-location and single-location proteins. First, we make use of the evolutionary information extracted from the position-specific scoring matrix (PSSM) and of the physicochemical properties of proteins, using Chou's general PseAAC and other efficient functions. Then, we use the multi-kernel SVM model to identify multi-label protein subcellular localization. Our method performs well in predicting the subcellular localization of proteins: it achieves an average precision of 0.7065 and 0.6889 on two human datasets, respectively. Both results are higher than those achieved by other existing methods. We thus provide an efficient system that offers a novel perspective for studying protein subcellular localization.
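
To help interpret the reported scores, here is a minimal Python sketch of ranking-based average precision. It assumes that the figure in the summary is the multi-label metric of that name as it is commonly defined in the multi-label learning literature; the summary does not state the formula. The function name `multilabel_average_precision`, the toy scores and the label sets are illustrative assumptions and do not come from the paper.

```python
from typing import Sequence, Set


def multilabel_average_precision(
    scores: Sequence[Sequence[float]],
    relevant: Sequence[Set[int]],
) -> float:
    """Ranking-based average precision for multi-label predictions.

    scores[i][j] is the predicted score of label j for instance i
    (higher means more likely); relevant[i] is the set of true labels.
    Assumed definition: for each instance, average over its true labels
    the fraction of true labels ranked at or above that label.
    """
    total = 0.0
    counted = 0
    for row, true_labels in zip(scores, relevant):
        if not true_labels:
            continue  # the metric is undefined for instances without labels
        # Rank labels by descending score; rank 1 is the top-scoring label.
        # Ties are broken by label index because sorted() is stable.
        order = sorted(range(len(row)), key=lambda j: -row[j])
        rank = {label: pos + 1 for pos, label in enumerate(order)}
        precision_sum = 0.0
        for label in true_labels:
            hits = sum(1 for other in true_labels if rank[other] <= rank[label])
            precision_sum += hits / rank[label]
        total += precision_sum / len(true_labels)
        counted += 1
    if counted == 0:
        raise ValueError("no instance has at least one relevant label")
    return total / counted


# Toy example: two proteins, three hypothetical locations (indices 0..2).
toy_scores = [[0.9, 0.2, 0.6], [0.1, 0.8, 0.3]]
toy_relevant = [{0, 2}, {2}]
print(multilabel_average_precision(toy_scores, toy_relevant))  # 0.75
```

In the toy example the first protein has both true locations ranked on top (precision 1.0), while the second has its single true location ranked second (precision 0.5), giving a mean of 0.75. Under this assumed definition a value of 1.0 means every true location is ranked above every false one.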

MSC:

92C40 Biochemistry, molecular biology
68T05 Learning and adaptive systems in artificial intelligence
92C37 Cell biology
Full Text: DOI

References:

[1] Apweiler, R., Functional information in swiss-prot: the basis for large-scale characterisation of protein sequences, Brief. Bioinform., 2, 1, 9-18 (2001)
[2] Camon, E.; Magrane, M.; Barrell, D.; Binns, D.; Fleischmann, W.; Kersey, P.; Mulder, N.; Oinn, T.; Maslen, J.; Cox, A., The gene ontology annotation (goa) project: implementation of go in swiss-prot, trembl, and interpro, Genome Res., 13, 4, 662-672 (2003)
[3] Chen, J.; Tang, Y. Y.; Chen, C. L.; Fang, B.; Lin, Y.; Shang, Z., Multi-label learning with fuzzy hypergraph regularization for protein subcellular location prediction, IEEE Trans. Nanobiosci., 13, 4, 438-447 (2014)
[4] Chen, Z.; Zhao, P.; Li, F.; Leier, A.; Marquez-Lago, T. T.; Wang, Y.; Webb, G. I.; Smith, A. I.; Daly, R. J.; Chou, K. C., Ifeature: a python package and web server for features extraction and selection from protein and peptide sequences, Bioinformatics, 34, 14, 2499-2502 (2018)
[5] Cheng, X.; Xiao, X., Ploc-mvirus: predict subcellular localization of multi-location virus proteins via incorporating the optimal go information into general pseaac, Gene, 628, 315-321 (2017)
[6] Cheng, X.; Xiao, X.; Chou, K. C., Ploc-mplant: predict subcellular localization of multi-location plant proteins via incorporating the optimal go information into general pseaac, Mol. Biosyst., 13, 1722-1727 (2017)
[7] Cheng, X.; Xiao, X.; Chou, K. C., Ploc-meuk: predict subcellular localization of multi-label eukaryotic proteins by extracting the key go information into general pseaac, Genomics, 110, 1, 50-58 (2018)
[8] Cheng, X.; Xiao, X.; Chou, K. C., Ploc-mgneg: predict subcellular localization of gram-negative bacterial proteins by deep gene ontology learning via general pseaac, Genomics, 110, 4, 231-239 (2018)
[9] Cheng, X.; Xiao, X.; Chou, K. C., Ploc-mhum: predict subcellular localization of multi-location human proteins via general pseaac to winnow out the crucial go information, Bioinformatics, 34, 9, 1448-1456 (2018)
[10] Cheng, X.; Zhao, S. G.; Lin, W. Z.; Xiao, X.; Chou, K. C., Ploc-manimal: predict subcellular localization of animal proteins with both single and multiple sites, Bioinformatics, 33, 22, 3524-3531 (2017)
[11] Cheng, X.; Zhao, S. G.; Xiao, X.; Chou, K. C., Iatc-misf: a multi-label classifier for predicting the classes of anatomical therapeutic chemicals, Bioinformatics, 33, 3, 341-346 (2017)
[12] Chou, K. C., Prediction of protein cellular attributes using pseudo-amino acid composition, Proteins, 43, 3, 246-255 (2001)
[13] Chou, K. C., Some remarks on protein attribute prediction and pseudo amino acid composition, J. Theor. Biol., 273, 1, 236-247 (2011) · Zbl 1405.92212
[14] Chou, K. C., Some remarks on predicting multi-label attributes in molecular biosystems, Mol. Biosyst., 9, 6, 1092-1100 (2013)
[15] Chou, K. C., Impacts of bioinformatics to medicinal chemistry, Med. Chem. (Los Angeles), 11, 3, 218-234 (2015)
[16] Chou, K. C., An unprecedented revolution in medicinal chemistry driven by the progress of biological science, Curr. Top. Med. Chem., 17, 21, 2337-2358 (2017)
[17] Chou, K. C.; Cai, Y. D., Using functional domain composition and support vector machines for prediction of protein subcellular location, J. Biol. Chem., 277, 48, 45765-45769 (2002)
[18] Chou, K. C.; Cai, Y. D., Prediction of protein subcellular locations by go-fund-PseAA predictor, Biochem. Biophys. Res. Commun., 320, 4, 1236-1239 (2004)
[19] Chou, K. C.; Cai, Y. D., Using go-pseaa predictor to predict enzyme sub-class, Biochem. Biophys. Res. Commun., 325, 2, 506-509 (2004)
[20] Chou, K. C.; Shen, H. B., Recent progress in protein subcellular location prediction, Anal. Biochem., 370, 1, 1-16 (2007)
[21] Chou, K. C.; Shen, H. B., Cell-ploc: a package of web servers for predicting subcellular localization of proteins in various organisms, Nat. Protoc., 3, 2, 153-162 (2008)
[22] Chou, K. C.; Shen, H. B., A new method for predicting the subcellular localization of eukaryotic proteins with both single and multiple sites: euk-mploc 2.0, PLoS ONE, 5, 4, e9931 (2010)
[23] Chou, K. C.; Wu, Z. C.; Xiao, X., Iloc-euk: a multi-label classifier for predicting the subcellular localization of singleplex and multiplex eukaryotic proteins, PLoS ONE, 6, 3, e18258 (2011)
[24] Chou, K. C.; Wu, Z. C.; Xiao, X., Iloc-hum: using the accumulation-label scale to predict subcellular locations of human proteins with both single and multiple sites, Mol. Biosyst., 8, 2, 629-641 (2012)
[25] Cortes, C.; Mohri, M.; Rostamizadeh, A., Algorithms for learning kernels based on centered alignment, J. Mach. Learn. Res., 13, 2, 795-828 (2012) · Zbl 1283.68286
[26] Eisenhaber, F.; Bork, P., Wanted: subcellular localization of proteins based on sequence, Trends Cell Biol., 8, 4, 169-170 (1998)
[27] Fan, G. L.; Li, Q. Z., Predicting protein submitochondria locations by combining different descriptors into the general form of Chou's pseudo amino acid composition, Amino Acids, 43, 2, 545-555 (2012)
[28] Gasteiger, E.; Hoogland, C.; Gattiker, A.; Duvaud, S.; Wilkins, M. R.; Appel, R. D.; Bairoch, A., Protein identification and analysis tools on the expasy server, Methods Mol. Biol., 112, 531 (1999)
[29] He, J.; Chang, S. F.; Xie, L., Fast kernel learning for spatial pyramid matching, Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, 1-7 (2008)
[30] He, J.; Hong, G.; Liu, W., Imbalanced multi-modal multi-label learning for subcellular localization prediction of human proteins with both single and multiple sites, PLoS ONE, 7, 6, e37155 (2012)
[31] Hu, J.; Li, Y.; Zhang, M.; Yang, X.; Shen, H. B.; Yu, D. J., Predicting protein-dna binding residues by weightedly combining sequence-based features and boosting multiple svms, IEEE/ACM Trans. Comput. Biol. Bioinform., 14, 6, 1389-1398 (2017)
[32] Hu, Y.; Li, T.; Sun, J.; Tang, S.; Cong, P., Predicting gram-positive bacterial protein subcellular localization based on localization motifs, J. Theor. Biol., 308 (2012) · Zbl 1411.92078
[33] Jeong, J. C.; Lin, X.; Chen, X., On position-specific scoring matrix for protein function prediction, IEEE/ACM Trans. Comput. Biol. Bioinf., 8, 2, 308-315 (2011)
[34] Chou, K. C.; Elrod, D. W., Protein subcellular location prediction, Protein Eng., 12, 2, 107 (1999)
[35] Chou, K. C.; Shen, H. B., Plant-mploc: a top-down strategy to augment the power for predicting plant protein subcellular localization, PLoS ONE, 5, 6, e11335 (2010)
[36] van Laarhoven, T.; Nabuurs, S. B.; Marchiori, E., Gaussian interaction profile kernels for predicting drug-target interaction, Bioinformatics, 27, 21, 3036-3043 (2011)
[37] Li, F.; Li, C.; Marquez-Lago, T. T.; Leier, A.; Akutsu, T.; Purcell, A. W.; Smith, A. I.; Lithgow, T.; Daly, R. J.; Song, J., Quokka: a comprehensive tool for rapid and accurate prediction of kinase family-specific phosphorylation sites in the human proteome, Bioinformatics (2018)
[38] Li, M.; Li, W.; Wu, F. X.; Pan, Y.; Wang, J., Identifying essential proteins based on sub-network partition and prioritization by integrating subcellular localization information, J. Theor. Biol., 447, 65-73 (2018)
[39] Lin, W. Z.; Fang, J. A.; Xiao, X.; Chou, K. C., Iloc-animal: a multi-label learning classifier for predicting subcellular localization of animal proteins, Mol. Biosyst., 9, 4, 634-644 (2013)
[40] Liu, L. M.; Xu, Y.; Chou, K. C., Ipgk-pseaac: identify lysine phosphoglycerylation sites in proteins by incorporating four different tiers of amino acid pairwise coupling information into the general pseaac, Med. Chem. (Los Angeles), 13, 6, 552-559 (2017)
[41] Lu, Y.; Wang, L.; Lu, J.; Yang, J.; Shen, C., Multiple kernel clustering based on centered kernel alignment, Pattern Recognit., 47, 11, 3656-3664 (2014) · Zbl 1373.68324
[42] Nakai, K., Protein sorting signals and prediction of subcellular localization, Adv. Protein Chem., 54, 277-344 (2000)
[43] Nakashima, H.; Nishikawa, K., Discrimination of intracellular and extracellular proteins using amino acid composition and residue-pair frequencies, J. Mol. Biol., 238, 1, 54 (1994)
[44] Nanni, L.; Brahnam, S.; Lumini, A., Wavelet images and Chou's pseudo amino acid composition for protein classification, Amino Acids, 43, 2, 657-665 (2012)
[45] Nanni, L.; Lumini, A.; Brahnam, S., An empirical study of different approaches for protein classification, Sci. World J., 2014, 62, 236717 (2014)
[46] Pan, G.; Jiang, L.; Tang, J.; Guo, F., A novel computational method for detecting dna methylation sites with dna sequence information and physicochemical properties, Int. J. Mol. Sci., 19, 2, 511 (2018)
[47] Qiu, W. R.; Sun, B. Q.; Xiao, X.; Xu, Z. C.; Jia, J. H.; Chou, K. C., Ikcr-pseens: identify lysine crotonylation sites in histone proteins with pseudo components and ensemble classifier, Genomics, 110, 239-246 (2017)
[48] Shen, H. B., Recent advances in developing web-servers for predicting protein attributes, Mol. Biosyst., 9, 1092-1100 (2013)
[49] Shen, H. B.; Chou, K. C., Gpos-mploc: a top-down approach to improve the quality of predicting subcellular localization of gram-positive bacterial proteins, Protein Pept. Lett., 16, 12, 1478-1484 (2009)
[50] Shen, H. B.; Chou, K. C., A top-down approach to enhance the power of predicting human protein subcellular localization: hum-mploc 2.0, Anal. Biochem., 394, 2, 269-274 (2009)
[51] Shen, H. B.; Chou, K. C., Gneg-mploc: a top-down strategy to enhance the quality of predicting subcellular localization of gram-negative bacterial proteins, J. Theor. Biol., 264, 2, 326-333 (2010) · Zbl 1406.92211
[52] Shen, H. B.; Chou, K. C., Virus-mploc: a fusion classifier for viral protein subcellular location prediction by incorporating multiple sites, J. Biomol. Struct. Dyn., 28, 2, 175-186 (2010)
[53] Stormo, G. D.; Schneider, T. D.; Gold, L.; Ehrenfeucht, A., Use of the 'perceptron' algorithm to distinguish translational initiation sites in E. coli, Nucl. Acids Res., 10, 9, 2997-3011 (1982)
[54] Su, Z. D.; Huang, Y.; Zhang, Z. Y.; Zhao, Y. W.; Wang, D.; Chen, W.; Chou, K. C.; Lin, H., Iloc-lncrna: predict the subcellular location of lncrnas by incorporating octamer composition into general pseknc, Bioinformatics (2018)
[55] Uddin, M. R.; Sharma, A.; Farid, D. M.; Rahman, M. M.; Dehzangi, A.; Shatabda, S., Evostruct-sub: an accurate gram-positive protein subcellular localization predictor using evolutionary and structural features, J. Theor. Biol., 443, 138-146 (2018)
[56] Wang, J.; Yang, B.; Leier, A.; Marquez-Lago, T. T.; Hayashida, M.; Rocker, A.; Yanju, Z.; Akutsu, T.; Chou, K. C.; Strugnell, R. A., Bastion6: a bioinformatics approach for accurate prediction of type vi secreted effectors, Bioinformatics, 34, 15, 2546-2555 (2018)
[57] Wang, X.; Li, G. Z.; Lu, W. C., Virus-ecc-mploc: a multi-label predictor for predicting the subcellular localization of virus proteins with both single and multiple sites based on a general form of Chou's pseudo amino acid composition, Protein Pept. Lett., 20, 3, 309-317 (2013)
[58] Wang, X.; Zhang, W. W.; Li, G. Z., Multip-schlo: multi-label protein subchloroplast localization prediction with Chou's pseudo amino acid composition and a novel multi-label classifier, Bioinformatics, 31, 16, 2639-2645 (2015)
[59] Wang, Y.; Ding, Y.; Guo, F.; Wei, L.; Tang, J., Improved detection of dna-binding proteins via compression technology on PSSM information, PLoS ONE, 12, 9, e0185587 (2017)
[60] Wei, L.; Liao, M.; Gao, X.; Wang, J.; Lin, W., mgof-loc: a novel ensemble learning method for human protein subcellular localization prediction, Neurocomputing, 217, 73-82 (2016)
[61] Wu, Z. C.; Xiao, X.; Chou, K. C., Iloc-plant: a multi-label classifier for predicting the subcellular localization of plant proteins with both single and multiple sites, Mol. Biosyst., 7, 12, 3287-3297 (2011)
[62] Wu, Z. C.; Xiao, X.; Chou, K. C., Iloc-gpos: a multi-layer classifier for predicting the subcellular localization of singleplex and multiplex gram-positive bacterial proteins, Protein Pept. Lett., 19, 1, 4-14 (2012)
[63] Cheng, X.; Zhao, S. G.; Xiao, X.; Chou, K. C., Iatc-mhyb: a hybrid multi-label classifier for predicting the classification of anatomical therapeutic chemicals, Oncotarget, 8, 35, 58494-58503 (2017)
[64] Xiao, X.; Cheng, X.; Su, S. C.; Chou, K. C., Ploc-mgpos: incorporate key gene ontology information into general pseaac for predicting subcellular localization of gram-positive bacterial proteins, Nat. Sci. (Irvine), 9, 9, 331-349 (2017)
[65] Xiao, X.; Wu, Z. C.; Chou, K. C., Iloc-virus: a multi-label learning classifier for identifying the subcellular localization of virus proteins with both single and multiple sites, J. Theor. Biol., 284, 1, 42-51 (2011) · Zbl 1397.92238
[66] Zhang, M. L.; Zhou, Z. H., Ml-knn: a lazy learning approach to multi-label learning, Pattern Recognit., 40, 7, 2038-2048 (2007) · Zbl 1111.68629
[67] Zhang, M. L.; Zhou, Z. H., A review on multi-label learning algorithms, IEEE Trans. Knowl. Data Eng., 26, 8, 1819-1837 (2014)
[68] Zhang, W.; Zhu, X.; Fu, Y.; Tsuji, J.; Weng, Z., Predicting human splicing branchpoints by combining sequence-derived features and multi-label learning methods, BMC Bioinform., 18, Suppl 13, 464 (2017)
[69] Zhang, W.; Zhu, X.; Fu, Y.; Tsuji, J.; Weng, Z., The prediction of human splicing branchpoints by multi-label learning, IEEE International Conference on Bioinformatics and Biomedicine, 254-259 (2017)
[70] Zhou, G.; Doctor, K., Subcellular location prediction of apoptosis proteins, Proteins Struct. Funct. Bioinform., 50, 1, 44-48 (2010)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.