Predicting protein submitochondrial locations by incorporating the pseudo-position specific scoring matrix into the general Chou’s pseudo-amino acid composition. (English) Zbl 1397.92228

Summary: The mitochondrion is an important organelle in most eukaryotes and plays an important role in many cellular activities. However, some mitochondrial functions can only be carried out at specific submitochondrial locations. Studying submitochondrial locations therefore helps to clarify the biological function of proteins and is a hotspot in proteomics research. In this paper, we propose a new method for predicting protein submitochondrial locations. First, features of the protein sequence are extracted by combining Chou’s pseudo-amino acid composition (PseAAC) with the pseudo-position specific scoring matrix (PsePSSM). The extracted feature information is then denoised by two-dimensional (2-D) wavelet denoising. Finally, the optimal feature vectors are input to an SVM classifier to predict the protein submitochondrial locations. We obtained satisfactory prediction results with the jackknife test and compared them with those of other prediction methods. The results indicate that the proposed method performs significantly better than existing methods and may provide a new approach to predicting protein locations in other organelles. The source code and all datasets are available at https://github.com/QUST-BSBRC/PseAAC-PsePSSM-WD/ for academic use.
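
For illustration, the PsePSSM feature-extraction step named in the summary can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes the definition commonly used in the PsePSSM literature (column means of an L x 20 PSSM, followed by the mean squared difference between scores that are a fixed number of residues apart). The function name `pse_pssm` and the parameter `max_lag` are hypothetical, any normalisation of the raw PSSM is assumed to have been done beforehand, and the wavelet denoising and SVM steps are not shown.

```python
from typing import List, Sequence


def pse_pssm(pssm: Sequence[Sequence[float]], max_lag: int) -> List[float]:
    """Return a fixed-length PsePSSM-style vector from an L x 20 score matrix.

    The first block holds the column means. Each following block holds, for
    lag = 1..max_lag, the mean squared difference between scores that are
    `lag` residues apart in the same column. (Assumed definition, see lead-in.)
    """
    length = len(pssm)
    if length == 0:
        raise ValueError("PSSM must have at least one row")
    n_cols = len(pssm[0])
    if any(len(row) != n_cols for row in pssm):
        raise ValueError("all PSSM rows must have the same number of columns")
    if not 0 <= max_lag < length:
        raise ValueError("max_lag must satisfy 0 <= max_lag < sequence length")

    # Composition part: average score of each column over the whole sequence.
    features = [sum(row[j] for row in pssm) / length for j in range(n_cols)]

    # Sequence-order part: one block of n_cols correlation factors per lag.
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            total = sum(
                (pssm[i][j] - pssm[i + lag][j]) ** 2 for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features


# Toy example with 3 columns instead of 20, to keep the numbers readable.
toy = [
    [1.0, 0.0, 2.0],
    [3.0, 0.0, 2.0],
    [5.0, 0.0, 2.0],
]
vector = pse_pssm(toy, max_lag=1)
print(vector)  # [3.0, 0.0, 2.0, 4.0, 0.0, 0.0]
```

With a real 20-column PSSM the vector has 20 + 20 * `max_lag` entries regardless of sequence length, which is what allows proteins of different lengths to be fed to a single classifier.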

MSC:

92C40 Biochemistry, molecular biology
68T05 Learning and adaptive systems in artificial intelligence