MFSC: multi-voting based feature selection for classification of Golgi proteins by adopting the general form of Chou’s PseAAC components. (English) Zbl 1406.92167

Summary: Automatic identification of protein subcellular localization has gained much popularity in the last few decades. Knowledge of subcellular localization is useful in the diagnosis of different diseases as well as in drug development. Golgi proteins are vital because they provide a means of transport for several other proteins destined for the lysosome, the plasma membrane, and secretion. Cis-Golgi and trans-Golgi are the two ends of the Golgi apparatus, responsible for the reception and transmission of various substances. Dysfunction in Golgi proteins may lead to different types of diseases, especially inheritable and neurodegenerative disorders.
Given the significance of Golgi proteins, it is essential to identify them correctly. In this paper, a novel, high-throughput computational model is proposed that can identify sub-Golgi proteins precisely. Discrete and evolutionary feature extraction schemes are applied so that all the salient, noiseless, and relevant information in the protein sequences can be captured. Unfortunately, the publicly available benchmark dataset is quite imbalanced: trans-Golgi sequences constitute 72% of the whole dataset, which introduces bias and redundancy and limits the generalization of the learned hypothesis. To address the limitations of imbalanced data, the synthetic minority over-sampling technique (SMOTE) is used to balance the number of instances in the different classes of the dataset. In addition, a condensed feature space is formed by fusing the high-ranking features of eleven different feature selection techniques. The high-ranking features are selected through a majority voting algorithm; consequently, the feature space is reduced by 85%. The experimental results demonstrate that the kNN classifier obtained promising results in combination with the hybrid feature space. It yielded an accuracy of 98% in the jackknife cross-validation test, 94% on independent data, and 96% in the 10-fold cross-validation test. It is concluded that the proposed model is reliable and consistent and serves as a valuable tool for the research community.
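The two data-preparation ideas named in the summary, majority voting over several feature rankings and SMOTE-style over-sampling of the minority class, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `majority_vote_features` and `smote_like`, the strict-majority threshold, the `top_k` cut-off, and the toy rankings are all assumptions made for illustration, since the summary specifies neither the eleven selection techniques nor the exact voting rule.

```python
import random
from collections import Counter


def majority_vote_features(rankings, top_k, min_votes=None):
    """Keep features found in the top_k of at least min_votes rankings.

    rankings: list of feature-index orderings (best first), one per
    feature selection technique (hypothetical inputs here).
    """
    if min_votes is None:
        min_votes = len(rankings) // 2 + 1  # assumed rule: strict majority
    votes = Counter()
    for ranking in rankings:
        votes.update(set(ranking[:top_k]))  # one vote per selector per feature
    return sorted(f for f, v in votes.items() if v >= min_votes)


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy example: three selectors ranking six features (indices 0..5).
rankings = [
    [0, 2, 4, 1, 3, 5],
    [2, 0, 1, 4, 5, 3],
    [4, 2, 5, 0, 1, 3],
]
selected = majority_vote_features(rankings, top_k=3)
print(selected)  # [0, 2, 4]

# Toy minority class with three 2-D samples, over-sampled by five points.
new_points = smote_like([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]], n_new=5)
print(len(new_points))  # 5
```

In this sketch each selector casts one vote per feature in its top `top_k`, so raising `min_votes` shrinks the fused feature space. In practice, over-sampling of this kind is usually applied only to the training portion of each validation split so that synthetic points do not leak into the test data; whether the paper does so is not stated in the summary.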

MSC:

92C40 Biochemistry, molecular biology
92D20 Protein sequences, DNA sequences
68T05 Learning and adaptive systems in artificial intelligence
