
Predicting \(O\)-glycosylation sites in mammalian proteins by using SVMs. (English) Zbl 1102.92022

Summary: \(O\)-glycosylation is one of the most important, frequent and complex post-translational modifications. This modification can activate and affect protein functions. We present three support vector machine (SVM) models based on physical properties, on 0/1 systems, and on a system combining these two feature types. The prediction accuracies of the three models reach 0.82, 0.85 and 0.85, respectively. The accuracies of the three SVM methods were evaluated by ‘leave-one-out’ cross validation. This approach provides a useful tool to help identify \(O\)-glycosylation sites in mammalian proteins. An online prediction web server is available at http://www.biosino.org/Oglyc.
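
As an illustration of two ideas named in the summary, the following minimal Python sketch shows one plausible form of a 0/1 encoding of the residues around a candidate site, and a leave-one-out evaluation loop. It is not the authors' implementation. The window size, the function names, and the toy feature vectors are assumptions made for this example, and a simple 1-nearest-neighbour rule stands in for the SVM so that the sketch needs no external library. The physical-property encoding is not shown.

```python
# Hypothetical sketch: 0/1 window encoding and leave-one-out evaluation.
# All names and parameters here are illustrative assumptions.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def encode_window(sequence, site, flank=3):
    """Return a 0/1 vector for the residues from site-flank to site+flank.

    Each position contributes 20 entries, exactly one of which is 1.
    Positions that fall outside the sequence are encoded as all zeros.
    """
    features = []
    for pos in range(site - flank, site + flank + 1):
        residue = sequence[pos] if 0 <= pos < len(sequence) else None
        features.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return features


def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_1nn(train_x, train_y, query):
    """Stand-in classifier (NOT an SVM): label of the nearest training vector."""
    best = min(range(len(train_x)), key=lambda i: hamming(train_x[i], query))
    return train_y[best]


def leave_one_out_accuracy(xs, ys):
    """Hold out each example once, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict_1nn(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


if __name__ == "__main__":
    # Invented toy peptide, not real data: encode the threonine at index 2.
    vector = encode_window("APTSP", site=2, flank=1)
    print(len(vector), sum(vector))
```

In practice the encoded windows for known glycosylated and non-glycosylated serine/threonine sites would be passed to an SVM trainer in place of `predict_1nn`; the leave-one-out loop itself stays the same.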

MSC:

92C40 Biochemistry, molecular biology
68T05 Learning and adaptive systems in artificial intelligence
