Sci Rep. 2016 Oct 6:6:34595.
doi: 10.1038/srep34595.

GlycoMinestruct: a new bioinformatics tool for highly accurate mapping of the human N-linked and O-linked glycoproteomes by incorporating structural features

Fuyi Li et al. Sci Rep. 2016.

Abstract

Glycosylation plays an important role in cell-cell adhesion, ligand binding and subcellular recognition. Current approaches for predicting protein glycosylation are primarily based on sequence-derived features, while little work has been done to systematically assess the importance of structural features to glycosylation prediction. Here, we propose a novel bioinformatics method called GlycoMinestruct (http://glycomine.erc.monash.edu/Lab/GlycoMine_Struct/) for improved prediction of human N- and O-linked glycosylation sites by combining sequence and structural features in an integrated computational framework with a two-step feature-selection strategy. Experiments indicated that GlycoMinestruct outperformed NGlycPred, the only predictor that incorporated both sequence and structure features, achieving AUC values of 0.941 and 0.922 for N- and O-linked glycosylation, respectively, on an independent test dataset. We applied GlycoMinestruct to screen the human structural proteome and obtained high-confidence predictions for N- and O-linked glycosylation sites. GlycoMinestruct can be used as a powerful tool to expedite the discovery of glycosylation events and substrates and to facilitate hypothesis-driven experimental studies.
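
For readers unfamiliar with the AUC metric quoted above, the following minimal Python sketch shows one common way to compute it: as the fraction of (positive, negative) pairs in which the positive site receives the higher score, with ties counted as one half. This is an illustration only, not the authors' evaluation code; the function name `roc_auc` and the toy labels and scores are assumptions made for this example.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 (1 = glycosylated site, 0 = non-site).
    scores: predicted scores, higher meaning more likely glycosylated.
    Hypothetical helper for illustration; not part of GlycoMinestruct.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data (made up): three of the four positive/negative pairs are ordered correctly.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

An AUC of 1.0 means every glycosylated site is scored above every non-site, and 0.5 corresponds to random ranking.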

Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1. Overview of the GlycoMinestruct framework.
Four major steps are denoted by different colors: dataset collection and preprocessing (blue), feature extraction (yellow), feature analysis and selection (red), model evaluation (green).
Figure 2. Residue specificity and enrichment of sequons.
(a) N- and (b) O-linked glycosylation sites with the “human protein dataset” selected as the background set. Sequence logos and statistical tests (binomial probabilities with Bonferroni correction) were generated using the pLogo program.
Figure 3. The relative importance and ranking of the selected optimal features.
(a) N-linked glycosylation and (b) O-linked glycosylation based on the average accuracy decrease of models trained after removal of the corresponding feature from the feature set.
Figure 4. ROC curves.
(a) Different GlycoMinestruct models trained with OFSs selected from all features, sequence features only, and structural features only, for N- and O-linked glycosylation sites. (b) N- and O-linked glycosylation-site predictions from GlycoMinestruct (trained with the OFS) and NGlycPred using the independent test dataset.
Figure 5. Predicted N-linked glycosylation sites from two case-study proteins using GlycoMinestruct.
(a) Toll-like receptor 8. (b) α-L-iduronidase. N-glycosylation sites predicted by both GlycoMinestruct and NGlycPred are colored in yellow, while sites that were correctly predicted by GlycoMinestruct but not by NGlycPred are colored in red. The illustrations of Pfam domains and N-glycosylation sites of these two proteins shown at the bottom of each panel were rendered using the IBS program.
Figure 6. Functional enrichment analysis and classification of N-linked and O-linked glycoproteomes in terms of protein subcellular location, KEGG pathway, molecular function and biological process based on GO annotations.
(a) Subcellular locations and GO terms enriched in N-linked glycosylated proteins. (b) Subcellular locations and GO terms enriched in O-linked glycosylated proteins. (c,d) Distributions of N-linked and O-linked glycosylated proteins categorized based on the numbers of predicted glycosylation sites.
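
As background to the sequons mentioned in Figure 2, the sketch below scans a protein sequence for the canonical N-linked sequon N-X-S/T, where X is any residue except proline. It is a simple motif search shown for illustration, not the GlycoMinestruct predictor, which combines many sequence and structural features; the function name `find_sequons` and the example sequence are hypothetical.

```python
import re

# Canonical N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping motifs (e.g. "NNSS") be reported separately.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence):
    """Return (1-based position of the Asn, matched tripeptide) for each sequon.

    Hypothetical helper for illustration; a sequon match is only a candidate
    site and does not imply that the asparagine is actually glycosylated.
    """
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(sequence.upper())]


# Made-up sequence: "NPS" at position 5 is skipped because X is proline.
candidates = find_sequons("MNGTNPSANNSS")
```

Here `candidates` holds the asparagine positions 2, 9 and 10 together with their tripeptides; a structure-aware predictor would then score each candidate rather than accept all of them.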
