Brief Bioinform. 2021 Nov 5;22(6):bbab245.
doi: 10.1093/bib/bbab245.

Porpoise: a new approach for accurate prediction of RNA pseudouridine sites


Fuyi Li et al. Brief Bioinform. 2021.

Abstract

Pseudouridine is a ubiquitous RNA modification present in both eukaryotes and prokaryotes, and it plays a vital role in various biological processes. Almost all kinds of RNA are subject to this modification. However, identifying pseudouridine sites experimentally remains a great challenge, because it requires expensive and time-consuming laboratory work. Computational approaches that can accurately identify pseudouridine sites in silico from the large amount of available RNA sequence data are therefore highly desirable and can aid in elucidating the function of this critical modification. Here, we propose a new computational approach, termed Porpoise, to accurately identify pseudouridine sites from RNA sequence data. Building on a comprehensive evaluation of 18 frequently used feature encoding schemes, Porpoise selects four types of features: binary features, pseudo k-tuple composition, nucleotide chemical property and position-specific trinucleotide propensity based on single-strand (PSTNPss). The selected features are fed into a stacked ensemble learning framework to construct an effective stacked model. Both cross-validation tests on the benchmark dataset and independent tests show that Porpoise achieves better predictive performance than several state-of-the-art approaches. The application of model interpretation tools demonstrates the importance of PSTNPss features for the performance of the trained models. This new method is anticipated to facilitate community-wide efforts to identify putative pseudouridine sites and to formulate novel, testable biological hypotheses.
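To make the selected feature types more concrete, here is a minimal Python sketch of three of them: binary (one-hot) encoding, nucleotide chemical property (NCP) encoding and a PSTNPss-style encoding. The A/C/G/U column order of the one-hot code, the NCP vectors (one commonly used mapping based on ring structure, functional group and hydrogen bonding) and all function names are assumptions for illustration, not Porpoise's code. PSTNPss is shown as it is commonly described: at each position, the frequency of the observed trinucleotide among positive training sequences minus its frequency among negative ones. Pseudo k-tuple composition is omitted because it depends on physicochemical property tables not given here.

```python
from typing import Dict, List, Sequence, Tuple

# One-hot ("binary") code per nucleotide; the A, C, G, U order is an assumption.
BINARY: Dict[str, Tuple[int, ...]] = {
    "A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "U": (0, 0, 0, 1),
}

# One commonly used NCP mapping (assumption): the three bits mark a purine ring,
# an amino functional group and weak (two) hydrogen bonds, respectively.
NCP: Dict[str, Tuple[int, ...]] = {
    "A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1),
}


def encode(seq: str, table: Dict[str, Tuple[int, ...]]) -> List[int]:
    """Concatenate per-nucleotide vectors into one flat feature vector."""
    return [bit for nt in seq.upper() for bit in table[nt]]


def trinucleotide_freqs(seqs: Sequence[str]) -> List[Dict[str, float]]:
    """Relative frequency of each trinucleotide starting at each position.

    Assumes all sequences share one length (fixed windows around a candidate U).
    """
    n_windows = len(seqs[0]) - 2
    counts: List[Dict[str, int]] = [{} for _ in range(n_windows)]
    for s in seqs:
        for i in range(n_windows):
            tri = s[i:i + 3]
            counts[i][tri] = counts[i].get(tri, 0) + 1
    return [{tri: c / len(seqs) for tri, c in pos.items()} for pos in counts]


def pstnpss(seq: str, positives: Sequence[str], negatives: Sequence[str]) -> List[float]:
    """PSTNPss-style features: F+(tri at i) - F-(tri at i) for every window start i."""
    pos_f = trinucleotide_freqs(positives)
    neg_f = trinucleotide_freqs(negatives)
    return [
        pos_f[i].get(seq[i:i + 3], 0.0) - neg_f[i].get(seq[i:i + 3], 0.0)
        for i in range(len(seq) - 2)
    ]
```

For example, `encode("AU", NCP)` gives `[1, 1, 1, 0, 0, 1]`. In a real pipeline the PSTNPss frequency tables would be computed from training sequences only, so that test sequences do not leak into their own encodings.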

Keywords: RNA pseudouridine site; bioinformatics; machine learning; sequence analysis; stacking ensemble learning.


Figures

Figure 1
The overall framework of Porpoise. It has four major steps: dataset preparation, feature engineering, model training and optimization, and performance evaluation.
Figure 2
Performance comparison, in terms of AUC, of XGBoost models trained on each of the 18 feature types.
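AUC, the metric compared in Figure 2, can be read as the probability that a randomly chosen positive site receives a higher score than a randomly chosen negative site, with ties counting one half. The sketch below computes it directly from that definition; the function name is hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed (a rank-based formula gives the same value in O(n log n)).

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for t, s in zip(labels, scores) if t == 1]
    neg = [s for t, s in zip(labels, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0      # positive correctly ranked above negative
            elif p == q:
                wins += 0.5      # tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])` is 0.75, because three of the four positive/negative pairs are ranked correctly.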
Figure 3
Performance comparison of different combinations of base classifiers.
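Figure 3 compares combinations of base classifiers inside the stacked model. The following minimal sketch shows the generic stacking recipe under stated assumptions: each base learner is fit on k-1 folds and scores the held-out fold, the resulting out-of-fold scores become the input features of a meta-learner, and the base learners are then refit on all the data for prediction. The two toy learners (a nearest-centroid scorer and a gradient-descent logistic regression, also reused as the meta-learner) and all function names are hypothetical stand-ins, not the classifiers Porpoise uses.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]
Predictor = Callable[[Vector], float]                     # returns a score for class 1
Learner = Callable[[List[Vector], List[int]], Predictor]  # fit(X, y) -> predictor


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_regression(X: List[Vector], y: List[int],
                        lr: float = 0.5, epochs: int = 500) -> Predictor:
    """Toy learner: logistic regression fit by full-batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return lambda x: sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def nearest_centroid(X: List[Vector], y: List[int]) -> Predictor:
    """Toy learner: score = d0 / (d0 + d1) from distances to the class centroids."""
    def centroid(label: int) -> Vector:
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)

    def predict(x: Vector) -> float:
        d0, d1 = math.dist(x, c0), math.dist(x, c1)
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5
    return predict


def stack(X: List[Vector], y: List[int], base_learners: Sequence[Learner],
          meta_learner: Learner, k: int = 5, seed: int = 0) -> Predictor:
    """Stacked ensemble: out-of-fold base scores become the meta-learner's inputs."""
    n = len(X)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    order.sort(key=lambda i: y[i])           # stable sort keeps the shuffle within each class
    folds = [order[f::k] for f in range(k)]   # round-robin gives roughly stratified folds
    meta_X = [[0.0] * len(base_learners) for _ in range(n)]
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        for j, learner in enumerate(base_learners):
            predict = learner([X[i] for i in train], [y[i] for i in train])
            for i in fold:
                meta_X[i][j] = predict(X[i])  # score only samples unseen during fitting
    meta_predict = meta_learner(meta_X, list(y))
    full_models = [learner(X, list(y)) for learner in base_learners]
    return lambda x: meta_predict([m(x) for m in full_models])
```

Using out-of-fold scores keeps the meta-learner from being trained on base-model outputs that have already seen the same samples. Comparing combinations, as in Figure 3, amounts to changing the `base_learners` list.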
Figure 4
Boxplots showing that Strategy 1 significantly outperformed Strategy 2 in terms of MCC score on both (A) benchmark training and (B) independent test datasets.
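MCC (Matthews correlation coefficient), the score shown in the Figure 4 boxplots, is computed from the confusion matrix as MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below is a minimal implementation of that standard formula; the function names are hypothetical, and returning 0.0 when a confusion-matrix margin is empty is a common convention assumed here.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(labels: Sequence[int], preds: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels and predictions coded as 0/1."""
    tp = sum(1 for t, p in zip(labels, preds) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(labels, preds) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(labels, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(labels, preds) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Matthews correlation coefficient; 0.0 when any confusion-matrix margin is empty."""
    tp, tn, fp, fn = confusion_counts(labels, preds)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

MCC ranges from -1 (total disagreement) through 0 (chance level) to 1 (perfect prediction), which makes it informative even when positive and negative sites are imbalanced.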
Figure 5
Top 20 features of Porpoise ranked by the SHAP algorithm for predicting species-specific RNA pseudouridine sites of (A) H. sapiens, (B) S. cerevisiae and (C) M. musculus.
