Comput Struct Biotechnol J. 2024 May 3:23:2034-2048.
doi: 10.1016/j.csbj.2024.04.052. eCollection 2024 Dec.

GP-HTNLoc: A graph prototype head-tail network-based model for multi-label subcellular localization prediction of ncRNAs

Shuangkai Han et al. Comput Struct Biotechnol J.

Abstract

Numerous studies have demonstrated that understanding the subcellular localization of non-coding RNAs (ncRNAs) is pivotal to elucidating their roles and regulatory mechanisms in cells. Although more than ten computational models have been developed to predict the subcellular localization of ncRNAs, most are designed solely for single-label prediction. In reality, ncRNAs are often localized to multiple subcellular compartments. Furthermore, existing multi-label localization prediction models do not adequately address the scarcity of training samples and the class imbalance in ncRNA datasets. To address these limitations, this study proposes a novel multi-label localization prediction model for ncRNAs, named GP-HTNLoc. To mitigate class imbalance, GP-HTNLoc trains head and tail location labels separately. Additionally, GP-HTNLoc introduces a pioneering graph prototype module to enhance its performance in small-sample, multi-label scenarios. Experimental results based on 10-fold cross-validation on benchmark datasets demonstrate that GP-HTNLoc achieves competitive predictive performance. Averaged over 10 rounds of testing on an independent dataset, GP-HTNLoc outperforms the best existing models on the human lncRNA, human snoRNA, and human miRNA subsets, with average precision improvements of 31.5%, 14.2%, and 5.6%, respectively, reaching 0.685, 0.632, and 0.704. A user-friendly online GP-HTNLoc server is accessible at https://56s8y85390.goho.co.

Keywords: Class imbalance; Heterogeneous graph representation learning; Multi-label classification; Non-coding RNA subcellular localization prediction.
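
The abstract states that head and tail location labels are trained separately but does not say how the two groups are defined. As a minimal sketch only, the split can be illustrated by ranking labels by how often they occur in a binary label matrix. The function name `split_head_tail`, the `min_fraction` threshold, and the toy matrix are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def split_head_tail(label_matrix, min_fraction=0.2):
    """Split label indices into head (frequent) and tail (rare) groups.

    label_matrix: binary array of shape (n_samples, n_labels).
    min_fraction: hypothetical cutoff; a label is "head" if at least this
                  fraction of samples carry it. The paper's actual
                  criterion may differ.
    """
    Y = np.asarray(label_matrix)
    freq = Y.sum(axis=0) / Y.shape[0]          # per-label frequency
    head = np.flatnonzero(freq >= min_fraction)
    tail = np.flatnonzero(freq < min_fraction)
    return head, tail, freq

# Toy multi-label data: 10 samples, 4 localization labels (made up).
Y = np.array([
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
])

head, tail, freq = split_head_tail(Y)
print("label frequencies:", freq)
print("head labels:", head.tolist())
print("tail labels:", tail.tolist())
```

In this toy matrix the first two labels occur in 8 and 6 of the 10 samples, while the last two occur once each, so a 0.2 cutoff places the former in the head group and the latter in the tail group.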

Conflict of interest statement

All authors disclosed no relevant relationships.

Figures

Graphical abstract
Fig. 1
The class imbalance phenomenon present in the six subsets of the benchmark dataset: (a) LncRNA dataset; (b) miRNA dataset; (c) snoRNA dataset; (d) Human LncRNA dataset; (e) Human miRNA dataset; (f) Human snoRNA dataset.
Fig. 2
Distribution of subcellular localization of lncRNAs in independent datasets.
Fig. 3
The overall architecture of GP-HTNLoc comprises three main components: (i) imbalanced learning based on graph prototypes, (ii) fine-tuning, and (iii) prediction. In the imbalanced learning phase, this study introduces a graph prototype module that obtains prototype representations of labels from head-label samples; these prototypes are then used to train a transfer learner that transfers rich classification knowledge from the head classes to the sample-scarce tail classes (illustrated in the figure using the lncRNA dataset).
Fig. 4
Workflow diagram of the graph prototype module. First, a heterogeneous label-sample association graph is constructed from the label matrix; the sample nodes are then initialized with deep sequence features and the label nodes with draws from the standard normal distribution. Next, label node embeddings are learned on the heterogeneous graph using HGCN and MetaPath2Vec, respectively. Finally, the two types of embeddings are fused to obtain the final label embeddings, which serve as label prototypes.
Fig. 5
Average AP of GP-HTNLoc over 10 repetitions of 10-fold cross-validation under different parameter settings: (A) different numbers of fine-tuning epochs; (B) different numbers of HGCN training epochs.
Fig. 6
Performance of GP-HTNLoc on three subsets of the benchmark dataset under different graph prototype dimensions.
Fig. 7
Box plots of the average precision and accuracy in the ablation experiments for each component of GP-HTNLoc, obtained from 10 repetitions of 10-fold cross-validation on three subsets of the benchmark dataset (lncRNAs, snoRNAs, miRNAs). Differences between groups are shown as p-values. GP-HTN denotes the final model GP-HTNLoc in this paper, AP-HTN denotes the AP-HTNLoc model, and BiL-C denotes the model composed of BiLSTM and classifiers.
Fig. 8
Model interpretation based on Shapley values and motif analysis on the snoRNA subset of the benchmark dataset. (A) Top 15 features for the nucleus, (B) top 15 features for the nucleolus, (C) top 15 features among all five subcellular localizations. (D) Three motifs obtained from the MEME suite.
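
The first steps of the graph prototype workflow described for Fig. 4 (building a label-sample graph from the label matrix, then initializing sample nodes with features and label nodes with standard normal draws) can be sketched in NumPy. This is an illustration under stated assumptions, not the authors' implementation: all function names and the toy data are hypothetical, and the single mean-aggregation step is only a simple stand-in for the HGCN and MetaPath2Vec embeddings and their fusion, which are not reproduced here.

```python
import numpy as np

def build_bipartite_adjacency(Y):
    """Adjacency of a sample-label graph: sample i links to label j if Y[i, j] == 1.

    Nodes 0..n-1 are samples, nodes n..n+m-1 are labels.
    """
    Y = np.asarray(Y, dtype=float)
    n, m = Y.shape
    A = np.zeros((n + m, n + m))
    A[:n, n:] = Y        # sample -> label edges
    A[n:, :n] = Y.T      # label -> sample edges (undirected graph)
    return A

def init_node_features(sample_features, n_labels, seed=0):
    """Stack sample features with standard-normal label node features."""
    rng = np.random.default_rng(seed)
    X = np.asarray(sample_features, dtype=float)
    label_init = rng.standard_normal((n_labels, X.shape[1]))
    return np.vstack([X, label_init])

def mean_aggregate(A, H):
    """One round of neighbour averaging with self-loops.

    Stand-in for a learned graph encoder; it has no trainable weights.
    """
    A_hat = A + np.eye(A.shape[0])
    deg = A_hat.sum(axis=1, keepdims=True)
    return (A_hat @ H) / deg

# Toy data: 4 samples, 2 labels, 3-dimensional "sequence features" (made up).
Y = np.array([[1, 0], [1, 1], [0, 1], [1, 0]])
X = np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6],
              [0.7, 0.8, 0.9],
              [1.0, 1.1, 1.2]])

A = build_bipartite_adjacency(Y)
H0 = init_node_features(X, n_labels=Y.shape[1])
H1 = mean_aggregate(A, H0)
prototypes = H1[Y.shape[0]:]   # rows belonging to the label nodes
print(prototypes.shape)
```

After one aggregation step, each label row is the average of its own random initialization and the features of the samples that carry that label, which conveys the intuition of a label prototype without any training.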