
Annotating the protein-RNA interaction sites in proteins using evolutionary information and protein backbone structure. (English) Zbl 1337.92160

Summary: RNA-protein interactions play important roles in various biological processes. Precise detection of RNA-protein interaction sites is essential for understanding these processes and for annotating protein function. In this study, a computational method for predicting RNA-binding sites in proteins is proposed. It is based on features derived from amino acid sequence and structure, including evolutionary information, solvent accessible surface area and the torsion angles \((\varphi,\psi)\) of the polypeptide backbone. The method was applied to three datasets: RBP86 (86 protein chains), RBP107 (107 protein chains) and RBP109 (109 protein chains). On all three, it achieves better sensitivity and specificity in five-fold cross-validation tests than previously published methods. To further examine the efficiency of our method, RBP107 is used as the training set and RBP86 and RBP109 as independent test sets. In addition, predicted RNA-binding sites in a few example proteins are presented, and these annotations are consistent with the PDB annotation. These results show that our method is useful for annotating RNA-binding sites of novel proteins.
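The summary does not specify how the features are encoded or which classifier is used. The following is a minimal Python sketch, built on our own assumptions, of how the listed per-residue features might be combined in a sliding window. It also shows how sensitivity, specificity and a chain-level five-fold split could be computed. The listed features are a PSSM row as evolutionary information, relative solvent accessibility, and the backbone torsion angles. The following are all hypothetical choices for illustration, not the authors' published method:

- the window size;
- zero-padding at chain ends;
- the sin/cos encoding of angles given in degrees;
- the round-robin fold assignment;
- the function names `residue_features`, `sensitivity_specificity` and `chain_folds`.

Undefined terminal angles are assumed to have been filled in beforehand, and the classifier itself is not shown.

```python
import math
from typing import List, Sequence, Tuple


def residue_features(
    pssm: Sequence[Sequence[float]],
    rsa: Sequence[float],
    phi: Sequence[float],
    psi: Sequence[float],
    i: int,
    half_window: int = 3,
) -> List[float]:
    """Hypothetical encoding of residue i: for each position in a window of
    2*half_window + 1 residues centred on i, concatenate the PSSM row, the
    relative solvent accessibility and sin/cos of phi and psi (degrees).
    Window positions outside the chain are zero-padded."""
    n = len(pssm)
    per_residue = len(pssm[0]) + 1 + 4  # PSSM columns + RSA + 4 angle terms
    feats: List[float] = []
    for j in range(i - half_window, i + half_window + 1):
        if 0 <= j < n:
            phi_r, psi_r = math.radians(phi[j]), math.radians(psi[j])
            feats.extend(float(x) for x in pssm[j])
            feats.append(float(rsa[j]))
            # sin/cos avoids the artificial jump between -180 and +180 degrees
            feats.extend([math.sin(phi_r), math.cos(phi_r),
                          math.sin(psi_r), math.cos(psi_r)])
        else:
            feats.extend([0.0] * per_residue)
    return feats


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP); label 1 = RNA-binding."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1:
            if p == 1:
                tp += 1
            else:
                fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


def chain_folds(chain_ids: Sequence[str],
                k: int = 5) -> List[Tuple[List[str], List[str]]]:
    """Assign whole chains (not individual residues) to k folds round-robin
    over the sorted IDs; return a (train, test) pair of chain lists per fold."""
    ids = sorted(set(chain_ids))
    folds = [ids[f::k] for f in range(k)]
    splits = []
    for test in folds:
        held_out = set(test)
        splits.append(([c for c in ids if c not in held_out], test))
    return splits


if __name__ == "__main__":
    # Toy chain of 4 residues with a flat 20-column PSSM (illustrative values only)
    toy_pssm = [[0.0] * 20 for _ in range(4)]
    x = residue_features(toy_pssm, [0.3] * 4, [-60.0] * 4, [-45.0] * 4, i=1)
    print(len(x))
    print(sensitivity_specificity([1, 0, 1, 0], [1, 0, 0, 0]))
    for train, test in chain_folds([f"chain{n}" for n in range(10)]):
        print(len(train), test)
```

Splitting by chain rather than by residue keeps all residues of a protein on the same side of each split, which avoids leakage between neighbouring window vectors. Whether the original study split its five folds this way is not stated in the summary.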

MSC:

92D20 Protein sequences, DNA sequences
92D15 Problems related to evolution
