[1] Asgari, E. and Mofrad, M.R.K. (2015). ProtVec: a continuous distributed representation of biological sequences for proteomics and genomics. PLoS One 10: e0141287. doi:10.1371/journal.pone.0141287.
[2] Ashburner, M., Ball, C., Blake, J., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., et al. (2000). Gene ontology: tool for the unification of biology. The gene ontology consortium. Nat. Genet. 25: 25-29. doi:10.1038/75556.
[3] Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
[4] Barrell, D., Dimmer, E., Huntley, R.P., Binns, D., O’Donovan, C., and Apweiler, R. (2009). The GOA database in 2009 - an integrated gene ontology annotation resource. Nucleic Acids Res. 37: D396-D403. doi:10.1093/nar/gkn803.
[5] Cai, Y., Wang, J., and Deng, L. (2020). SDN2GO: an integrated deep learning model for protein function prediction. Front. Bioeng. Biotechnol. 8: 391. doi:10.3389/fbioe.2020.00391.
[6] Cao, R., Freitas, C., Chan, L., Sun, M., Jiang, H., and Chen, Z. (2017). ProLanGO: protein function prediction using neural machine translation based on a recurrent neural network. Molecules 22: 1732. doi:10.3390/molecules22101732.
[7] Chen, H., Sun, M., Tu, C., Lin, Y., and Liu, Z. (2016). Neural sentiment classification with user and product attention. In: Proceedings of the 2016 conference on empirical methods in natural language processing, pp. 1650-1659.
[8] Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the 2014 conference on empirical methods in natural language processing. Association for Computational Linguistics.
[9] Choi, K., Lee, Y., Kim, C., and Yoon, M. (2021). An effective GCN-based hierarchical multilabel classification for protein function prediction. arXiv preprint arXiv:2112.02810.
[10] Clark, W.T. and Radivojac, P. (2011a). Analysis of protein function and its prediction from amino acid sequence. Proteins Struct. Funct. Bioinf. 79: 2086-2096. doi:10.1002/prot.23029.
[11] Clark, W.T. and Radivojac, P. (2011b). Analysis of protein function and its prediction from amino acid sequence. Proteins Struct. Funct. Bioinf. 79: 2086-2096. doi:10.1002/prot.23029.
[12] UniProt Consortium (2015). UniProt: a hub for protein information. Nucleic Acids Res. 43: D204-D212. doi:10.1093/nar/gku989.
[13] Dutta, P. and Saha, S. (2017). Fusion of expression values and protein interaction information using multi-objective optimization for improving gene clustering. Comput. Biol. Med. 89: 31-43. doi:10.1016/j.compbiomed.2017.07.015.
[14] Dutta, P. and Saha, S. (2020). Amalgamation of protein sequence, structure and textual information for improving protein-protein interaction identification. In: Proceedings of the 58th annual meeting of the association for computational linguistics, pp. 6396-6407.
[15] Elnaggar, A., Heinzinger, M., Dallago, C., Rehawi, G., Wang, Y., Jones, L., Gibbs, T., Feher, T., Angerer, C., Steinegger, M., et al. (2021). ProtTrans: towards cracking the language of life’s code through self-supervised deep learning and high performance computing. IEEE Trans. Pattern Anal. Mach. Intell. 14: 1.
[16] Elsayed, N., Maida, A.S., and Bayoumi, M. (2019). Deep gated recurrent and convolutional network hybrid model for univariate time series classification. Int. J. Adv. Comput. Sci. Appl. 10: 654-664. doi:10.14569/ijacsa.2019.0100582.
[17] Forslund, K. and Sonnhammer, E.L. (2008). Predicting protein function from domain content. Bioinformatics 24: 1681-1687. doi:10.1093/bioinformatics/btn447.
[18] Giri, S.J., Dutta, P., Halan, P., and Saha, S. (2020). MultiPredGO: deep multi-modal protein function prediction by amalgamating protein structure, sequence, and interaction information. IEEE J. Biomed. Health Inform. 25: 1832-1838.
[19] Heinzinger, M., Elnaggar, A., Wang, Y., Dallago, C., Nechaev, D., Matthes, F., and Rost, B. (2019). Modeling aspects of the language of life through transfer-learning protein sequences. BMC Bioinf. 20: 723. doi:10.1186/s12859-019-3220-8.
[20] Hunter, S., Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Binns, D., Bork, P., Das, U., Daugherty, L., Duquenne, L., et al. (2009). InterPro: the integrative protein signature database. Nucleic Acids Res. 37: D211-D215. doi:10.1093/nar/gkn785.
[21] Jiang, Y., Oron, T.R., Clark, W.T., Bankapur, A.R., D’Andrea, D., Lepore, R., Funk, C.S., Kahanda, I., Verspoor, K.M., Ben-Hur, A., et al. (2016). An expanded evaluation of protein function prediction methods shows an improvement in accuracy. Genome Biol. 17: 184. doi:10.1186/s13059-016-1037-6.
[22] Jinbao, T., Weiwei, K., Qiaoxin, T., and Zhaoqian, W. (2021). Text classification method based on LSTM-attention and CNN hybrid model. Comput. Eng. Appl. 57: 154-162.
[23] Kabir, A. and Shehu, A. (2022). Transformer neural networks attending to both sequence and structure for protein prediction tasks. arXiv preprint arXiv:2206.11057.
[24] Kabir, A. and Shehu, A. (2022). GOProFormer: a multi-modal transformer method for gene ontology protein function prediction.
[25] Kingma, D.P. and Ba, J. (2014). Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[26] Kuang, S., Li, J., Branco, A., Luo, W.-H., and Xiong, D. (2018). Attention focusing for neural machine translation by bridging source and target embeddings. In: Proceedings of the 56th annual meeting of the association for computational linguistics, Vol. 1, Long Papers, pp. 1767-1776.
[27] Kulmanov, M. and Hoehndorf, R. (2020). DeepGOPlus: improved protein function prediction from sequence. Bioinformatics 36: 422-429. doi:10.1093/bioinformatics/btz595.
[28] Kulmanov, M., Khan, M.A., and Hoehndorf, R. (2018). DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier. Bioinformatics 34: 660-668. doi:10.1093/bioinformatics/btx624.
[29] Le, N.Q.K., Yapp, E.K.Y., and Yeh, H.Y. (2019). ET-GRU: using multi-layer gated recurrent units to identify electron transport proteins. BMC Bioinf. 20: 377. doi:10.1186/s12859-019-2972-5.
[30] LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., and Jackel, L.D. (1989). Backpropagation applied to handwritten zip code recognition. Neural Comput. 1: 541-551. doi:10.1162/neco.1989.1.4.541.
[31] Li, Y., Wang, X., and Xu, P. (2018). Chinese text classification model based on deep learning. Future Internet 10: 113. doi:10.3390/fi10110113.
[32] Li, J., Wang, L., Zhang, X., Liu, B., and Wang, Y. (2020). GONET: a deep network to annotate proteins via recurrent convolution networks. In: 2020 IEEE international conference on bioinformatics and biomedicine (BIBM). IEEE, pp. 29-34.
[33] Lin, Z., Akin, H., Rao, R., Hie, B., Zhu, Z., Lu, W., dos Santos Costa, A., Fazel-Zarandi, M., Sercu, T., Candido, S., et al. (2022). Language models of protein sequences at the scale of evolution enable accurate structure prediction. bioRxiv. doi:10.1101/2022.07.20.500902.
[34] Marquet, C., Heinzinger, M., Olenyi, T., Dallago, C., Erckert, K., Bernhofer, M., Nechaev, D., and Rost, B. (2022). Embeddings from protein language models predict conservation and variant effects. Hum. Genet. 141: 1629-1647.
[35] Nambiar, A., Heflin, M., Liu, S., Maslov, S., Hopkins, M., and Ritz, A. (2020). Transforming the language of life: transformer neural networks for protein prediction tasks. In: Proceedings of the international conference on bioinformatics, computational biology, and health informatics (BCB). ACM, pp. 1-8.
[36] Pearson, W.R. (2013). An introduction to sequence similarity (“homology”) searching. Curr. Protoc. Bioinformatics 42: 3-1. doi:10.1002/0471250953.bi0301s42.
[37] Piovesan, D., Giollo, M., Leonardi, E., Ferrari, C., and Tosatto, S.C. (2015). INGA: protein function prediction combining interaction networks, domain assignments and sequence similarity. Nucleic Acids Res. 43: W134-W140. doi:10.1093/nar/gkv523.
[38] Ranjan, A., Fahad, M.S., Fernandez-Baca, D., Deepak, A., and Tripathi, S. (2019). Deep robust framework for protein function prediction using variable-length protein sequences. IEEE ACM Trans. Comput. Biol. Bioinf. 17: 1648-1659. doi:10.1109/tcbb.2019.2911609.
[39] Ranjan, A., Fernandez-Baca, D., Tripathi, S., and Deepak, A. (2021a). An ensemble TF-IDF based approach to protein function prediction via sequence segmentation. IEEE ACM Trans. Comput. Biol. Bioinf. 19: 2685-2696. doi:10.1109/TCBB.2021.3093060.
[40] Ranjan, A., Tiwari, A., and Deepak, A. (2021b). A sub-sequence based approach to protein function prediction via multi-attention based multi-aspect network. IEEE ACM Trans. Comput. Biol. Bioinf. 20: 94-105. doi:10.1109/TCBB.2021.3130923.
[41] Ranjan, A., Fahad, M.S., Fernandez-Baca, D., Tripathi, S., and Deepak, A. (2022). MCWS-transformers: towards an efficient modeling of protein sequences via multi context-window based scaled self-attention. IEEE ACM Trans. Comput. Biol. Bioinf. 20: 1188-1199. doi:10.1109/TCBB.2022.3173789.
[42] Rives, A., Meier, J., Sercu, T., Goyal, S., Lin, Z., Liu, J., Guo, D., Ott, M., Zitnick, C.L., Ma, J., et al. (2021). Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl. Acad. Sci. U. S. A. 118: e2016239118. doi:10.1073/pnas.2016239118.
[43] Roy, A., Yang, J., and Zhang, Y. (2012). COFACTOR: an accurate comparative algorithm for structure-based protein function annotation. Nucleic Acids Res. 40: W471-W477. doi:10.1093/nar/gks372.
[44] Sharan, R., Ulitsky, I., and Shamir, R. (2007). Network-based prediction of protein function. Mol. Syst. Biol. 3: 88-100. doi:10.1038/msb4100129.
[45] Stark, H., Dallago, C., Heinzinger, M., and Rost, B. (2021). Light attention predicts protein location from the language of life. Bioinform. Adv. 1: vbab035. doi:10.1093/bioadv/vbab035.
[46] Strodthoff, N., Wagner, P., Wenzel, M., and Samek, W. (2020). UDSMProt: universal deep sequence models for protein classification. Bioinformatics 36: 2401-2409. doi:10.1093/bioinformatics/btaa003.
[47] Szklarczyk, D., Franceschini, A., Wyder, S., Forslund, K., Heller, D., Huerta-Cepas, J., Simonovic, M., Roth, A., Santos, A., Tsafou, K.P., et al. (2015). STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 43: D447-D452. doi:10.1093/nar/gku1003.
[48] Wang, H., Yan, L., Huang, H., and Ding, C. (2016). From protein sequence to protein function via multi-label linear discriminant analysis. IEEE ACM Trans. Comput. Biol. Bioinf. 14: 503-513. doi:10.1109/tcbb.2016.2591529.
[49] Wang, H., Yan, L., Huang, H., and Ding, C. (2017). From protein sequence to protein function via multi-label linear discriminant analysis. IEEE ACM Trans. Comput. Biol. Bioinf. 14: 503-513. doi:10.1109/tcbb.2016.2591529.
[50] Yang, J., Yan, R., Roy, A., Xu, D., Poisson, J., and Zhang, Y. (2015). The I-TASSER suite: protein structure and function prediction. Nat. Methods 12: 7. doi:10.1038/nmeth.3213.
[51] Yang, Z., Yang, D., Dyer, C., He, X., Smola, A.J., and Hovy, E.H. (2016). Hierarchical attention networks for document classification. In: Proc. HLT-NAACL, pp. 1480-1489.
[52] Yang, L., Wei, P., Zhong, C., Li, X., and Tang, Y.Y. (2020). Protein structure prediction based on BN-GRU method. Int. J. Wavelets Multiresolut. Inf. Process. 18: 2050045. doi:10.1142/s0219691320500459. Zbl 1538.92034.
[53] You, R., Yao, S., Xiong, Y., Huang, X., Sun, F., Mamitsuka, H., and Zhu, S. (2019). NetGO: improving large-scale protein function prediction with massive network information. Nucleic Acids Res. 47: W379-W387. doi:10.1093/nar/gkz388.
[54] Zhang, Y., Yuan, H., Wang, J., and Zhang, X. (2017). Using a CNN-LSTM model for sentiment intensity prediction. In: Proceedings of the 8th workshop on computational approaches to subjectivity, sentiment and social media analysis. Association for Computational Linguistics, pp. 200-204.
[55] Zhang, C., Zheng, W., Freddolino, P.L., and Zhang, Y. (2018). MetaGO: predicting gene ontology of non-homologous proteins through low-resolution protein structure prediction and protein-protein network mapping. J. Mol. Biol. 430: 2256-2265. doi:10.1016/j.jmb.2018.03.004.