Analysing the localisation sites of proteins through neural networks ensembles

Abstract

Scientists working in proteomics are currently seeking integrated, customised and validated research solutions to expedite proteomics analyses and drug discovery. Some drugs, and most of their cellular targets, are proteins, because proteins dictate biological phenotype. In this context, the automated analysis of protein localisation is more complex than the automated analysis of DNA sequences; nevertheless, the benefits to be derived are of the same or greater importance. Choosing the right kind of method for these applications is therefore crucial, especially when the data set is drastically imbalanced. In this paper we investigate the performance of some commonly used classifiers, such as k-nearest neighbours and feed-forward neural networks with and without cross-validation, on a class of imbalanced problems from the bioinformatics domain. Furthermore, we construct ensemble-based schemes using the notion of diversity and empirically test their performance on the same problems. The experimental results favour neural network ensembles, which show good generalisation ability and a significant improvement over the single-classifier methods.
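The abstract's two recurring concerns, combining several classifiers and judging them fairly on imbalanced classes, can be illustrated with a small sketch. It assumes the member networks have already been trained and that only their predicted labels are available. It combines them by plurality vote, measures diversity as the average pairwise disagreement rate (one common diversity measure, not necessarily the one used in the paper), and scores predictions with mean per-class recall so that the majority class does not dominate. The function names and toy labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def plurality_vote(member_preds):
    """Combine member predictions into one label per sample.

    member_preds[m][i] is the label that member m predicts for sample i.
    Counter.most_common breaks ties in favour of the label counted first,
    i.e. the one predicted by the earliest member in the list.
    """
    n_samples = len(member_preds[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in member_preds)
        combined.append(votes.most_common(1)[0][0])
    return combined


def pairwise_disagreement(member_preds):
    """Mean fraction of samples on which two members disagree.

    0.0 means all members are identical (no diversity); higher is more diverse.
    """
    n_samples = len(member_preds[0])
    pairs = list(combinations(member_preds, 2))
    total = 0.0
    for a, b in pairs:
        total += sum(x != y for x, y in zip(a, b)) / n_samples
    return total / len(pairs)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so a small class counts as much as a large one."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical toy data: class "A" dominates, "B" and "C" are rare.
y_true = ["A"] * 6 + ["B", "B", "C", "C"]
members = [
    ["A", "A", "A", "A", "A", "A", "A", "B", "C", "A"],
    ["A", "A", "A", "A", "A", "B", "B", "B", "A", "C"],
    ["A", "A", "A", "A", "A", "A", "B", "A", "C", "C"],
]

ensemble = plurality_vote(members)
print(round(balanced_accuracy(y_true, members[0]), 3))  # 0.667
print(round(balanced_accuracy(y_true, ensemble), 3))    # 1.0
print(round(pairwise_disagreement(members), 3))         # 0.333
```

With these toy labels the first member alone reaches 80% plain accuracy but only 0.667 balanced accuracy, because it misses half of each rare class. The vote of three members that err on different samples recovers every label, which is the intuition behind favouring diverse ensembles.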

Acknowledgements

We would like to thank Dr Maria Roubelakis of Oxford University for assistance in biological aspects of this work.

Author information

Authors and Affiliations

School of Computer Science and Information Systems, Birkbeck College, University of London, Malet Street, London, WC1E 7HX, UK
Aristoklis D. Anastasiadis & George D. Magoulas

Corresponding author

Correspondence to Aristoklis D. Anastasiadis.

About this article

Cite this article

Anastasiadis, A.D., Magoulas, G.D. Analysing the localisation sites of proteins through neural networks ensembles. Neural Comput & Applic 15, 277–288 (2006). https://doi.org/10.1007/s00521-006-0029-y

Received: 21 July 2004
Accepted: 10 January 2006
Published: 15 February 2006
Issue Date: June 2006
DOI: https://doi.org/10.1007/s00521-006-0029-y