Protein classification by stochastic modeling and optimal filtering of amino-acid sequences. (English) Zbl 0791.92012

Summary: The prediction of a protein’s tertiary structural class from its amino-acid sequence is formulated as a signal-processing problem. The amino-acid sequence is treated as a “time series” of symbols containing signals that determine the protein’s structural class. A methodology is described for building detailed stochastic signal models for recognized structural classes of single-domain proteins. We solve the problem of determining which model, from a set of candidates, is the most probable generator of a protein’s entire amino-acid sequence.
The solution employs a nonlinear, optimal filtering algorithm, which is suited for implementation on parallel computer architectures. Previous approaches have been able to classify correctly only 80% of single-domain proteins into three very broad structural types, while our approach achieves this level across twelve much more detailed classes.
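
The model-selection step described in the summary can be illustrated with a small sketch. It is not the authors' algorithm. It assumes that each candidate class model is a discrete hidden Markov model, which the summary does not state, and it uses a scaled forward recursion as a stand-in for the nonlinear optimal filter. The two-letter alphabet (`H` for hydrophobic, `P` for polar), the model names `alternating` and `blocky`, and all probabilities are hypothetical values chosen only for illustration.

```python
import math

def log_likelihood(seq, init, trans, emit):
    """Scaled forward recursion: returns log P(seq | model)."""
    n = len(init)
    pred = list(init)  # predicted state distribution before the first symbol
    loglik = 0.0
    for sym in seq:
        # Measurement update: weight each state by its emission probability.
        joint = [pred[i] * emit[i].get(sym, 0.0) for i in range(n)]
        norm = sum(joint)  # P(symbol | all earlier symbols, model)
        if norm == 0.0:
            return float("-inf")
        loglik += math.log(norm)
        post = [j / norm for j in joint]
        # Time update: propagate the filtered distribution through the chain.
        pred = [sum(post[i] * trans[i][j] for i in range(n)) for j in range(n)]
    return loglik

def classify(seq, models, priors=None):
    """Return the most probable generating model and all posterior probabilities."""
    names = list(models)
    if priors is None:
        # Assumption: equal prior probability for every candidate model.
        priors = {name: 1.0 / len(names) for name in names}
    scores = {
        name: math.log(priors[name]) + log_likelihood(seq, *models[name])
        for name in names
    }
    top = max(scores.values())
    if top == float("-inf"):
        raise ValueError("no candidate model can generate this sequence")
    # Subtract the largest score before exponentiating to avoid underflow.
    weights = {name: math.exp(score - top) for name, score in scores.items()}
    total = sum(weights.values())
    posterior = {name: w / total for name, w in weights.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior

# Hypothetical two-state toy models sharing the same emission tables.
EMIT = [{"H": 0.9, "P": 0.1}, {"H": 0.1, "P": 0.9}]
MODELS = {
    "alternating": ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], EMIT),
    "blocky": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], EMIT),
}

if __name__ == "__main__":
    for sequence in ("HPHPHPHP", "HHHHPPPP"):
        best, posterior = classify(sequence, MODELS)
        print(sequence, best, posterior)
```

Each model is scored on the entire sequence, and the per-model recursions are independent of one another, so they could be run concurrently. That is one plausible reading of the summary's remark about parallel architectures, not a claim about the published implementation.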

MSC:

92C40 Biochemistry, molecular biology
60G35 Signal detection and filtering (aspects of stochastic processes)
62M20 Inference from stochastic processes and prediction
92-08 Computational methods for problems pertaining to biology
