×

Component-based discriminative classification for hidden Markov models. (English) Zbl 1175.68311

Summary: Hidden Markov Models (HMMs) have been successfully applied to a wide range of sequence modeling problems. In the classification context, one of the simplest approaches is to train a single HMM per class. A test sequence is then assigned to the class whose HMM yields the maximum a posteriori probability. This generative scenario works well when the models are correctly estimated. However, the results can become poor when improper models are employed, due to the lack of prior knowledge, poor estimates, violated assumptions or insufficient training data.
To improve the results in these cases we propose to combine the descriptive strengths of HMMs with discriminative classifiers. This is achieved by training feature-based classifiers in an HMM-induced vector space defined by specific components of individual hidden Markov models.
We introduce four major ways of building such vector spaces and study which trained combiners are useful in which context. Moreover, we motivate and discuss the merit of our method in comparison to dynamic kernels, in particular, to the Fisher Kernel approach.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68T10 Pattern recognition, speech recognition

Software:

PRTools
Full Text: DOI

References:

[1] Rabiner, L., A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, 77, 2, 257-286 (1989)
[2] Jelinek, F., Statistical Methods for Speech Recognition (1998), MIT Press: MIT Press Cambridge, MA
[3] E.V.F. Casacuberta, J.M. Vilar, Architectures for speech-to-speech translation using finite-state models, in: Workshop on Speech-to-Speech Translation: Algorithms and Systems, ACL, Philadelphia, 2002, pp. 39-44.; E.V.F. Casacuberta, J.M. Vilar, Architectures for speech-to-speech translation using finite-state models, in: Workshop on Speech-to-Speech Translation: Algorithms and Systems, ACL, Philadelphia, 2002, pp. 39-44.
[4] Molina, A.; Pla, F., Shallow parsing using specialized HMMs, Journal on Machine Learning Research, 2, 595-613 (2002) · Zbl 1033.68126
[5] H. Bunke, T. Caelli, Hidden Markov Models Applications in Computer Vision, Series in Machine Perception and Artificial Intelligence, vol. 45, World Scientific, Singapore, 2001.; H. Bunke, T. Caelli, Hidden Markov Models Applications in Computer Vision, Series in Machine Perception and Artificial Intelligence, vol. 45, World Scientific, Singapore, 2001.
[6] He, Y.; Kundu, A., 2-D shape classification using hidden Markov model, IEEE Transactions on Pattern Analysis and Machine Intelligence, 13, 11, 1172-1184 (1991)
[7] Bicego, M.; Murino, V., Investigating hidden Markov models’ capabilities in 2D shape classification, IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 2, 281-286 (2004)
[8] M. Bicego, U. Castellani, V. Murino, Using hidden Markov models and wavelets for face recognition, in: International Conference on Image Analysis and Processing, 2003, pp. 52-56.; M. Bicego, U. Castellani, V. Murino, Using hidden Markov models and wavelets for face recognition, in: International Conference on Image Analysis and Processing, 2003, pp. 52-56.
[9] S. Eickeler, A. Kosmala, G. Rigoll, Hidden Markov model based continuous online gesture recognition, in: International Conference on Pattern Recognition, vol. 2, 1998, pp. 1206-1208.; S. Eickeler, A. Kosmala, G. Rigoll, Hidden Markov model based continuous online gesture recognition, in: International Conference on Pattern Recognition, vol. 2, 1998, pp. 1206-1208.
[10] R. Nowak, Multiscale hidden Markov models for Bayesian image analysis, in: B. Vidakovic, P. Muller (Eds.), Bayesian Inference in Wavelet Based Models, Lecture Notes in Statistics, vol. 141, Springer, Berlin, 1999.; R. Nowak, Multiscale hidden Markov models for Bayesian image analysis, in: B. Vidakovic, P. Muller (Eds.), Bayesian Inference in Wavelet Based Models, Lecture Notes in Statistics, vol. 141, Springer, Berlin, 1999. · Zbl 0940.62091
[11] Krogh, A.; Brown, M.; Mian, I.; Sjolander, K.; Haussler, D., Hidden Markov models in computational biology: applications to protein modeling, Journal of Molecular Biology, 235, 1501-1531 (1994)
[12] Durbin, R.; Eddy, S.; Krogh, A.; Mitchison, G., Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (1998), Cambridge University Press: Cambridge University Press Cambridge · Zbl 0929.92010
[13] Gough, J.; Karplus, K.; Hughey, R.; Chothia, C., Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure, Journal of Molecular Biology, 313, 903-919 (2001)
[14] Baum, L.; Egon, J., An inequality with applications to statistical estimation for probabilistic functions of a Markov process and to a model for ecology, Bulletin of the American Meteorology Society, 73, 360-363 (1967) · Zbl 0157.11101
[15] Baum, L., An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes, Inequality, 3, 1-8 (1970)
[16] Duda, R.; Hart, P.; Stork, D., Pattern Classification (2001), Wiley: Wiley New York · Zbl 0968.68140
[17] Fine, S.; Singer, Y.; Tishby, N., The hierarchical hidden Markov model: analysis and applications, Machine Learning, 32, 41-62 (1998) · Zbl 0901.68178
[18] Bengio, Y.; Lauzon, V.-P.; Ducharme, R., Experiments on the application of IOHMMs to model financial returns series, IEEE Transactions on Neural Networks, 12, 1, 113-123 (2001)
[19] Ghahramani, Z.; Jordan, M., Factorial hidden Markov models, Machine Learning, 29, 245-273 (1997) · Zbl 0892.68080
[20] M. Brand, N. Oliver, S. Pentland, Coupled hidden Markov models for complex action recognition, in: IEEE International Conference on Computer Vision and Pattern Recognition, 1997.; M. Brand, N. Oliver, S. Pentland, Coupled hidden Markov models for complex action recognition, in: IEEE International Conference on Computer Vision and Pattern Recognition, 1997.
[21] L. Bahl, P. Brown, P. de Souza, R. Mercer, Maximum mutual information estimation of hidden Markov model parameters for speech recognition, in: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 86), vol. 11, 1986, pp. 49-52.; L. Bahl, P. Brown, P. de Souza, R. Mercer, Maximum mutual information estimation of hidden Markov model parameters for speech recognition, in: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 86), vol. 11, 1986, pp. 49-52.
[22] Z. Kaiser, B. Horvat, Z. Kacic, A novel loss function for the overall risk criterion based discriminative training of HMM models, in: International Conference on Spoken Language Processing, vol. 2, Beijing, China, 2000, pp. 887-890.; Z. Kaiser, B. Horvat, Z. Kacic, A novel loss function for the overall risk criterion based discriminative training of HMM models, in: International Conference on Spoken Language Processing, vol. 2, Beijing, China, 2000, pp. 887-890.
[23] K. Na, B. Jeon, D. Chang, S. Chae, S. Ann, Discriminative training of hidden Markov models using overall risk criterion and reduced gradient method, in: European Conference on Speech Communication and Technology, Madrid, Spain, 1995, pp. 97-100.; K. Na, B. Jeon, D. Chang, S. Chae, S. Ann, Discriminative training of hidden Markov models using overall risk criterion and reduced gradient method, in: European Conference on Speech Communication and Technology, Madrid, Spain, 1995, pp. 97-100.
[24] Kaiser, Z.; Horvat, B.; Kacic, Z., Overall risk criterion estimation of hidden Markov model parameters, Speech Communication, 38, 3-4, 383-398 (2002) · Zbl 1005.68819
[25] Woodland, P. C.; Povey, D., Large scale discriminative training of hidden Markov models for speech recognition, Computer Speech and Language, 16, 25-47 (2002)
[26] M. Gales, Discriminative models for speech recognition, in: Information Theory and Applications Workshop, 2007.; M. Gales, Discriminative models for speech recognition, in: Information Theory and Applications Workshop, 2007.
[27] J. Lafferty, A. McCallum, F. Pereira, Conditional random fields: probabilistic models for segmenting and labelling sequence data, in: International Conference on Machine Learning, 2001, pp. 591-598.; J. Lafferty, A. McCallum, F. Pereira, Conditional random fields: probabilistic models for segmenting and labelling sequence data, in: International Conference on Machine Learning, 2001, pp. 591-598.
[28] A. Gunawardana, M. Mahajan, A. Acero, J. Platt, Hidden conditional random fields for phone classification, in: Interspeech, Lisbon, Portugal, 2005, pp. 1117-1120.; A. Gunawardana, M. Mahajan, A. Acero, J. Platt, Hidden conditional random fields for phone classification, in: Interspeech, Lisbon, Portugal, 2005, pp. 1117-1120.
[29] Y. Altun, I. Tsochantaridis, T. Hofmann, Hidden Markov support vector machines, in: International Conference on Machine Learning, 2003, pp. 3-10.; Y. Altun, I. Tsochantaridis, T. Hofmann, Hidden Markov support vector machines, in: International Conference on Machine Learning, 2003, pp. 3-10.
[30] M. Collins, Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithm, in: Conference on Empirical Methods in Natural Language Processing, 2002.; M. Collins, Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithm, in: Conference on Empirical Methods in Natural Language Processing, 2002.
[31] S. Chakrabartty, G. Cauwenberghs, Forward decoding kernel machines: a hybrid HMM/SVM approach to sequence recognition, in: International Conference on Pattern Recognition: SVM Workshop, 2002.; S. Chakrabartty, G. Cauwenberghs, Forward decoding kernel machines: a hybrid HMM/SVM approach to sequence recognition, in: International Conference on Pattern Recognition: SVM Workshop, 2002. · Zbl 1064.68578
[32] Wolpert, D., Stacked generalization, Neural Networks, 5, 2, 241-260 (1992)
[33] Ting, K.; Witten, I. H., Issues in stacked generalization, Journal of Artificial Intelligence Research, 10, 271-289 (1999) · Zbl 0915.68075
[34] R. Duin, The combining classifier: to train or not to train? in: International Conference on Pattern Recognition, vol. II, Canada, 2002, pp. 765-770.; R. Duin, The combining classifier: to train or not to train? in: International Conference on Pattern Recognition, vol. II, Canada, 2002, pp. 765-770.
[35] Bicego, M.; Murino, V.; Figueiredo, M., Similarity-based classification of sequences using hidden Markov models, Pattern Recognition, 37, 12, 2281-2291 (2004)
[36] M. Bicego, E. Pe¸kalska, R. Duin, Group-induced vector spaces, in: M. Haindl, J. Kittler, F. Roli (Eds.), Multiple Classifier Systems, Lecture Notes in Computer Science, vol. 4472, Springer, Berlin, 2007, pp. 190-199.; M. Bicego, E. Pe¸kalska, R. Duin, Group-induced vector spaces, in: M. Haindl, J. Kittler, F. Roli (Eds.), Multiple Classifier Systems, Lecture Notes in Computer Science, vol. 4472, Springer, Berlin, 2007, pp. 190-199.
[37] Lai, C.; Tax, D.; Duin, R.; Pe¸kalska, E.; Paclík, P., A study on combining image representations for image classification and retrieval, International Journal of Pattern Recognition and Artificial Intelligence, 18, 5, 867-890 (2004)
[38] T. Jaakkola, D. Haussler, Exploiting generative models in discriminative classifiers, in: Advances in Neural Information Processing Systems, 1999, pp. 487-493.; T. Jaakkola, D. Haussler, Exploiting generative models in discriminative classifiers, in: Advances in Neural Information Processing Systems, 1999, pp. 487-493.
[39] Lodhi, H.; Saunders, C.; Shawe-Taylor, J.; Cristianini, N.; Watkins, C., Text classification using string kernels, Journal of Machine Learning Research, 2, 419-444 (2002) · Zbl 1013.68176
[40] Cortes, C.; Haffner, P.; Mohri, M., Rational kernels: theory and algorithms, Journal of Machine Learning Research, 5, 1035-1062 (2004) · Zbl 1222.68175
[41] N. Smith, M. Gales, Speech recognition using svms, in: Advances in Neural Information Processing Systems, 2002, pp. 1197-1204.; N. Smith, M. Gales, Speech recognition using svms, in: Advances in Neural Information Processing Systems, 2002, pp. 1197-1204.
[42] L. Chen, H. Man, Combination of Fisher Scores and appearance based features for face recognition, in: ACM SIGMM Multimedia Biometrics Methods and Applications Workshop, 2003, pp. 74-81.; L. Chen, H. Man, Combination of Fisher Scores and appearance based features for face recognition, in: ACM SIGMM Multimedia Biometrics Methods and Applications Workshop, 2003, pp. 74-81.
[43] L. Chen, H. Man, Discriminant analysis of stochastic models and its application to face recognition, in: International Workshop on Analysis and Modeling of Faces and Gestures, 2003, pp. 5-10.; L. Chen, H. Man, Discriminant analysis of stochastic models and its application to face recognition, in: International Workshop on Analysis and Modeling of Faces and Gestures, 2003, pp. 5-10.
[44] M. Layton, M. Gales, Augmented statistical models: exploiting generative models in discriminative classifiers, in: Advances in Neural Information Processing Systems, 2005.; M. Layton, M. Gales, Augmented statistical models: exploiting generative models in discriminative classifiers, in: Advances in Neural Information Processing Systems, 2005.
[45] N. Smith, Using augmented statistical models and score spaces for classification, Ph.D. Thesis, Engineering Department, Cambridge University, 2003.; N. Smith, Using augmented statistical models and score spaces for classification, Ph.D. Thesis, Engineering Department, Cambridge University, 2003.
[46] R. Duin, P. Juszczak, P. Paclík, E. Pe¸kalska, D. De Ridder, D. Tax, Prtools4, a matlab toolbox for pattern recognition, Delft University of Technology, \(2004 \langle;\) http://www.prtools.org \(\rangle;\).; R. Duin, P. Juszczak, P. Paclík, E. Pe¸kalska, D. De Ridder, D. Tax, Prtools4, a matlab toolbox for pattern recognition, Delft University of Technology, \(2004 \langle;\) http://www.prtools.org \(\rangle;\).
[47] Fukanaga, K., Introduction to Statistical Pattern Recognition (1990), Academic Press: Academic Press San Diego · Zbl 0711.62052
[48] Hastie, T.; Tibshirani, R.; Friedman, J., The Elements of Statistical Learning (2001), Springer: Springer Berlin · Zbl 0973.62007
[49] Everitt, B. S., The Cambridge Dictionary of Statistics (2003) · Zbl 0995.62001
[50] Schölkopf, B.; Smola, A., Learning with Kernels (2002), MIT Press: MIT Press Cambridge, MA
[51] Bennett, K., Combining support vector and mathematical programming methods for induction, (Schölkopf, B.; Burges, C.; Smola, A., Advances in Kernel Methods—SV Learning (1999), MIT Press: MIT Press Cambridge, MA), 307-326
[52] Bellman, R., Adaptive Control Processes: A Guided Tour (1961), Princeton University Press: Princeton University Press New York · Zbl 0103.12901
[53] Vapnik, V., Statistical Learning Theory (1998), Wiley: Wiley New York · Zbl 0935.62007
[54] Bhattacharyya, C.; Grate, L.; Rizki, A.; Radisky, D.; Molina, F.; Jordan, M.; Bissel, M.; Mian, I., Simultaneous classification and relevant feature identification in high-dimensional spaces: application to molecular profiling data, Signal Processing, 83, 729-743 (2003) · Zbl 1144.90448
[55] Chen, L.; Man, H.; Nefian, A. V., Face recognition based on multi-class mapping of Fisher Scores, Pattern Recognition, 38, 799-811 (2005)
[56] Tsuda, K.; Kin, T.; Asai, K., Marginalised kernels for biological sequences, Bioinformatics, 18, 268-275 (2002)
[57] Layton, M.; Gales, M., Acoustic modelling using continuous rational kernels, Journal of VLSI Signal Processing, 48, 67-82 (2007)
[58] Amari, S., Natural gradient works efficiently in learning, Neural Computation, 10, 251-276 (1998)
[59] N. Arica, F. Yarman-Vural, A shape descriptor based on circular hidden Markov model, in: International Conference on Pattern Recognition, vol. 1, 2000, pp. 924-927.; N. Arica, F. Yarman-Vural, A shape descriptor based on circular hidden Markov model, in: International Conference on Pattern Recognition, vol. 1, 2000, pp. 924-927.
[60] G. Andreu, A. Crespo, J. Valiente, Selecting the toroidal self-organizing feature maps (TSOFM) best organized to object recognition, in: International Conference on Neural Networks, vol. 2, 1997, pp. 1341-1346.; G. Andreu, A. Crespo, J. Valiente, Selecting the toroidal self-organizing feature maps (TSOFM) best organized to object recognition, in: International Conference on Neural Networks, vol. 2, 1997, pp. 1341-1346.
[61] Mollineda, R.; Vidal, E.; Casacuberta, F., Cyclic sequence alignments: approximate versus optimal techniques, International Journal of Pattern Recognition and Artificial Intelligence, 16, 3, 291-299 (2002)
[62] M. Kadous, Temporal classification: extending the classification paradigm to multivariate time series, Ph.D. Thesis, School of Computer Science and Engineering, University of New South Wales, 2002.; M. Kadous, Temporal classification: extending the classification paradigm to multivariate time series, Ph.D. Thesis, School of Computer Science and Engineering, University of New South Wales, 2002.
[63] M. Kadous, Learning comprehensible descriptions of multivariate time series, in: International Conference on Machine Learning, 1999, pp. 454-463.; M. Kadous, Learning comprehensible descriptions of multivariate time series, in: International Conference on Machine Learning, 1999, pp. 454-463.
[64] Kudo, M.; Toyama, J.; Shimbo, M., Multidimensional curve classification using passing-through regions, Pattern Recognition Letters, 20, 11-13, 1103-1111 (1999)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.