
Classification of acoustic events using SVM-based clustering schemes. (English) Zbl 1122.68507

Summary: Acoustic events produced in controlled environments may carry information useful for perceptually aware interfaces. In this paper we focus on the problem of classifying 16 types of meeting-room acoustic events. First of all, we have defined the events and gathered a sound database. Then, several classifiers based on support vector machines (SVM) are developed using confusion matrix based clustering schemes to deal with the multi-class problem. Also, several sets of acoustic features are defined and used in the classification tests. In the experiments, the developed SVM-based classifiers are compared with an already reported binary tree scheme and with their correlative Gaussian mixture model (GMM) classifiers. The best results are obtained with a tree SVM-based classifier that may use a different feature set at each node. With it, a 31.5% relative average error reduction is obtained with respect to the best result from a conventional binary tree scheme.
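
The summary does not spell out the clustering procedure, so the following is only a minimal sketch of one way a confusion matrix could drive the grouping of classes into a binary tree. It assumes row-normalized confusion rates, average linkage, and greedy agglomerative merging; the function names and the toy 4-class matrix are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def row_normalize(conf):
    """Turn raw confusion counts into per-class rates (each row sums to 1)."""
    return [[count / sum(row) for count in row] for row in conf]


def cluster_similarity(rates, a, b):
    """Average symmetric confusion rate between two groups of class indices.

    Assumption: average linkage; other linkage rules are equally plausible.
    """
    pairs = [(i, j) for i in a for j in b]
    return sum(rates[i][j] + rates[j][i] for i, j in pairs) / (2 * len(pairs))


def merge_order(conf):
    """Greedily merge the two most-confused groups until one group remains.

    Returns the merges in the order they were made. Read in reverse, they
    describe a top-down binary tree whose nodes could each hold a binary
    classifier (for instance an SVM) separating the two merged groups.
    """
    rates = row_normalize(conf)
    clusters = [(i,) for i in range(len(conf))]
    merges = []
    while len(clusters) > 1:
        a, b = max(
            combinations(clusters, 2),
            key=lambda pair: cluster_similarity(rates, pair[0], pair[1]),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b))
    return merges


# Hypothetical confusion counts (rows: true class, columns: predicted class).
toy_confusion = [
    [8, 2, 0, 0],
    [3, 7, 0, 0],
    [0, 0, 9, 1],
    [0, 1, 1, 8],
]

for left, right in merge_order(toy_confusion):
    print(left, "+", right)
```

In this toy case, classes 0 and 1 are merged first because they are confused with each other most often, then classes 2 and 3, so the root of the resulting tree would separate {0, 1} from {2, 3}.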

MSC:

68T05 Learning and adaptive systems in artificial intelligence

References:

[1] Bregman, A., Auditory Scene Analysis (1990), MIT Press, Cambridge
[2] CHIL - Computers in the Human Interaction Loop, http://chil.server.de/
[3] Lu, L.; Zhang, H.-J.; Jiang, H., Content analysis for audio classification and segmentation, IEEE Trans. Speech Audio Process., 10, 7, 504-516 (2002)
[4] D. Hoiem, Y. Ke, R. Sukthankar, SOLAR: sound object localization and retrieval in complex audio environments, in: International Conference on Acoustics, Speech, and Signal Processing, Philadelphia, March 2005.
[5] M. Slaney, Mixtures of probability experts for audio retrieval and indexing, in: IEEE International Conference on Multimedia and Expo, Lausanne, August 2002.
[6] L. Kennedy, D. Ellis, Laughter detection in meetings, in: NIST Meeting Recognition Workshop, International Conference on Acoustics, Speech, and Signal Processing, Montreal, May 2004.
[7] J. Pinquier, J. Arias, R. André-Obrecht, Audio classification by search of primary components, in: International Workshop on Image, Video and Audio Retrieval and Mining, Sherbrooke, October 2004.
[8] T. Nishiura, S. Nakamura, K. Miki, K. Shikano, Environmental sound source identification based on hidden Markov model for robust speech recognition, in: Eurospeech 2003, Geneva, September 2003, pp. 2157-2160.
[9] S. Nakamura, K. Hiyane, F. Asano, T. Nishiura, T. Yamada, Acoustical sound database in real environments for sound scene understanding and hands-free speech recognition, in: Second International Conference on Language Resources & Evaluation, Athens, 2000.
[10] D. Gerhard, Audio signal classification: history and current techniques, Technical Report TR-CS 2003-07, November 2003.
[11] Guo, G.; Li, Z., Content-based audio classification and retrieval using support vector machines, IEEE Trans. Neural Networks, 14, 209-215 (2003)
[12] Lu, L.; Li, S. Z.; Zhang, H., Content-based audio classification and segmentation by using support vector machines, ACM Multimedia Systems J., 8, 6, 482-492 (2003)
[13] Hsu, C. W.; Lin, C. J., A comparison of methods for multi-class support vector machines, IEEE Trans. Neural Networks, 13, 415-425 (2002)
[14] ShATR Multiple Simultaneous Speaker Corpus, http://www.dcs.shef.ac.uk/research/groups/spandh/projects/shatrweb/index.html
[15] Rabiner, L.; Juang, B. H., Fundamentals of Speech Recognition (1993), Prentice-Hall, Englewood Cliffs, NJ
[16] C. Nadeu, J. Hernando, M. Gorricho, On the decorrelation of filter-bank energies in speech recognition, in: European Speech Processing Conference (Eurospeech ’95), Madrid, September 1995, pp. 1381-1384.
[17] Burges, C., A tutorial on support vector machines for pattern recognition, Data Mining Knowledge Discovery, 2, 955-975 (1998)
[18] Schölkopf, B.; Smola, A., Learning with Kernels (2002), MIT Press, Cambridge, MA
[19] Müller, K.; Mika, S.; Rätsch, G.; Tsuda, K.; Schölkopf, B., An introduction to kernel-based learning algorithms, IEEE Trans. Neural Networks, 12, 181-202 (2001)
[20] Bertsekas, D., Nonlinear Programming (1995), Athena Scientific · Zbl 0935.90037
[21] Veropoulos, K.; Campbell, C.; Cristianini, N., Controlling the sensitivity of support vector machines, in: Proceedings of the International Joint Conference on Artificial Intelligence (1999), 55-60
[22] I. Gradshteyn, I. Ryzhik, Tables of Integrals, Series, and Products, fifth ed., Academic Press, New York, 1979, p. 1101. · Zbl 0918.65002
[23] Rifkin, R.; Klautau, A., In defense of one-vs-all classification, J. Mach. Learning Res., 5, 101-141 (2004) · Zbl 1222.68287
[24] Duda, R.; Hart, P.; Stork, D., Pattern Classification (2000), Wiley-Interscience, New York
[25] Voorhees, E. M., Implementing agglomerative hierarchical clustering algorithms for use in document retrieval, Inf. Process. Manage., 22, 465-476 (1986)
[26] J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, V. Vapnik, Feature selection for SVMs, in: Proceedings of NIPS, 2000.
[27] Y. Liu, Y. Yang, J. Carbonell, Boosting to correct inductive bias in text classification, in: International Conference on Information and Knowledge Management (CIKM), McLean, November 2002, pp. 348-355.