Abstract
Automatic recognition of human emotions is a relatively new field that is attracting significant attention in research and development because of the contribution it could make to real-world applications. Several earlier studies reported speech emotion recognition using acted emotional corpora. For real-world applications, however, spontaneous corpora should be used to recognize human emotions from speech. This study focuses on speech emotion recognition using the FAU Aibo spontaneous children’s corpus. Two methods are proposed: one integrates feed-forward deep neural networks (DNN) with the i-vector paradigm, and the other uses deep convolutional neural networks (DCNN) for feature extraction with extremely randomized trees as the classifier. For the classification of five emotions using balanced data, the two methods achieved unweighted average recalls (UAR) of 61.1% and 59.2%, respectively. These results are promising and indicate the effectiveness of the proposed methods in speech emotion recognition. Both deep learning (DL) methods were also compared with a method based on support vector machines (SVM) and demonstrated superior performance.
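The unweighted average recall (UAR) reported above is the mean of the per-class recalls, so every emotion class counts equally regardless of how many utterances it contains. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code; the function name and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labelled sample is required")

    totals = defaultdict(int)  # number of reference samples per class
    hits = defaultdict(int)    # correctly recognised samples per class
    for truth, guess in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == guess:
            hits[truth] += 1

    # Recall is computed only over classes that occur in the reference labels.
    recalls = [hits[label] / totals[label] for label in totals]
    return sum(recalls) / len(recalls)


# Hypothetical, deliberately imbalanced toy data (not taken from FAU Aibo).
y_true = ["anger"] * 2 + ["neutral"] * 8
y_pred = ["anger", "neutral"] + ["neutral"] * 8

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

On this toy data the minority class is recognised only half of the time, so UAR falls below plain accuracy, which is why UAR is commonly preferred when class sizes differ.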
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Heracleous, P., Mohammad, Y., Yasuda, K., Yoneyama, A. (2023). Speech Emotion Recognition Using Spontaneous Children’s Corpus. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13452. Springer, Cham. https://doi.org/10.1007/978-3-031-24340-0_24
DOI: https://doi.org/10.1007/978-3-031-24340-0_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24339-4
Online ISBN: 978-3-031-24340-0
eBook Packages: Computer Science, Computer Science (R0)