Abstract
This paper presents the Irish Political Speech Database, an English-language database collected from Irish political recordings. The database is collected with automated indexing and content retrieval in mind, and is therefore gathered from real-world recordings (such as television interviews and election rallies) that represent the nature and quality of the recordings likely to be encountered in practical applications. The database is labelled for six speaker attributes: boring, charismatic, enthusiastic, inspiring, likeable, and persuasive. Each of these traits is linked to the perceived ability or appeal of the speaker, and as such is relevant to a range of content retrieval and speech analysis tasks. The six base attributes are combined to form a metric of Overall Speaker Appeal. A set of baseline experiments is presented, which demonstrates the potential of this database for affective computing studies. Classification accuracies of up to 76% are achieved with little feature or system optimisation.
Notes
Available at: https://www.ffmpeg.org/.
For example, the following phrase was segmented into four parts: “Young people are seeing their bosses”, “cutting jobs, pulling back,”, “circling the wagons”, “sending out the message, postpone your ambitions”. Such fragments are manually merged to form a single audio clip.
Acknowledgements
This work was supported by the Irish Research Council (IRC) under the Embark initiative, and was partly funded by the ADAPT Centre for Digital Content Technology, which is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
Cite this article
Cullen, A., Harte, N. A longitudinal database of Irish political speech with annotations of speaker ability. Lang Resources & Evaluation 52, 401–432 (2018). https://doi.org/10.1007/s10579-017-9401-z
DOI: https://doi.org/10.1007/s10579-017-9401-z