Automatic Head-Nod Generation Using Utterance Text Considering Personality Traits

Increasing Naturalness and Flexibility in Spoken Dialogue Interaction

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 714))

Abstract

We propose a model for generating head nods from utterance text while taking personality traits into account. We have been investigating the automatic generation of body motion, such as nodding, from utterance text in dialogue agent systems. Human body motion varies greatly with personality, so it is important to generate body motion appropriate to the personality of the dialogue agent. To construct our model, we first compiled a Japanese corpus of 24 dialogues containing utterances, nod information, and the personality traits (Big Five) of the participants. Our nod-generation model estimates the presence, frequency, and depth of nods during each phrase by using various types of linguistic information extracted from the utterance text together with personality traits. We evaluated how well the model generates nods that reflect individual personality traits. The results indicate that the model using both linguistic information and personality traits outperformed a model using only linguistic information.
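As a rough illustration of the kind of model the abstract describes, the sketch below combines per-phrase language features with Big Five scores to predict whether a nod occurs during a phrase, and contrasts this with a language-only baseline. The feature names, toy data, and the use of a scikit-learn decision tree are illustrative assumptions only; they are not the authors' actual features, corpus, or learning method.

```python
# Minimal, hypothetical sketch: predicting per-phrase nod presence from
# language features plus Big Five personality scores. The feature set,
# toy data, and decision-tree classifier are illustrative assumptions,
# not the chapter's actual method.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Each row describes one phrase:
#   [phrase_length, is_clause_final, noun_ratio,          <- language features
#    extraversion, agreeableness, conscientiousness,
#    neuroticism, openness]                                <- speaker's Big Five
X = np.array([
    [4, 1, 0.50, 3.2, 4.1, 2.8, 2.0, 3.5],
    [2, 0, 0.00, 3.2, 4.1, 2.8, 2.0, 3.5],
    [6, 1, 0.33, 1.9, 2.5, 4.0, 3.1, 2.7],
    [3, 0, 0.67, 1.9, 2.5, 4.0, 3.1, 2.7],
])
y = np.array([1, 0, 1, 0])  # 1 = a nod occurs during the phrase, 0 = no nod

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(clf.predict(X[:1]))   # predicted nod presence for the first phrase

# Analogous classifiers could be trained for nod frequency and depth per
# phrase; a language-only baseline would simply drop the Big Five columns.
```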



Author information

Corresponding author

Correspondence to Ryo Ishii.



Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Ishii, R., Katayama, T., Higashinaka, R., Tomita, J. (2021). Automatic Head-Nod Generation Using Utterance Text Considering Personality Traits. In: Marchi, E., Siniscalchi, S.M., Cumani, S., Salerno, V.M., Li, H. (eds) Increasing Naturalness and Flexibility in Spoken Dialogue Interaction. Lecture Notes in Electrical Engineering, vol 714. Springer, Singapore. https://doi.org/10.1007/978-981-15-9323-9_26

Download citation

  • DOI: https://doi.org/10.1007/978-981-15-9323-9_26

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-9322-2

  • Online ISBN: 978-981-15-9323-9

  • eBook Packages: Engineering, Engineering (R0)
