“Rewind to the Jiggling Meat Part”: Understanding Voice Control of Instructional Videos in Everyday Tasks

Published: 28 April 2022

Abstract

Voice interaction has long been envisioned as a way to make physical interaction hands-free, for example by allowing fine-grained control of instructional videos without physically disengaging from the task at hand. While significant engineering advances have brought us closer to this ideal, we do not yet fully understand which user requirements voice interaction should support in such contexts. This paper presents an ecologically valid wizard-of-oz elicitation study exploring realistic user requirements for ideal instructional video playback control while cooking. Through an analysis of the commands issued and actions performed during this non-linear and complex task, we identify (1) patterns of command formulation, (2) challenges for design, and (3) how the task and voice-based commands are interwoven in real life. We discuss implications for the design and research of voice interaction for navigating instructional videos while performing complex tasks.
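For readers unfamiliar with content-based voice navigation of video, the sketch below illustrates the general idea behind commands like the one in the title. It is not the system studied in the paper: it assumes a hypothetical time-stamped transcript and uses naive keyword overlap purely for illustration; a real system would combine speech recognition, richer language understanding, and visual content analysis.

```python
# Illustrative sketch only (not the paper's system): resolving a content-based
# voice command such as "rewind to the jiggling meat part" against a
# hypothetical time-stamped transcript of an instructional video.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Segment:
    start: float  # segment start time, in seconds
    text: str     # transcript text spoken during the segment

def resolve_command(command: str, segments: List[Segment]) -> Optional[float]:
    """Return the start time of the segment whose transcript shares the most
    words with the spoken command (naive keyword overlap), or None."""
    words = set(command.lower().split())
    best_time, best_score = None, 0
    for seg in segments:
        score = len(words & set(seg.text.lower().split()))
        if score > best_score:
            best_time, best_score = seg.start, score
    return best_time

# Hypothetical transcript and command:
transcript = [
    Segment(12.0, "season the pork shoulder generously on both sides"),
    Segment(95.0, "the meat should jiggle slightly when you shake the pan"),
]
print(resolve_command("rewind to the jiggling meat part", transcript))  # -> 95.0
```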

Supplementary Material

MP4 File (3491102.3502036-video-preview.mp4)
Video Preview

    Published In

    CHI '22: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems
    April 2022
    10459 pages
    ISBN:9781450391573
    DOI:10.1145/3491102

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 28 April 2022

    Author Tags

    1. Conversational Interaction
    2. Non-Linear Instructional Video
    3. Voice-Based Navigation
    4. Wizard-of-Oz

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    CHI '22
    CHI '22: CHI Conference on Human Factors in Computing Systems
    April 29 - May 5, 2022
    New Orleans, LA, USA

    Acceptance Rates

    Overall Acceptance Rate 6,199 of 26,314 submissions, 24%

    Article Metrics

    • Downloads (last 12 months): 125
    • Downloads (last 6 weeks): 6
    Reflects downloads up to 24 Oct 2024

    Cited By
    • (2024) Improving Video Navigation for Spatial Task Tutorials by Spatially Segmenting and Situating How-To Videos. Proceedings of the 2024 ACM Symposium on Spatial User Interaction, 1-13. https://doi.org/10.1145/3677386.3682103. Online publication date: 7-Oct-2024.
    • (2024) SkillsInterpreter: A Case Study of Automatic Annotation of Flowcharts to Support Browsing Instructional Videos in Modern Martial Arts using Large Language Models. Proceedings of the Augmented Humans International Conference 2024, 217-225. https://doi.org/10.1145/3652920.3652942. Online publication date: 4-Apr-2024.
    • (2024) AQuA: Automated Question-Answering in Software Tutorial Videos with Visual Anchors. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-19. https://doi.org/10.1145/3613904.3642752. Online publication date: 11-May-2024.
    • (2024) Cooking With Agents: Designing Context-aware Voice Interaction. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-13. https://doi.org/10.1145/3613904.3642183. Online publication date: 11-May-2024.
    • (2023) Human-Centered Deferred Inference: Measuring User Interactions and Setting Deferral Criteria for Human-AI Teams. Proceedings of the 28th International Conference on Intelligent User Interfaces, 681-694. https://doi.org/10.1145/3581641.3584092. Online publication date: 27-Mar-2023.
    • (2023) Exploring Audio Icons for Content-Based Navigation in Voice User Interfaces. Proceedings of the 5th International Conference on Conversational User Interfaces, 1-9. https://doi.org/10.1145/3571884.3604302. Online publication date: 19-Jul-2023.
    • (2023) Rewriting the Script: Adapting Text Instructions for Voice Interaction. Proceedings of the 2023 ACM Designing Interactive Systems Conference, 2233-2248. https://doi.org/10.1145/3563657.3596059. Online publication date: 10-Jul-2023.
