Computer Science > Computer Vision and Pattern Recognition

arXiv:2403.19467 (cs)

[Submitted on 28 Mar 2024]

Title: Beyond Talking -- Generating Holistic 3D Human Dyadic Motion for Communication

Authors: Mingze Sun, Chao Xu, Xinyu Jiang, Yang Liu, Baigui Sun, Ruqi Huang

Abstract: In this paper, we introduce a new task focused on human communication: generating holistic 3D human motions for both speakers and listeners. Central to our approach is a factorization that decouples audio features, combined with textual semantic information, which facilitates more realistic and coordinated movements. We train separate VQ-VAEs for the holistic motions of the speaker and the listener. To account for the real-time mutual influence between speaker and listener, we propose a novel chain-like, transformer-based auto-regressive model that characterizes real-world communication scenarios and generates the motions of both the speaker and the listener simultaneously. These designs ensure that the generated results are both coordinated and diverse. Our approach demonstrates state-of-the-art performance on two benchmark datasets. Furthermore, we introduce HoCo, a holistic communication dataset, as a resource for future research. The HoCo dataset and code will be released for research purposes upon acceptance.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2403.19467 [cs.CV]
	(or arXiv:2403.19467v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.19467

Submission history

From: Mingze Sun
[v1] Thu, 28 Mar 2024 14:47:32 UTC (17,490 KB)
