Flow2Flow: Audio-visual cross-modality generation for talking face videos with rhythmic head

Z Wang, W He, Y Wei, Y Luo - Displays, 2023 - Elsevier
Audio-visual cross-modality generation refers to the generation of audio or visual content
based on input from another modality. One of the key tasks in this field is the generation of
realistic talking facial videos from audio and head pose information, which has significant
applications in human–computer interaction, virtual reality, and video production. However,
previous work has limitations such as the inability to generate natural head poses or interact
with audio, which compromises the realism and expressive power of the generated videos …