Chiori Hori Ph.D. is qualified to endorse.

Audio-Visual Scene-Aware Dialog and Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning

Chiori Hori Ph.D. is registered as an author of this paper.
Can endorse for cs.CL, cs.CV, cs.MM, cs.SD.

Ankit P. Shah, Shijie Geng, Peng Gao, Anoop Cherian, Takaaki Hori, Tim K. Marks and Jonathan Le Roux are not registered as owners of this paper.