Showing 1–3 of 3 results for author: Yariv, G

  1. arXiv:2406.13621  [pdf, other]

    cs.CL cs.CV cs.LG

    Improving Visual Commonsense in Language Models via Multiple Image Generation

    Authors: Guy Yariv, Idan Schwartz, Yossi Adi, Sagie Benaim

    Abstract: Commonsense reasoning is fundamentally based on multimodal knowledge. However, existing large language models (LLMs) are primarily trained using textual data only, limiting their ability to incorporate essential visual information. In contrast, Visual Language Models, which excel at visually-oriented tasks, often fail at non-visual tasks such as basic commonsense reasoning. This divergence highlig…

    Submitted 19 June, 2024; originally announced June 2024.

  2. arXiv:2309.16429  [pdf, other]

    cs.LG cs.AI

    Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation

    Authors: Guy Yariv, Itai Gat, Sagie Benaim, Lior Wolf, Idan Schwartz, Yossi Adi

    Abstract: We consider the task of generating diverse and realistic videos guided by natural audio samples from a wide variety of semantic classes. For this task, the videos are required to be aligned both globally and temporally with the input audio: globally, the input audio is semantically associated with the entire output video, and temporally, each segment of the input audio is associated with a corresp…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: 9 pages, 6 figures

  3. arXiv:2305.13050  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation

    Authors: Guy Yariv, Itai Gat, Lior Wolf, Yossi Adi, Idan Schwartz

    Abstract: In recent years, image generation has shown a great leap in performance, where diffusion models play a central role. Although generating high-quality images, such models are mainly conditioned on textual descriptions. This begs the question: "how can we adopt such models to be conditioned on other modalities?". In this paper, we propose a novel method utilizing latent diffusion models trained for…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted to INTERSPEECH 2023