Showing 1–12 of 12 results for author: Mokady, R

  1. arXiv:2211.09794  [pdf, other]

    cs.CV

    Null-text Inversion for Editing Real Images using Guided Diffusion Models

    Authors: Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, Daniel Cohen-Or

    Abstract: Recent text-guided diffusion models provide powerful image generation capabilities. Currently, a massive effort is given to enable the modification of these images using text only as a means to offer intuitive and versatile editing. To edit a real image using these state-of-the-art tools, one must first invert the image with a meaningful text prompt into the pretrained model's domain. In this paper,…

    Submitted 17 November, 2022; originally announced November 2022.

  2. arXiv:2211.00575  [pdf, other]

    cs.CV cs.AI cs.LG

    Text-Only Training for Image Captioning using Noise-Injected CLIP

    Authors: David Nukrai, Ron Mokady, Amir Globerson

    Abstract: We consider the task of image-captioning using only the CLIP model and additional text data at training time, and no additional captioned images. Our approach relies on the fact that CLIP is trained to make visual and textual embeddings similar. Therefore, we only need to learn how to translate CLIP textual embeddings back into text, and we can learn how to do this by learning a decoder for the fr…

    Submitted 1 November, 2022; originally announced November 2022.

    Comments: Will be presented at EMNLP 2022. GitHub: https://github.com/DavidHuji/CapDec

    Journal ref: EMNLP 2022

  3. arXiv:2208.01626  [pdf, other]

    cs.CV cs.CL cs.GR cs.LG

    Prompt-to-Prompt Image Editing with Cross Attention Control

    Authors: Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, Daniel Cohen-Or

    Abstract: Recent large-scale text-driven synthesis models have attracted much attention thanks to their remarkable capabilities of generating highly diverse images that follow given text prompts. Such text-based synthesis methods are particularly appealing to humans who are used to verbally describing their intent. Therefore, it is only natural to extend the text-driven image synthesis to text-driven image ed…

    Submitted 2 August, 2022; originally announced August 2022.

  4. arXiv:2202.14020  [pdf, other]

    cs.CV cs.GR cs.LG

    State-of-the-Art in the Architecture, Methods and Applications of StyleGAN

    Authors: Amit H. Bermano, Rinon Gal, Yuval Alaluf, Ron Mokady, Yotam Nitzan, Omer Tov, Or Patashnik, Daniel Cohen-Or

    Abstract: Generative Adversarial Networks (GANs) have established themselves as a prevalent approach to image synthesis. Of these, StyleGAN offers a fascinating case study, owing to its remarkable visual quality and an ability to support a large array of downstream tasks. This state-of-the-art report covers the StyleGAN architecture, and the ways it has been employed since its conception, while also analyzi…

    Submitted 28 February, 2022; originally announced February 2022.

  5. arXiv:2202.12211  [pdf, other]

    cs.CV

    Self-Distilled StyleGAN: Towards Generation from Internet Photos

    Authors: Ron Mokady, Michal Yarom, Omer Tov, Oran Lang, Daniel Cohen-Or, Tali Dekel, Michal Irani, Inbar Mosseri

    Abstract: StyleGAN is known to produce high-fidelity images, while also offering unprecedented semantic editing. However, these fascinating abilities have been demonstrated only on a limited set of datasets, which are usually structurally aligned and well curated. In this paper, we show how StyleGAN can be adapted to work on raw uncurated images collected from the Internet. Such image collections impose two…

    Submitted 24 February, 2022; originally announced February 2022.

  6. arXiv:2201.08361  [pdf, other]

    cs.CV cs.GR cs.LG

    Stitch it in Time: GAN-Based Facial Editing of Real Videos

    Authors: Rotem Tzaban, Ron Mokady, Rinon Gal, Amit H. Bermano, Daniel Cohen-Or

    Abstract: The ability of Generative Adversarial Networks to encode rich semantics within their latent space has been widely adopted for facial image editing. However, replicating their success with videos has proven challenging. Sets of high-quality facial videos are lacking, and working with videos introduces a fundamental barrier to overcome - temporal coherency. We propose that this barrier is largely ar…

    Submitted 21 January, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

    Comments: Project website: https://stitch-time.github.io/

  7. arXiv:2111.15666  [pdf, other]

    cs.CV

    HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing

    Authors: Yuval Alaluf, Omer Tov, Ron Mokady, Rinon Gal, Amit H. Bermano

    Abstract: The inversion of real images into StyleGAN's latent space is a well-studied problem. Nevertheless, applying existing approaches to real-world scenarios remains an open challenge, due to an inherent trade-off between reconstruction and editability: latent space regions which can accurately represent real images typically suffer from degraded semantic control. Recent work proposes to mitigate this t…

    Submitted 29 March, 2022; v1 submitted 30 November, 2021; originally announced November 2021.

    Comments: Accepted to CVPR 2022; Project page available at http://yuval-alaluf.github.io/hyperstyle/

  8. arXiv:2111.09734  [pdf, other]

    cs.CV

    ClipCap: CLIP Prefix for Image Captioning

    Authors: Ron Mokady, Amir Hertz, Amit H. Bermano

    Abstract: Image captioning is a fundamental task in vision-language understanding, where the model predicts a textual informative caption to a given input image. In this paper, we present a simple approach to address this task. We use CLIP encoding as a prefix to the caption, by employing a simple mapping network, and then fine-tune a language model to generate the image captions. The recently proposed CLI…

    Submitted 18 November, 2021; originally announced November 2021.

  9. arXiv:2106.09679  [pdf, other]

    cs.CV

    JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion Retargeting

    Authors: Ron Mokady, Rotem Tzaban, Sagie Benaim, Amit H. Bermano, Daniel Cohen-Or

    Abstract: The task of unsupervised motion retargeting in videos has seen substantial advancements through the use of deep neural networks. While early works concentrated on specific object priors such as a human face or body, recent work considered the unsupervised case. When the source and target videos, however, are of different shapes, current methods fail. To alleviate this problem, we introduce JOKR -…

    Submitted 17 June, 2021; originally announced June 2021.

  10. arXiv:2106.05744  [pdf, other]

    cs.CV

    Pivotal Tuning for Latent-based Editing of Real Images

    Authors: Daniel Roich, Ron Mokady, Amit H. Bermano, Daniel Cohen-Or

    Abstract: Recently, a surge of advanced facial editing techniques have been proposed that leverage the generative power of a pre-trained StyleGAN. To successfully edit an image this way, one must first project (or invert) the image into the pre-trained generator's domain. As it turns out, however, StyleGAN's latent space induces an inherent tradeoff between distortion and editability, i.e. between maintaini…

    Submitted 10 June, 2021; originally announced June 2021.

  11. Structural-analogy from a Single Image Pair

    Authors: Sagie Benaim, Ron Mokady, Amit Bermano, Daniel Cohen-Or, Lior Wolf

    Abstract: The task of unsupervised image-to-image translation has seen substantial advancements in recent years through the use of deep neural networks. Typically, the proposed solutions learn the characterizing distribution of two large, unpaired collections of images, and are able to alter the appearance of a given image, while keeping its geometry intact. In this paper, we explore the capabilities of neu…

    Submitted 6 January, 2021; v1 submitted 5 April, 2020; originally announced April 2020.

    Comments: Published in 'Computer Graphics Forum'

  12. arXiv:1906.06558  [pdf, other]

    cs.CV

    Mask Based Unsupervised Content Transfer

    Authors: Ron Mokady, Sagie Benaim, Lior Wolf, Amit Bermano

    Abstract: We consider the problem of translating, in an unsupervised manner, between two domains where one contains some additional information compared to the other. The proposed method disentangles the common and separate parts of these domains and, through the generation of a mask, focuses the attention of the underlying network to the desired augmentation alone, without wastefully reconstructing the ent…

    Submitted 13 January, 2020; v1 submitted 15 June, 2019; originally announced June 2019.