Showing 1–35 of 35 results for author: Benaim, S

  1. arXiv:2410.09792

    cs.CV

    Generating Intermediate Representations for Compositional Text-To-Image Generation

    Authors: Ran Galun, Sagie Benaim

    Abstract: Text-to-image diffusion models have demonstrated an impressive ability to produce high-quality outputs. However, they often struggle to accurately follow fine-grained spatial information in an input text. To this end, we propose a compositional approach for text-to-image generation based on two stages. In the first stage, we design a diffusion-based generative model to produce one or more aligned…

    Submitted 20 October, 2024; v1 submitted 13 October, 2024; originally announced October 2024.

    Comments: Accepted to NeurIPS 2024 Workshop on Compositional Learning: Perspectives, Methods, and Paths Forward

  2. arXiv:2406.13621

    cs.CL cs.CV cs.LG

    Improving Visual Commonsense in Language Models via Multiple Image Generation

    Authors: Guy Yariv, Idan Schwartz, Yossi Adi, Sagie Benaim

    Abstract: Commonsense reasoning is fundamentally based on multimodal knowledge. However, existing large language models (LLMs) are primarily trained using textual data only, limiting their ability to incorporate essential visual information. In contrast, Visual Language Models, which excel at visually-oriented tasks, often fail at non-visual tasks such as basic commonsense reasoning. This divergence highlig…

    Submitted 19 June, 2024; originally announced June 2024.

  3. arXiv:2406.04332

    cs.CV cs.LG

    Coarse-To-Fine Tensor Trains for Compact Visual Representations

    Authors: Sebastian Loeschcke, Dan Wang, Christian Leth-Espensen, Serge Belongie, Michael J. Kastoryano, Sagie Benaim

    Abstract: The ability to learn compact, high-quality, and easy-to-optimize representations for visual data is paramount to many applications such as novel view synthesis and 3D reconstruction. Recent work has shown substantial success in using tensor networks to design such compact and high-quality representations. However, the ability to optimize tensor-based representations, and in particular, the highly…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project webpage: https://sebulo.github.io/PuTT_website/

  4. arXiv:2405.19321

    cs.CV

    DGD: Dynamic 3D Gaussians Distillation

    Authors: Isaac Labe, Noam Issachar, Itai Lang, Sagie Benaim

    Abstract: We tackle the task of learning dynamic 3D semantic radiance fields given a single monocular video as input. Our learned semantic radiance field captures per-point semantics as well as color and geometric properties for a dynamic 3D scene, enabling the generation of novel views and their corresponding semantics. This enables the segmentation and tracking of a diverse set of 3D semantic entities, sp…

    Submitted 29 May, 2024; originally announced May 2024.

  5. arXiv:2310.19080

    cs.CV

    Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery

    Authors: Katie Z Luo, Zhenzhen Liu, Xiangyu Chen, Yurong You, Sagie Benaim, Cheng Perng Phoo, Mark Campbell, Wen Sun, Bharath Hariharan, Kilian Q. Weinberger

    Abstract: Recent advances in machine learning have shown that Reinforcement Learning from Human Feedback (RLHF) can improve machine learning models and align them with human preferences. Although very successful for Large Language Models (LLMs), these advancements have not had a comparable impact in research for autonomous vehicles -- where alignment with human expectations can be imperative. In this paper,…

    Submitted 5 November, 2023; v1 submitted 29 October, 2023; originally announced October 2023.

  6. arXiv:2309.16429

    cs.LG cs.AI

    Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation

    Authors: Guy Yariv, Itai Gat, Sagie Benaim, Lior Wolf, Idan Schwartz, Yossi Adi

    Abstract: We consider the task of generating diverse and realistic videos guided by natural audio samples from a wide variety of semantic classes. For this task, the videos are required to be aligned both globally and temporally with the input audio: globally, the input audio is semantically associated with the entire output video, and temporally, each segment of the input audio is associated with a corresp…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: 9 pages, 6 figures

  7. arXiv:2303.17155

    cs.CV cs.AI

    Discriminative Class Tokens for Text-to-Image Diffusion Models

    Authors: Idan Schwartz, Vésteinn Snæbjarnarson, Hila Chefer, Ryan Cotterell, Serge Belongie, Lior Wolf, Sagie Benaim

    Abstract: Recent advances in text-to-image diffusion models have enabled the generation of diverse and high-quality images. While impressive, the images often fall short of depicting subtle details and are susceptible to errors due to ambiguity in the input text. One way of alleviating these issues is to train diffusion models on class-labeled datasets. This approach has two disadvantages: (i) supervised da…

    Submitted 10 September, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

    Comments: ICCV 2023

  8. arXiv:2302.04862

    cs.CV cs.LG

    Polynomial Neural Fields for Subband Decomposition and Manipulation

    Authors: Guandao Yang, Sagie Benaim, Varun Jampani, Kyle Genova, Jonathan T. Barron, Thomas Funkhouser, Bharath Hariharan, Serge Belongie

    Abstract: Neural fields have emerged as a new paradigm for representing signals, thanks to their ability to do it compactly while being easy to optimize. In most applications, however, neural fields are treated like black boxes, which precludes many signal manipulation tasks. In this paper, we propose a new class of neural fields called polynomial neural fields (PNFs). The key advantage of a PNF is that it…

    Submitted 9 February, 2023; originally announced February 2023.

    Comments: Accepted to NeurIPS 2022

  9. arXiv:2211.09782

    cs.CV cs.CR cs.LG

    Assessing Neural Network Robustness via Adversarial Pivotal Tuning

    Authors: Peter Ebert Christensen, Vésteinn Snæbjarnarson, Andrea Dittadi, Serge Belongie, Sagie Benaim

    Abstract: The robustness of image classifiers is essential to their deployment in the real world. The ability to assess this resilience to manipulations or deviations from the training data is thus crucial. These modifications have traditionally consisted of minimal changes that still manage to fool classifiers, and modern approaches are increasingly robust to them. Semantic manipulations that modify elemen…

    Submitted 6 January, 2024; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: Major changes include new experiments in Table 1 on page 5 and Table 2-4 on page 6, new figure 5 on page 8. Paper accepted at WACV (oral)

  10. arXiv:2207.11226

    cs.CV cs.LG

    FewGAN: Generating from the Joint Distribution of a Few Images

    Authors: Lior Ben-Moshe, Sagie Benaim, Lior Wolf

    Abstract: We introduce FewGAN, a generative model for generating novel, high-quality and diverse images whose patch distribution lies in the joint patch distribution of a small number of N>1 training samples. The method is, in essence, a hierarchical patch-GAN that applies quantization at the first coarse scale, in a similar fashion to VQ-GAN, followed by a pyramid of residual fully convolutional GANs at fi…

    Submitted 18 July, 2022; originally announced July 2022.

  11. arXiv:2206.12396

    cs.CV

    Text-Driven Stylization of Video Objects

    Authors: Sebastian Loeschcke, Serge Belongie, Sagie Benaim

    Abstract: We tackle the task of stylizing video objects in an intuitive and semantic manner following a user-specified text prompt. This is a challenging task as the resulting video must satisfy multiple properties: (1) it has to be temporally consistent and avoid jittering or similar artifacts, (2) the resulting stylization must preserve both the global semantics of the object and its fine-grained details,…

    Submitted 27 June, 2022; v1 submitted 24 June, 2022; originally announced June 2022.

  12. arXiv:2206.02776

    cs.CV

    Volumetric Disentanglement for 3D Scene Manipulation

    Authors: Sagie Benaim, Frederik Warburg, Peter Ebert Christensen, Serge Belongie

    Abstract: Recently, advances in differential volumetric rendering enabled significant breakthroughs in the photo-realistic and fine-detailed reconstruction of complex 3D scenes, which is key for many virtual reality applications. However, in the context of augmented reality, one may also wish to effect semantic manipulations or augmentations of objects within a scene. To this end, we propose a volumetric fr…

    Submitted 6 June, 2022; originally announced June 2022.

  13. arXiv:2205.02673

    cs.LG cs.AI

    On Disentangled and Locally Fair Representations

    Authors: Yaron Gurovich, Sagie Benaim, Lior Wolf

    Abstract: We study the problem of performing classification in a manner that is fair for sensitive groups, such as race and gender. This problem is tackled through the lens of disentangled and locally fair representations. We learn a locally fair representation, such that, under the learned representation, the neighborhood of each sample is balanced in terms of the sensitive attribute. For instance, when a…

    Submitted 5 May, 2022; originally announced May 2022.

  14. arXiv:2112.05080

    cs.CV cs.AI

    Locally Shifted Attention With Early Global Integration

    Authors: Shelly Sheynin, Sagie Benaim, Adam Polyak, Lior Wolf

    Abstract: Recent work has shown the potential of transformers for computer vision applications. An image is first partitioned into patches, which are then used as input tokens for the attention mechanism. Due to the expensive quadratic cost of the attention mechanism, either a large patch size is used, resulting in coarse-grained global interactions, or alternatively, attention is applied only on a local re…

    Submitted 22 December, 2021; v1 submitted 9 December, 2021; originally announced December 2021.

  15. arXiv:2112.03221

    cs.CV cs.CL cs.GR

    Text2Mesh: Text-Driven Neural Stylization for Meshes

    Authors: Oscar Michel, Roi Bar-On, Richard Liu, Sagie Benaim, Rana Hanocka

    Abstract: In this work, we develop intuitive controls for editing the style of 3D objects. Our framework, Text2Mesh, stylizes a 3D mesh by predicting color and local geometric details which conform to a target text prompt. We consider a disentangled representation of a 3D object using a fixed mesh input (content) coupled with a learned neural network, which we term neural style field network. In order to mo…

    Submitted 6 December, 2021; originally announced December 2021.

    Comments: project page: https://threedle.github.io/text2mesh/

  16. arXiv:2110.12427

    cs.CV

    Image-Based CLIP-Guided Essence Transfer

    Authors: Hila Chefer, Sagie Benaim, Roni Paiss, Lior Wolf

    Abstract: We make the distinction between (i) style transfer, in which a source image is manipulated to match the textures and colors of a target image, and (ii) essence transfer, in which one edits the source image to include high-level semantic attributes from the target. Crucially, the semantic attributes that constitute the essence of an image may differ from image to image. Our blending operator combin…

    Submitted 11 October, 2022; v1 submitted 24 October, 2021; originally announced October 2021.

    Comments: To appear in ECCV'22

  17. arXiv:2106.09679

    cs.CV

    JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion Retargeting

    Authors: Ron Mokady, Rotem Tzaban, Sagie Benaim, Amit H. Bermano, Daniel Cohen-Or

    Abstract: The task of unsupervised motion retargeting in videos has seen substantial advancements through the use of deep neural networks. While early works concentrated on specific object priors such as a human face or body, recent work considered the unsupervised case. When the source and target videos, however, are of different shapes, current methods fail. To alleviate this problem, we introduce JOKR -…

    Submitted 17 June, 2021; originally announced June 2021.

  18. arXiv:2105.14609

    cs.CV

    Identity and Attribute Preserving Thumbnail Upscaling

    Authors: Noam Gat, Sagie Benaim, Lior Wolf

    Abstract: We consider the task of upscaling a low resolution thumbnail image of a person, to a higher resolution image, which preserves the person's identity and other attributes. Since the thumbnail image is of low resolution, many higher resolution versions exist. Previous approaches produce solutions where the person's identity is not preserved, or biased solutions, such as predominantly Caucasian faces.…

    Submitted 30 May, 2021; originally announced May 2021.

    Comments: ICIP 2021

  19. arXiv:2104.14535

    cs.CV cs.LG

    A Hierarchical Transformation-Discriminating Generative Model for Few Shot Anomaly Detection

    Authors: Shelly Sheynin, Sagie Benaim, Lior Wolf

    Abstract: Anomaly detection, the task of identifying unusual samples in data, often relies on a large set of training samples. In this work, we consider the setting of few-shot anomaly detection in images, where only a few images are given at training. We devise a hierarchical generative model that captures the multi-scale patch distribution of each training image. We further enhance the representation of o…

    Submitted 29 April, 2021; originally announced April 2021.

  20. arXiv:2010.05785

    cs.CV

    Permuted AdaIN: Reducing the Bias Towards Global Statistics in Image Classification

    Authors: Oren Nuriel, Sagie Benaim, Lior Wolf

    Abstract: Recent work has shown that convolutional neural network classifiers overly rely on texture at the expense of shape cues. We make a similar but different distinction between shape and local image cues, on the one hand, and global image statistics, on the other. Our method, called Permuted Adaptive Instance Normalization (pAdaIN), reduces the representation of global statistics in the hidden layers…

    Submitted 23 June, 2021; v1 submitted 9 October, 2020; originally announced October 2020.

    Comments: 8 pages, 3 figures

    ACM Class: I.4.0

  21. arXiv:2006.12226

    cs.CV cs.LG

    Hierarchical Patch VAE-GAN: Generating Diverse Videos from a Single Sample

    Authors: Shir Gur, Sagie Benaim, Lior Wolf

    Abstract: We consider the task of generating diverse and novel videos from a single video sample. Recently, new hierarchical patch-GAN based approaches were proposed for generating diverse images, given only a single sample at training time. Moving to videos, these approaches fail to generate diverse samples, and often collapse into generating samples similar to the training video. We introduce a novel patc…

    Submitted 22 October, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

  22. arXiv:2004.12361

    cs.CV cs.LG eess.IV

    Evaluation Metrics for Conditional Image Generation

    Authors: Yaniv Benny, Tomer Galanti, Sagie Benaim, Lior Wolf

    Abstract: We present two new metrics for evaluating generative models in the class-conditional image generation setting. These metrics are obtained by generalizing the two most popular unconditional metrics: the Inception Score (IS) and the Fréchet Inception Distance (FID). A theoretical analysis shows the motivation behind each proposed metric and links the novel metrics to their unconditional counterpart…

    Submitted 8 February, 2021; v1 submitted 26 April, 2020; originally announced April 2020.

    Comments: To be published in "INTERNATIONAL JOURNAL OF COMPUTER VISION"

  23. arXiv:2004.06130

    cs.CV

    SpeedNet: Learning the Speediness in Videos

    Authors: Sagie Benaim, Ariel Ephrat, Oran Lang, Inbar Mosseri, William T. Freeman, Michael Rubinstein, Michal Irani, Tali Dekel

    Abstract: We wish to automatically predict the "speediness" of moving objects in videos---whether they move faster, at, or slower than their "natural" speed. The core component in our approach is SpeedNet---a novel deep network trained to detect if a video is playing at normal rate, or if it is sped up. SpeedNet is trained on a large corpus of natural videos in a self-supervised manner, without requiring an…

    Submitted 26 July, 2020; v1 submitted 13 April, 2020; originally announced April 2020.

    Comments: Accepted to CVPR 2020 (oral). Project webpage: http://speednet-cvpr20.github.io

  24. Structural-analogy from a Single Image Pair

    Authors: Sagie Benaim, Ron Mokady, Amit Bermano, Daniel Cohen-Or, Lior Wolf

    Abstract: The task of unsupervised image-to-image translation has seen substantial advancements in recent years through the use of deep neural networks. Typically, the proposed solutions learn the characterizing distribution of two large, unpaired collections of images, and are able to alter the appearance of a given image, while keeping its geometry intact. In this paper, we explore the capabilities of neu…

    Submitted 6 January, 2021; v1 submitted 5 April, 2020; originally announced April 2020.

    Comments: Published in 'Computer Graphics Forum'

  25. arXiv:2001.05026

    cs.LG stat.ML

    Unsupervised Learning of the Set of Local Maxima

    Authors: Lior Wolf, Sagie Benaim, Tomer Galanti

    Abstract: This paper describes a new form of unsupervised learning, whose input is a set of unlabeled points that are assumed to be local maxima of an unknown value function v in an unknown subset of the vector space. Two functions are learned: (i) a set indicator c, which is a binary classifier, and (ii) a comparator function h that given two nearby samples, predicts which sample has the higher value of th…

    Submitted 14 January, 2020; originally announced January 2020.

    Comments: ICLR 2019

  26. arXiv:2001.05017

    cs.CV cs.LG

    Emerging Disentanglement in Auto-Encoder Based Unsupervised Image Content Transfer

    Authors: Ori Press, Tomer Galanti, Sagie Benaim, Lior Wolf

    Abstract: We study the problem of learning to map, in an unsupervised way, between domains A and B, such that the samples b in B contain all the information that exists in samples a in A and some additional information. For example, ignoring occlusions, B can be people with glasses, A people without, and the glasses, would be the added information. When mapping a sample a from the first domain to the other…

    Submitted 14 January, 2020; originally announced January 2020.

    Journal ref: ICLR 2019

  27. arXiv:1908.11628

    cs.CV

    Domain Intersection and Domain Difference

    Authors: Sagie Benaim, Michael Khaitov, Tomer Galanti, Lior Wolf

    Abstract: We present a method for recovering the shared content between two visual domains as well as the content that is unique to each domain. This allows us to map from one domain to the other, in a way in which the content that is specific for the first domain is removed and the content that is specific for the second is imported from any image in the second domain. In addition, our method enables gener…

    Submitted 30 August, 2019; originally announced August 2019.

    Journal ref: ICCV 2019

  28. arXiv:1906.06558

    cs.CV

    Mask Based Unsupervised Content Transfer

    Authors: Ron Mokady, Sagie Benaim, Lior Wolf, Amit Bermano

    Abstract: We consider the problem of translating, in an unsupervised manner, between two domains where one contains some additional information compared to the other. The proposed method disentangles the common and separate parts of these domains and, through the generation of a mask, focuses the attention of the underlying network to the desired augmentation alone, without wastefully reconstructing the ent…

    Submitted 13 January, 2020; v1 submitted 15 June, 2019; originally announced June 2019.

  29. arXiv:1812.06087

    cs.SD cs.LG eess.AS stat.ML

    Semi-Supervised Monaural Singing Voice Separation With a Masking Network Trained on Synthetic Mixtures

    Authors: Michael Michelashvili, Sagie Benaim, Lior Wolf

    Abstract: We study the problem of semi-supervised singing voice separation, in which the training data contains a set of samples of mixed music (singing and instrumental) and an unmatched set of instrumental music. Our solution employs a single mapping function g, which, applied to a mixed sample, recovers the underlying instrumental music, and, applied to an instrumental sample, returns the same sample. Th…

    Submitted 6 May, 2019; v1 submitted 14 December, 2018; originally announced December 2018.

  30. arXiv:1807.08501

    cs.LG stat.ML

    Risk Bounds for Unsupervised Cross-Domain Mapping with IPMs

    Authors: Tomer Galanti, Sagie Benaim, Lior Wolf

    Abstract: The recent empirical success of unsupervised cross-domain mapping algorithms, between two domains that share common characteristics, is not well-supported by theoretical justifications. This lacuna is especially troubling, given the clear ambiguity in such mappings. We work with adversarial training methods based on IPMs and derive a novel risk bound, which upper bounds the risk between the lear…

    Submitted 2 November, 2020; v1 submitted 23 July, 2018; originally announced July 2018.

    Comments: arXiv admin note: text overlap with arXiv:1709.00074

  31. arXiv:1806.06029

    cs.CV

    One-Shot Unsupervised Cross Domain Translation

    Authors: Sagie Benaim, Lior Wolf

    Abstract: Given a single image x from domain A and a set of images from domain B, our task is to generate the analogous of x in B. We argue that this task could be a key AI capability that underlines the ability of cognitive agents to act in the world and present empirical evidence that the existing unsupervised domain translation methods fail on this task. Our method follows a two step process. First, a va…

    Submitted 23 October, 2018; v1 submitted 15 June, 2018; originally announced June 2018.

    Comments: Published at NIPS 2018

  32. arXiv:1712.07886

    cs.LG

    Estimating the Success of Unsupervised Image to Image Translation

    Authors: Sagie Benaim, Tomer Galanti, Lior Wolf

    Abstract: While in supervised learning, the validation error is an unbiased estimator of the generalization (test) error and complexity-based generalization bounds are abundant, no such bounds exist for learning a mapping in an unsupervised way. As a result, when training GANs and specifically when using GANs for learning to map between domains in a completely unsupervised way, one is forced to select the h…

    Submitted 22 March, 2018; v1 submitted 21 December, 2017; originally announced December 2017.

    Comments: The first and second authors contributed equally

  33. arXiv:1709.00074

    cs.LG

    The Role of Minimal Complexity Functions in Unsupervised Learning of Semantic Mappings

    Authors: Tomer Galanti, Lior Wolf, Sagie Benaim

    Abstract: We discuss the feasibility of the following learning problem: given unmatched samples from two domains and nothing else, learn a mapping between the two, which preserves semantics. Due to the lack of paired samples and without any definition of the semantic information, the problem might seem ill-posed. Specifically, in typical cases, it seems possible to build infinitely many alternative mappings…

    Submitted 15 January, 2020; v1 submitted 31 August, 2017; originally announced September 2017.

  34. arXiv:1706.00826

    cs.CV

    One-Sided Unsupervised Domain Mapping

    Authors: Sagie Benaim, Lior Wolf

    Abstract: In unsupervised domain mapping, the learner is given two unmatched datasets $A$ and $B$. The goal is to learn a mapping $G_{AB}$ that translates a sample in $A$ to the analog sample in $B$. Recent approaches have shown that when learning simultaneously both $G_{AB}$ and the inverse mapping $G_{BA}$, convincing mappings are obtained. In this work, we present a method of learning $G_{AB}$ without le…

    Submitted 18 November, 2017; v1 submitted 2 June, 2017; originally announced June 2017.

    Comments: to be published in NIPS 2017

  35. arXiv:1304.6925

    cs.LO

    Controlling the Depth, Size, and Number of Subtrees for Two-variable Logic on Trees

    Authors: Saguy Benaim, Michael Benedikt, Rastislav Lenhardt, James Worrell

    Abstract: Verification of properties of first order logic with two variables FO2 has been investigated in a number of contexts. Over arbitrary structures it is known to be decidable with NEXPTIME complexity, with finitely satisfiable formulas having exponential-sized models. Over word structures, where FO2 is known to have the same expressiveness as unary temporal logic, the same properties hold. Over finit…

    Submitted 30 May, 2013; v1 submitted 25 April, 2013; originally announced April 2013.

    Comments: 28 pages