Showing 1–16 of 16 results for author: Vandenhende, S

  1. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  2. arXiv:2309.15807  [pdf, other]

    cs.CV

    Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack

    Authors: Xiaoliang Dai, Ji Hou, Chih-Yao Ma, Sam Tsai, Jialiang Wang, Rui Wang, Peizhao Zhang, Simon Vandenhende, Xiaofang Wang, Abhimanyu Dubey, Matthew Yu, Abhishek Kadian, Filip Radenovic, Dhruv Mahajan, Kunpeng Li, Yue Zhao, Vladan Petrovic, Mitesh Kumar Singh, Simran Motwani, Yi Wen, Yiwen Song, Roshan Sumbaly, Vignesh Ramanathan, Zijian He, Peter Vajda, et al. (1 additional author not shown)

    Abstract: Training text-to-image models with web scale image-text pairs enables the generation of a wide range of visual concepts from text. However, these pre-trained models often face challenges when it comes to generating highly aesthetic images. This creates the need for aesthetic alignment post pre-training. In this paper, we propose quality-tuning to effectively guide a pre-trained model to exclusivel…

    Submitted 27 September, 2023; originally announced September 2023.

  3. arXiv:2301.02280  [pdf, other]

    cs.CV

    Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training

    Authors: Filip Radenovic, Abhimanyu Dubey, Abhishek Kadian, Todor Mihaylov, Simon Vandenhende, Yash Patel, Yi Wen, Vignesh Ramanathan, Dhruv Mahajan

    Abstract: Vision-language models trained with contrastive learning on large-scale noisy data are becoming increasingly popular for zero-shot recognition problems. In this paper we improve the following three aspects of the contrastive pre-training pipeline: dataset noise, model initialization and the training objective. First, we propose a straightforward filtering strategy titled Complexity, Action, and Te…

    Submitted 29 March, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

    Comments: CVPR 2023

  4. arXiv:2206.06363  [pdf, other]

    cs.CV cs.LG

    Discovering Object Masks with Transformers for Unsupervised Semantic Segmentation

    Authors: Wouter Van Gansbeke, Simon Vandenhende, Luc Van Gool

    Abstract: The task of unsupervised semantic segmentation aims to cluster pixels into semantically meaningful groups. Specifically, pixels assigned to the same cluster should share high-level semantic properties like their object or part category. This paper presents MaskDistill: a novel framework for unsupervised semantic segmentation based on three key ideas. First, we advocate a data-driven strategy to ge…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: Code: https://github.com/wvangansbeke/MaskDistill

  5. arXiv:2203.14896  [pdf, other]

    cs.CV cs.AI cs.LG

    Multi-Task Learning for Visual Scene Understanding

    Authors: Simon Vandenhende

    Abstract: Despite the recent progress in deep learning, most approaches still go for a silo-like solution, focusing on learning each task in isolation: training a separate neural network for each individual task. Many real-world problems, however, call for a multi-modal approach and, therefore, for multi-tasking models. Multi-task learning (MTL) aims to leverage useful information across tasks to improve th…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: PhD Thesis

  6. arXiv:2203.12892  [pdf, other]

    cs.CV

    Making Heads or Tails: Towards Semantically Consistent Visual Counterfactuals

    Authors: Simon Vandenhende, Dhruv Mahajan, Filip Radenovic, Deepti Ghadiyaram

    Abstract: A visual counterfactual explanation replaces image regions in a query image with regions from a distractor image such that the system's decision on the transformed image changes to the distractor class. In this work, we present a novel framework for computing visual counterfactual explanations based on two key ideas. First, we enforce that the replaced and replacer regions contain the same semanti…

    Submitted 16 July, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: Camera-ready version ECCV 2022

  7. arXiv:2106.05967  [pdf, other]

    cs.CV cs.LG

    Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations

    Authors: Wouter Van Gansbeke, Simon Vandenhende, Stamatios Georgoulis, Luc Van Gool

    Abstract: Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection. However, current methods are still primarily applied to curated datasets like ImageNet. In this paper, we first study how biases in the dataset affect existing methods. Our results show that current contrastive approaches work surprisingly well across: (i) o…

    Submitted 14 December, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021. Code: https://github.com/wvangansbeke/Revisiting-Contrastive-SSL

  8. arXiv:2102.06191  [pdf, other]

    cs.CV cs.LG

    Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals

    Authors: Wouter Van Gansbeke, Simon Vandenhende, Stamatios Georgoulis, Luc Van Gool

    Abstract: Being able to learn dense semantic representations of images without supervision is an important problem in computer vision. However, despite its significance, this problem remains rather unexplored, with a few exceptions that considered unsupervised semantic segmentation on small-scale datasets with a narrow visual domain. In this paper, we make a first attempt to tackle the problem on datasets t…

    Submitted 3 August, 2021; v1 submitted 11 February, 2021; originally announced February 2021.

    Comments: ICCV 2021 - Code: https://github.com/wvangansbeke/Unsupervised-Semantic-Segmentation

  9. arXiv:2009.08792  [pdf, other]

    cs.CV cs.AI

    Commands 4 Autonomous Vehicles (C4AV) Workshop Summary

    Authors: Thierry Deruyttere, Simon Vandenhende, Dusan Grujicic, Yu Liu, Luc Van Gool, Matthew Blaschko, Tinne Tuytelaars, Marie-Francine Moens

    Abstract: The task of visual grounding requires locating the most relevant region or object in an image, given a natural language query. So far, progress on this task was mostly measured on curated datasets, which are not always representative of human spoken language. In this work, we deviate from recent, popular task settings and consider the problem under an autonomous vehicle scenario. In particular, we…

    Submitted 18 September, 2020; originally announced September 2020.

  10. arXiv:2005.12320  [pdf, other]

    cs.CV cs.LG

    SCAN: Learning to Classify Images without Labels

    Authors: Wouter Van Gansbeke, Simon Vandenhende, Stamatios Georgoulis, Marc Proesmans, Luc Van Gool

    Abstract: Can we automatically group images into semantically meaningful clusters when ground-truth annotations are absent? The task of unsupervised image classification remains an important, and open challenge in computer vision. Several recent approaches have tried to tackle this problem in an end-to-end fashion. In this paper, we deviate from recent works, and advocate a two-step approach where feature l…

    Submitted 3 July, 2020; v1 submitted 25 May, 2020; originally announced May 2020.

    Comments: Accepted at ECCV 2020. Includes supplementary. Code and pretrained models at https://github.com/wvangansbeke/Unsupervised-Classification

  11. arXiv:2004.13822  [pdf, other]

    cs.CL cs.LG

    A Baseline for the Commands For Autonomous Vehicles Challenge

    Authors: Simon Vandenhende, Thierry Deruyttere, Dusan Grujicic

    Abstract: The Commands For Autonomous Vehicles (C4AV) challenge requires participants to solve an object referral task in a real-world setting. More specifically, we consider a scenario where a passenger can pass free-form natural language commands to a self-driving car. This problem is particularly challenging, as the language is much less constrained compared to existing benchmarks, and object references…

    Submitted 20 April, 2020; originally announced April 2020.

    Comments: Technical Report

  12. Multi-Task Learning for Dense Prediction Tasks: A Survey

    Authors: Simon Vandenhende, Stamatios Georgoulis, Wouter Van Gansbeke, Marc Proesmans, Dengxin Dai, Luc Van Gool

    Abstract: With the advent of deep learning, many dense prediction tasks, i.e. tasks that produce pixel-level predictions, have seen significant performance improvements. The typical approach is to learn these tasks in isolation, that is, a separate neural network is trained for each individual task. Yet, recent multi-task learning (MTL) techniques have shown promising results w.r.t. performance, computation…

    Submitted 24 January, 2021; v1 submitted 28 April, 2020; originally announced April 2020.

    Comments: Accepted to T-PAMI. Code + Suppl. Mat. can be found here: https://github.com/SimonVandenhende/Multi-Task-Learning-PyTorch IEEE Copyright Notice

  13. arXiv:2001.06902  [pdf, other]

    cs.CV

    MTI-Net: Multi-Scale Task Interaction Networks for Multi-Task Learning

    Authors: Simon Vandenhende, Stamatios Georgoulis, Luc Van Gool

    Abstract: In this paper, we argue about the importance of considering task interactions at multiple scales when distilling task information in a multi-task learning setup. In contrast to common belief, we show that tasks with high affinity at a certain scale are not guaranteed to retain this behaviour at other scales, and vice versa. We propose a novel architecture, namely MTI-Net, that builds upon this fin…

    Submitted 8 July, 2020; v1 submitted 19 January, 2020; originally announced January 2020.

    Comments: Accepted at ECCV 2020 (spotlight) - Code: https://github.com/SimonVandenhende/Multi-Task-Learning-PyTorch

  14. arXiv:1909.10838  [pdf, other]

    cs.AI cs.CL cs.RO

    Talk2Car: Taking Control of Your Self-Driving Car

    Authors: Thierry Deruyttere, Simon Vandenhende, Dusan Grujicic, Luc Van Gool, Marie-Francine Moens

    Abstract: A long-term goal of artificial intelligence is to have an agent execute commands communicated through natural language. In many cases the commands are grounded in a visual environment shared by the human who gives the command and the agent. Execution of the command then requires mapping the command into the physical visual space, after which the appropriate action can be taken. In this paper we co…

    Submitted 26 August, 2020; v1 submitted 24 September, 2019; originally announced September 2019.

    Comments: 14 pages, accepted at EMNLP-IJCNLP 2019 - Added Talk2Nav Reference

  15. arXiv:1904.02920  [pdf, other]

    cs.CV

    Branched Multi-Task Networks: Deciding What Layers To Share

    Authors: Simon Vandenhende, Stamatios Georgoulis, Bert De Brabandere, Luc Van Gool

    Abstract: In the context of multi-task learning, neural networks with branched architectures have often been employed to jointly tackle the tasks at hand. Such ramified networks typically start with a number of shared layers, after which different tasks branch out into their own sequence of layers. Understandably, as the number of possible network configurations is combinatorially large, deciding what layer…

    Submitted 13 August, 2020; v1 submitted 5 April, 2019; originally announced April 2019.

    Comments: Accepted at BMVC 2020

  16. arXiv:1903.03496  [pdf, other]

    cs.CV

    A Three-Player GAN: Generating Hard Samples To Improve Classification Networks

    Authors: Simon Vandenhende, Bert De Brabandere, Davy Neven, Luc Van Gool

    Abstract: We propose a Three-Player Generative Adversarial Network to improve classification networks. In addition to the game played between the discriminator and generator, a competition is introduced between the generator and the classifier. The generator's objective is to synthesize samples that are both realistic and hard to label for the classifier. Even though we make no assumptions on the type of au…

    Submitted 8 March, 2019; originally announced March 2019.

    Comments: Accepted for oral presentation at MVA 2019