Showing 1–50 of 196 results for author: Fox, D

  1. arXiv:2410.11758  [pdf, other]

    cs.RO cs.CL cs.CV cs.LG

    Latent Action Pretraining from Videos

    Authors: Seonghyeon Ye, Joel Jang, Byeongguk Jeon, Sejune Joo, Jianwei Yang, Baolin Peng, Ajay Mandlekar, Reuben Tan, Yu-Wei Chao, Bill Yuchen Lin, Lars Liden, Kimin Lee, Jianfeng Gao, Luke Zettlemoyer, Dieter Fox, Minjoon Seo

    Abstract: We introduce Latent Action Pretraining for general Action models (LAPA), an unsupervised method for pretraining Vision-Language-Action (VLA) models without ground-truth robot action labels. Existing Vision-Language-Action models require action labels typically collected by human teleoperators during pretraining, which significantly limits possible data sources and scale. In this work, we propose a…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: Website: https://latentactionpretraining.github.io

  2. arXiv:2410.03930  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Reverb: Open-Source ASR and Diarization from Rev

    Authors: Nishchal Bhandari, Danny Chen, Miguel Ángel del Río Fernández, Natalie Delworth, Jennifer Drexler Fox, Migüel Jetté, Quinten McNamara, Corey Miller, Ondřej Novotný, Ján Profant, Nan Qin, Martin Ratajczak, Jean-Philippe Robichaud

    Abstract: Today, we are open-sourcing our core speech recognition and diarization models for non-commercial use. We are releasing both a full production pipeline for developers as well as pared-down research models for experimentation. Rev hopes that these releases will spur research and innovation in the fast-moving domain of voice technology. The speech recognition models released today outperform all exi…

    Submitted 4 October, 2024; originally announced October 2024.

  3. arXiv:2410.02193  [pdf, other]

    cs.RO

    Guiding Long-Horizon Task and Motion Planning with Vision Language Models

    Authors: Zhutian Yang, Caelan Garrett, Dieter Fox, Tomás Lozano-Pérez, Leslie Pack Kaelbling

    Abstract: Vision-Language Models (VLM) can generate plausible high-level plans when prompted with a goal, the context, an image of the scene, and any planning constraints. However, there is no guarantee that the predicted actions are geometrically and kinematically feasible for a particular robot embodiment. As a result, many prerequisite steps such as opening drawers to access objects are often omitted in…

    Submitted 3 October, 2024; originally announced October 2024.

  4. arXiv:2410.01111  [pdf, other]

    cs.AI cs.RO

    Learning to Build by Building Your Own Instructions

    Authors: Aaron Walsman, Muru Zhang, Adam Fishman, Ali Farhadi, Dieter Fox

    Abstract: Structural understanding of complex visual objects is an important unsolved component of artificial intelligence. To study this, we develop a new technique for the recently proposed Break-and-Make problem in LTRON where an agent must learn to build a previously unseen LEGO assembly using a single interactive session to gather information about its components and their structure. We attack this pro…

    Submitted 1 October, 2024; originally announced October 2024.

  5. arXiv:2410.00371  [pdf, other]

    cs.RO

    AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation

    Authors: Jiafei Duan, Wilbert Pumacay, Nishanth Kumar, Yi Ru Wang, Shulin Tian, Wentao Yuan, Ranjay Krishna, Dieter Fox, Ajay Mandlekar, Yijie Guo

    Abstract: Robotic manipulation in open-world settings requires not only task execution but also the ability to detect and learn from failures. While recent advances in vision-language models (VLMs) and large language models (LLMs) have improved robots' spatial reasoning and problem-solving abilities, they still struggle with failure recognition, limiting their real-world applicability. We introduce AHA, an…

    Submitted 30 September, 2024; originally announced October 2024.

    Comments: Appendix and details can be found in project website: https://aha-vlm.github.io/

  6. arXiv:2409.19494  [pdf, other]

    cs.RO cs.CV

    OptiGrasp: Optimized Grasp Pose Detection Using RGB Images for Warehouse Picking Robots

    Authors: Soofiyan Atar, Yi Li, Markus Grotz, Michael Wolf, Dieter Fox, Joshua Smith

    Abstract: In warehouse environments, robots require robust picking capabilities to manage a wide variety of objects. Effective deployment demands minimal hardware, strong generalization to new products, and resilience in diverse settings. Current methods often rely on depth sensors for structural information, which suffer from high costs, complex setups, and technical limitations. Inspired by recent advance…

    Submitted 28 September, 2024; originally announced September 2024.

    Comments: 8 pages, 6 figures

  7. arXiv:2408.06506  [pdf, other]

    cs.RO

    TacSL: A Library for Visuotactile Sensor Simulation and Learning

    Authors: Iretiayo Akinola, Jie Xu, Jan Carius, Dieter Fox, Yashraj Narang

    Abstract: For both humans and robots, the sense of touch, known as tactile sensing, is critical for performing contact-rich manipulation tasks. Three key challenges in robotic tactile sensing are 1) interpreting sensor signals, 2) generating sensor signals in novel scenarios, and 3) learning sensor-based policies. For visuotactile sensors, interpretation has been facilitated by their close relationship with…

    Submitted 12 August, 2024; originally announced August 2024.

  8. arXiv:2408.04587  [pdf, other]

    cs.RO

    FORGE: Force-Guided Exploration for Robust Contact-Rich Manipulation under Uncertainty

    Authors: Michael Noseworthy, Bingjie Tang, Bowen Wen, Ankur Handa, Nicholas Roy, Dieter Fox, Fabio Ramos, Yashraj Narang, Iretiayo Akinola

    Abstract: We present FORGE, a method that enables sim-to-real transfer of contact-rich manipulation policies in the presence of significant pose uncertainty. FORGE combines a force threshold mechanism with a dynamics randomization scheme during policy learning in simulation, to enable the robust transfer of the learned policies to the real robot. At deployment, FORGE policies, conditioned on a maximum allow…

    Submitted 8 August, 2024; originally announced August 2024.

  9. arXiv:2407.16968  [pdf, other]

    stat.ML cs.AI cs.LG

    Stochastic Variance-Reduced Iterative Hard Thresholding in Graph Sparsity Optimization

    Authors: Derek Fox, Samuel Hernandez, Qianqian Tong

    Abstract: Stochastic optimization algorithms are widely used for large-scale data analysis due to their low per-iteration costs, but they often suffer from slow asymptotic convergence caused by inherent variance. Variance-reduced techniques have been therefore used to address this issue in structured sparse models utilizing sparsity-inducing norms or $\ell_0$-norms. However, these techniques are not directl…

    Submitted 23 July, 2024; originally announced July 2024.

  10. arXiv:2407.08028  [pdf, other]

    cs.RO

    AutoMate: Specialist and Generalist Assembly Policies over Diverse Geometries

    Authors: Bingjie Tang, Iretiayo Akinola, Jie Xu, Bowen Wen, Ankur Handa, Karl Van Wyk, Dieter Fox, Gaurav S. Sukhatme, Fabio Ramos, Yashraj Narang

    Abstract: Robotic assembly for high-mixture settings requires adaptivity to diverse parts and poses, which is an open challenge. Meanwhile, in other areas of robotics, large models and sim-to-real have led to tremendous progress. Inspired by such work, we present AutoMate, a learning framework and system that consists of 4 parts: 1) a dataset of 100 assemblies compatible with simulation and the real world,…

    Submitted 31 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  11. arXiv:2407.00278  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    PerAct2: Benchmarking and Learning for Robotic Bimanual Manipulation Tasks

    Authors: Markus Grotz, Mohit Shridhar, Tamim Asfour, Dieter Fox

    Abstract: Bimanual manipulation is challenging due to precise spatial and temporal coordination required between two arms. While there exist several real-world bimanual systems, there is a lack of simulated benchmarks with a large task diversity for systematically studying bimanual capabilities across a wide range of tabletop tasks. This paper addresses the gap by extending RLBench to bimanual manipulation.…

    Submitted 31 July, 2024; v1 submitted 28 June, 2024; originally announced July 2024.

  12. arXiv:2406.18915  [pdf, other]

    cs.RO cs.CV

    Manipulate-Anything: Automating Real-World Robots using Vision-Language Models

    Authors: Jiafei Duan, Wentao Yuan, Wilbert Pumacay, Yi Ru Wang, Kiana Ehsani, Dieter Fox, Ranjay Krishna

    Abstract: Large-scale endeavors like and widespread community efforts such as Open-X-Embodiment have contributed to growing the scale of robot demonstration data. However, there is still an opportunity to improve the quality, quantity, and diversity of robot demonstration data. Although vision-language models have been shown to automatically generate demonstration data, their utility has been limited to env…

    Submitted 29 August, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: Project page: https://robot-ma.github.io/. All supplementary material, prompts and code can be found on the project page

  13. arXiv:2406.18158  [pdf, other]

    cs.RO cs.CV

    3D-MVP: 3D Multiview Pretraining for Robotic Manipulation

    Authors: Shengyi Qian, Kaichun Mo, Valts Blukis, David F. Fouhey, Dieter Fox, Ankit Goyal

    Abstract: Recent works have shown that visual pretraining on egocentric datasets using masked autoencoders (MAE) can improve generalization for downstream robotics tasks. However, these approaches pretrain only on 2D images, while many robotics applications require 3D scene understanding. In this work, we propose 3D-MVP, a novel approach for 3D multi-view pretraining using masked autoencoders. We leverage R…

    Submitted 26 June, 2024; originally announced June 2024.

  14. arXiv:2406.10721  [pdf, other]

    cs.RO cs.AI cs.CV

    RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics

    Authors: Wentao Yuan, Jiafei Duan, Valts Blukis, Wilbert Pumacay, Ranjay Krishna, Adithyavairavan Murali, Arsalan Mousavian, Dieter Fox

    Abstract: From rearranging objects on a table to putting groceries into shelves, robots must plan precise action points to perform tasks accurately and reliably. In spite of the recent adoption of vision language models (VLMs) to control robot behavior, VLMs struggle to precisely articulate robot actions using language. We introduce an automatic synthetic data generation pipeline that instruction-tunes VLMs…

    Submitted 15 June, 2024; originally announced June 2024.

  15. arXiv:2406.08545  [pdf, other]

    cs.RO cs.AI cs.CV

    RVT-2: Learning Precise Manipulation from Few Demonstrations

    Authors: Ankit Goyal, Valts Blukis, Jie Xu, Yijie Guo, Yu-Wei Chao, Dieter Fox

    Abstract: In this work, we study how to build a robotic system that can solve multiple 3D manipulation tasks given language instructions. To be useful in industrial and household domains, such a system should be capable of learning new tasks with few demonstrations and solving them precisely. Prior works, like PerAct and RVT, have studied this problem, however, they often struggle with tasks requiring high…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to RSS 2024

  16. arXiv:2406.06995  [pdf, other]

    cs.DC

    HPC Alongside User-space Kubernetes

    Authors: Vanessa Sochat, David Fox, Daniel Milroy

    Abstract: High performance computing (HPC) and cloud have traditionally been separate, and presented in an adversarial light. The conflict arises from disparate beginnings that led to two drastically different cultures, incentive structures, and communities that are now in direct competition with one another for resources, talent, and speed of innovation. With the emergence of converged computing, a new par…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 12 pages, 9 figures, 2 tables

  17. arXiv:2405.11656  [pdf, other]

    cs.RO cs.AI

    URDFormer: A Pipeline for Constructing Articulated Simulation Environments from Real-World Images

    Authors: Zoey Chen, Aaron Walsman, Marius Memmel, Kaichun Mo, Alex Fang, Karthikeya Vemuri, Alan Wu, Dieter Fox, Abhishek Gupta

    Abstract: Constructing simulation scenes that are both visually and physically realistic is a problem of practical interest in domains ranging from robotics to computer vision. This problem has become even more relevant as researchers wielding large data-hungry learning methods seek new sources of training data for physical decision-making systems. However, building simulation models is often still done by…

    Submitted 31 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

    Comments: Accepted at RSS2024

  18. arXiv:2405.01472  [pdf, other]

    cs.RO cs.AI

    IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning

    Authors: Ryan Hoque, Ajay Mandlekar, Caelan Garrett, Ken Goldberg, Dieter Fox

    Abstract: Imitation learning is a promising paradigm for training robot control policies, but these policies can suffer from distribution shift, where the conditions at evaluation time differ from those in the training data. A popular approach for increasing policy robustness to distribution shift is interactive imitation learning (i.e., DAgger and variants), where a human operator provides corrective inter…

    Submitted 2 May, 2024; originally announced May 2024.

  19. arXiv:2404.12308  [pdf, other]

    cs.RO cs.LG eess.SY

    ASID: Active Exploration for System Identification in Robotic Manipulation

    Authors: Marius Memmel, Andrew Wagenmaker, Chuning Zhu, Patrick Yin, Dieter Fox, Abhishek Gupta

    Abstract: Model-free control strategies such as reinforcement learning have shown the ability to learn control strategies without requiring an accurate model or simulator of the world. While this is appealing due to the lack of modeling requirements, such methods can be sample inefficient, making them impractical in many real-world domains. On the other hand, model-based control techniques leveraging accura…

    Submitted 26 June, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Project website at https://weirdlabuw.github.io/asid

  20. arXiv:2404.07428  [pdf, other]

    cs.RO cs.LG

    AdaDemo: Data-Efficient Demonstration Expansion for Generalist Robotic Agent

    Authors: Tongzhou Mu, Yijie Guo, Jie Xu, Ankit Goyal, Hao Su, Dieter Fox, Animesh Garg

    Abstract: Encouraged by the remarkable achievements of language and vision foundation models, developing generalist robotic agents through imitation learning, using large demonstration datasets, has become a prominent area of interest in robot learning. The efficacy of imitation learning is heavily reliant on the quantity and quality of the demonstration datasets. In this study, we aim to scale up demonstra…

    Submitted 10 April, 2024; originally announced April 2024.

  21. arXiv:2404.06089  [pdf, other]

    cs.HC cs.RO

    EVE: Enabling Anyone to Train Robots using Augmented Reality

    Authors: Jun Wang, Chun-Cheng Chang, Jiafei Duan, Dieter Fox, Ranjay Krishna

    Abstract: The increasing affordability of robot hardware is accelerating the integration of robots into everyday activities. However, training a robot to automate a task requires expensive trajectory data where a trained human annotator moves a physical robot to train it. Consequently, only those with access to robots produce demonstrations to train robots. In this work, we remove this restriction with EVE,…

    Submitted 3 August, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: 13 pages, UIST 2024

  22. arXiv:2404.03336  [pdf, other]

    cs.RO

    Scaling Population-Based Reinforcement Learning with GPU Accelerated Simulation

    Authors: Asad Ali Shahid, Yashraj Narang, Vincenzo Petrone, Enrico Ferrentino, Ankur Handa, Dieter Fox, Marco Pavone, Loris Roveda

    Abstract: In recent years, deep reinforcement learning (RL) has shown its effectiveness in solving complex continuous control tasks like locomotion and dexterous manipulation. However, this comes at the cost of an enormous amount of experience required for training, exacerbated by the sensitivity of learning efficiency and the policy performance to hyperparameter selection, which often requires numerous tri…

    Submitted 24 June, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: Submitted for publication to IEEE-RAS 23rd International Conference on Humanoid Robots

  23. arXiv:2404.01440  [pdf, other]

    cs.CV cs.AI cs.GR cs.RO

    Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects

    Authors: Yijia Weng, Bowen Wen, Jonathan Tremblay, Valts Blukis, Dieter Fox, Leonidas Guibas, Stan Birchfield

    Abstract: We address the problem of building digital twins of unknown articulated objects from two RGBD scans of the object at different articulation states. We decompose the problem into two stages, each addressing distinct aspects. Our method first reconstructs object-level shape at each state, then recovers the underlying articulation model including part segmentation and joint articulations that associa…

    Submitted 6 June, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  24. arXiv:2402.08191  [pdf, other]

    cs.RO cs.AI cs.LG

    THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation

    Authors: Wilbert Pumacay, Ishika Singh, Jiafei Duan, Ranjay Krishna, Jesse Thomason, Dieter Fox

    Abstract: To realize effective large-scale, real-world robotic applications, we must evaluate how well our robot policies adapt to changes in environmental conditions. Unfortunately, a majority of studies evaluate robot performance in environments closely resembling or even identical to the training setup. We present THE COLOSSEUM, a novel simulation benchmark, with 20 diverse manipulation tasks, that enabl…

    Submitted 27 May, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: RSS 2024. 33 pages

  25. arXiv:2402.02612  [pdf, other]

    cs.RO

    Fast Explicit-Input Assistance for Teleoperation in Clutter

    Authors: Nick Walker, Xuning Yang, Animesh Garg, Maya Cakmak, Dieter Fox, Claudia Pérez-D'Arpino

    Abstract: The performance of prediction-based assistance for robot teleoperation degrades in unseen or goal-rich environments due to incorrect or quickly-changing intent inferences. Poor predictions can confuse operators or cause them to change their control input to implicitly signal their goal. We present a new assistance interface for robotic manipulation where an operator can explicitly communicate a ma…

    Submitted 7 October, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  26. arXiv:2311.02337  [pdf, other]

    cs.RO cs.AI cs.CV

    STOW: Discrete-Frame Segmentation and Tracking of Unseen Objects for Warehouse Picking Robots

    Authors: Yi Li, Muru Zhang, Markus Grotz, Kaichun Mo, Dieter Fox

    Abstract: Segmentation and tracking of unseen object instances in discrete frames pose a significant challenge in dynamic industrial robotic contexts, such as distribution warehouses. Here, robots must handle object rearrangement, including shifting, removal, and partial occlusion by new items, and track these items after substantial temporal gaps. The task is further complicated when robots encounter objec…

    Submitted 4 November, 2023; originally announced November 2023.

    Comments: CoRL 2023, project page: https://sites.google.com/view/stow-corl23

  27. arXiv:2311.00926  [pdf, other]

    cs.RO cs.AI cs.CV

    M2T2: Multi-Task Masked Transformer for Object-centric Pick and Place

    Authors: Wentao Yuan, Adithyavairavan Murali, Arsalan Mousavian, Dieter Fox

    Abstract: With the advent of large language models and large-scale robotic datasets, there has been tremendous progress in high-level decision-making for object manipulation. These generic models are able to interpret complex tasks using language commands, but they often have difficulties generalizing to out-of-distribution objects due to the inability of low-level action primitives. In contrast, existing t…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: 12 pages, 8 figures, accepted by CoRL 2023

  28. arXiv:2310.17596  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations

    Authors: Ajay Mandlekar, Soroush Nasiriany, Bowen Wen, Iretiayo Akinola, Yashraj Narang, Linxi Fan, Yuke Zhu, Dieter Fox

    Abstract: Imitation learning from a large set of human demonstrations has proved to be an effective paradigm for building capable robot agents. However, the demonstrations can be extremely costly and time-consuming to collect. We introduce MimicGen, a system for automatically synthesizing large-scale, rich datasets from only a small number of human demonstrations by adapting them to new contexts. We use Mim…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: Conference on Robot Learning (CoRL) 2023

  29. arXiv:2310.17274  [pdf, other]

    cs.RO cs.AR cs.DC

    cuRobo: Parallelized Collision-Free Minimum-Jerk Robot Motion Generation

    Authors: Balakumar Sundaralingam, Siva Kumar Sastry Hari, Adam Fishman, Caelan Garrett, Karl Van Wyk, Valts Blukis, Alexander Millane, Helen Oleynikova, Ankur Handa, Fabio Ramos, Nathan Ratliff, Dieter Fox

    Abstract: This paper explores the problem of collision-free motion generation for manipulators by formulating it as a global motion optimization problem. We develop a parallel optimization technique to solve this problem and demonstrate its effectiveness on massively parallel GPUs. We show that combining simple optimization techniques with many parallel seeds leads to solving difficult motion generation pro…

    Submitted 3 November, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: revised technical report, 62 pages, Website: https://curobo.org

  30. arXiv:2310.16014  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Human-in-the-Loop Task and Motion Planning for Imitation Learning

    Authors: Ajay Mandlekar, Caelan Garrett, Danfei Xu, Dieter Fox

    Abstract: Imitation learning from human demonstrations can teach robots complex manipulation skills, but is time-consuming and labor intensive. In contrast, Task and Motion Planning (TAMP) systems are automated and excel at solving long-horizon tasks, but they are difficult to apply to contact-rich tasks. In this paper, we present Human-in-the-Loop Task and Motion Planning (HITL-TAMP), a novel system that l…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Conference on Robot Learning (CoRL) 2023

  31. arXiv:2310.07018  [pdf, other]

    cs.CL cs.AI cs.RO

    NEWTON: Are Large Language Models Capable of Physical Reasoning?

    Authors: Yi Ru Wang, Jiafei Duan, Dieter Fox, Siddhartha Srinivasa

    Abstract: Large Language Models (LLMs), through their contextualized representations, have been empirically proven to encapsulate syntactic, semantic, word sense, and common-sense knowledge. However, there has been limited exploration of their physical reasoning abilities, specifically concerning the crucial attributes for comprehending everyday objects. To address this gap, we introduce NEWTON, a repositor…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 Findings; 8 pages, 3 figures, 7 tables; Project page: https://newtonreasoning.github.io

  32. arXiv:2309.15013  [pdf, other]

    cs.CL cs.SD eess.AS

    Updated Corpora and Benchmarks for Long-Form Speech Recognition

    Authors: Jennifer Drexler Fox, Desh Raj, Natalie Delworth, Quinn McNamara, Corey Miller, Migüel Jetté

    Abstract: The vast majority of ASR research uses corpora in which both the training and test data have been pre-segmented into utterances. In most real-world ASR use-cases, however, test audio is not segmented, leading to a mismatch between inference-time conditions and models trained on segmented utterances. In this paper, we re-release three standard ASR corpora - TED-LIUM 3, GigaSpeech, and VoxPopuli-en -…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  33. arXiv:2307.04751  [pdf, other]

    cs.RO cs.CV cs.LG

    Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement

    Authors: Anthony Simeonov, Ankit Goyal, Lucas Manuelli, Lin Yen-Chen, Alina Sarmiento, Alberto Rodriguez, Pulkit Agrawal, Dieter Fox

    Abstract: We propose a system for rearranging objects in a scene to achieve a desired object-scene placing relationship, such as a book inserted in an open slot of a bookshelf. The pipeline generalizes to novel geometries, poses, and layouts of both scenes and objects, and is trained from demonstrations to operate directly on 3D point clouds. Our system overcomes challenges associated with the existence of…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: Project page: https://anthonysimeonov.github.io/rpdiff-multi-modal/

  34. arXiv:2307.04577  [pdf, other]

    cs.RO cs.CV cs.LG

    AnyTeleop: A General Vision-Based Dexterous Robot Arm-Hand Teleoperation System

    Authors: Yuzhe Qin, Wei Yang, Binghao Huang, Karl Van Wyk, Hao Su, Xiaolong Wang, Yu-Wei Chao, Dieter Fox

    Abstract: Vision-based teleoperation offers the possibility to endow robots with human-level intelligence to physically interact with the environment, while only requiring low-cost camera sensors. However, current vision-based teleoperation systems are designed and engineered towards a particular robot model and deploy environment, which scales poorly as the pool of the robot models expands and the variety…

    Submitted 16 May, 2024; v1 submitted 10 July, 2023; originally announced July 2023.

    Comments: http://anyteleop.com/ Robotics: Science and Systems 2023

  35. arXiv:2307.04427  [pdf, other]

    astro-ph.HE astro-ph.GA cs.LG

    Observation of high-energy neutrinos from the Galactic plane

    Authors: R. Abbasi, M. Ackermann, J. Adams, J. A. Aguilar, M. Ahlers, M. Ahrens, J. M. Alameddine, A. A. Alves Jr., N. M. Amin, K. Andeen, T. Anderson, G. Anton, C. Argüelles, Y. Ashida, S. Athanasiadou, S. Axani, X. Bai, A. Balagopal V., S. W. Barwick, V. Basu, S. Baur, R. Bay, J. J. Beatty, K. -H. Becker, J. Becker Tjus, et al. (364 additional authors not shown)

    Abstract: The origin of high-energy cosmic rays, atomic nuclei that continuously impact Earth's atmosphere, has been a mystery for over a century. Due to deflection in interstellar magnetic fields, cosmic rays from the Milky Way arrive at Earth from random directions. However, near their sources and during propagation, cosmic rays interact with matter and produce high-energy neutrinos. We search for neutrin…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: Submitted on May 12th, 2022; Accepted on May 4th, 2023

    Journal ref: Science 380, 6652, 1338-1343 (2023)

  36. arXiv:2307.04040  [pdf, other]

    cs.RO

    Meta-Policy Learning over Plan Ensembles for Robust Articulated Object Manipulation

    Authors: Constantinos Chamzas, Caelan Garrett, Balakumar Sundaralingam, Lydia E. Kavraki, Dieter Fox

    Abstract: Recent work has shown that complex manipulation skills, such as pushing or pouring, can be learned through state-of-the-art learning based techniques, such as Reinforcement Learning (RL). However, these methods often have high sample-complexity, are susceptible to domain changes, and produce unsafe motions that a robot should not perform. On the other hand, purely geometric model-based planning ca…

    Submitted 8 July, 2023; originally announced July 2023.

    Comments: 5 pages, Workshop on Learning for Task and Motion Planning (RSS2023)

  37. arXiv:2306.14896  [pdf, other]

    cs.RO cs.CV

    RVT: Robotic View Transformer for 3D Object Manipulation

    Authors: Ankit Goyal, Jie Xu, Yijie Guo, Valts Blukis, Yu-Wei Chao, Dieter Fox

    Abstract: For 3D object manipulation, methods that build an explicit 3D representation perform better than those relying only on camera images. But using explicit 3D representations like voxels comes at large computing cost, adversely affecting scalability. In this work, we propose RVT, a multi-view transformer for 3D manipulation that is both scalable and accurate. Some key features of RVT are an attention…

    Submitted 26 June, 2023; originally announced June 2023.

  38. arXiv:2306.13818  [pdf, other]

    cs.RO cs.CV

    AR2-D2: Training a Robot Without a Robot

    Authors: Jiafei Duan, Yi Ru Wang, Mohit Shridhar, Dieter Fox, Ranjay Krishna

    Abstract: Diligently gathered human demonstrations serve as the unsung heroes empowering the progression of robot learning. Today, demonstrations are collected by training people to use specialized controllers, which (tele-)operate robots to manipulate a small number of objects. By contrast, we introduce AR2-D2: a system for collecting demonstrations which (1) does not require people with specialized traini…

    Submitted 23 June, 2023; originally announced June 2023.

    Comments: Project website: www.ar2d2.site

  39. arXiv:2306.13196  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    DiMSam: Diffusion Models as Samplers for Task and Motion Planning under Partial Observability

    Authors: Xiaolin Fang, Caelan Reed Garrett, Clemens Eppner, Tomás Lozano-Pérez, Leslie Pack Kaelbling, Dieter Fox

    Abstract: Generative models such as diffusion models excel at capturing high-dimensional distributions with diverse input modalities, e.g. robot trajectories, but are less effective at multi-step constraint reasoning. Task and Motion Planning (TAMP) approaches are suited for planning multi-step autonomous robot manipulation. However, it can be difficult to apply them to domains where the environment and it…

    Submitted 11 October, 2024; v1 submitted 22 June, 2023; originally announced June 2023.

  40. arXiv:2305.17110  [pdf, other]

    cs.RO

    IndustReal: Transferring Contact-Rich Assembly Tasks from Simulation to Reality

    Authors: Bingjie Tang, Michael A. Lin, Iretiayo Akinola, Ankur Handa, Gaurav S. Sukhatme, Fabio Ramos, Dieter Fox, Yashraj Narang

    Abstract: Robotic assembly is a longstanding challenge, requiring contact-rich interaction and high precision and accuracy. Many applications also require adaptivity to diverse parts, poses, and environments, as well as low cycle times. In other areas of robotics, simulation is a powerful tool to develop algorithms, generate datasets, and train agents. However, simulation has had a more limited impact on as…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted to Robotics: Science and Systems (RSS) 2023

  41. arXiv:2305.16309  [pdf, other]

    cs.RO cs.CV cs.LG

    Imitating Task and Motion Planning with Visuomotor Transformers

    Authors: Murtaza Dalal, Ajay Mandlekar, Caelan Garrett, Ankur Handa, Ruslan Salakhutdinov, Dieter Fox

    Abstract: Imitation learning is a powerful tool for training robot manipulation policies, allowing them to learn from expert demonstrations without manual programming or trial-and-error. However, common methods of data collection, such as human supervision, scale poorly, as they are time-consuming and labor-intensive. In contrast, Task and Motion Planning (TAMP) can autonomously generate large-scale dataset…

    Submitted 17 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Conference on Robot Learning (CoRL) 2023. 8 pages, 5 figures, 2 tables; 11 pages appendix (10 additional figures)

  42. arXiv:2305.15225  [pdf, other]

    cs.CL

    SAIL: Search-Augmented Instruction Learning

    Authors: Hongyin Luo, Yung-Sung Chuang, Yuan Gong, Tianhua Zhang, Yoon Kim, Xixin Wu, Danny Fox, Helen Meng, James Glass

    Abstract: Large language models (LLMs) have been significantly improved by instruction fine-tuning, but still lack transparency and the ability to utilize up-to-date knowledge and information. In this work, we propose search-augmented instruction learning (SAIL), which grounds the language generation and instruction following abilities on complex search results generated by in-house and external search engi…

    Submitted 25 June, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

  43. arXiv:2304.09302  [pdf, other]

    cs.RO cs.CV

    CabiNet: Scaling Neural Collision Detection for Object Rearrangement with Procedural Scene Generation

    Authors: Adithyavairavan Murali, Arsalan Mousavian, Clemens Eppner, Adam Fishman, Dieter Fox

    Abstract: We address the important problem of generalizing robotic rearrangement to clutter without any explicit object models. We first generate over 650K cluttered scenes - orders of magnitude more than prior work - in diverse everyday environments, such as cabinets and shelves. We render synthetic partial point clouds from this data and use it to train our CabiNet model architecture. CabiNet is a collisi…

    Submitted 18 April, 2023; originally announced April 2023.

  44. arXiv:2304.03728  [pdf, other]

    cs.CL

    Interpretable Unified Language Checking

    Authors: Tianhua Zhang, Hongyin Luo, Yung-Sung Chuang, Wei Fang, Luc Gaitskell, Thomas Hartvigsen, Xixin Wu, Danny Fox, Helen Meng, James Glass

    Abstract: Despite recent concerns about undesirable behaviors generated by large language models (LLMs), including non-factual, biased, and hateful language, we find LLMs are inherent multi-task language checkers based on their latent representations of natural and social knowledge. We present an interpretable, unified, language checking (UniLC) method for both human and machine-generated language that aims…

    Submitted 7 April, 2023; originally announced April 2023.

    Comments: 10 + 5 pages

  45. arXiv:2304.00673  [pdf, other]

    cs.CV

    Partial-View Object View Synthesis via Filtered Inversion

    Authors: Fan-Yun Sun, Jonathan Tremblay, Valts Blukis, Kevin Lin, Danfei Xu, Boris Ivanovic, Peter Karkus, Stan Birchfield, Dieter Fox, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Marco Pavone, Nick Haber

    Abstract: We propose Filtering Inversion (FINV), a learning framework and optimization process that predicts a renderable 3D object representation from one or few partial views. FINV addresses the challenge of synthesizing novel views of objects from partial observations, spanning cases where the object is not entirely in view, is partially occluded, or is only observed from similar views. To achieve this,…

    Submitted 17 August, 2024; v1 submitted 2 April, 2023; originally announced April 2023.

    Comments: project website: http://cs.stanford.edu/~sunfanyun/finv

  46. arXiv:2303.17592  [pdf, other]

    cs.RO cs.CV cs.LG

    Learning Human-to-Robot Handovers from Point Clouds

    Authors: Sammy Christen, Wei Yang, Claudia Pérez-D'Arpino, Otmar Hilliges, Dieter Fox, Yu-Wei Chao

    Abstract: We propose the first framework to learn control policies for vision-based human-to-robot handovers, a critical task for human-robot interaction. While research in Embodied AI has made significant progress in training robot agents in simulated environments, interacting with humans remains challenging due to the difficulties of simulating humans. Fortunately, recent research has developed realistic…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: Accepted at CVPR 2023 as highlight. Project page at https://handover-sim2real.github.io

  47. arXiv:2303.16138  [pdf, other]

    cs.RO

    DefGraspNets: Grasp Planning on 3D Fields with Graph Neural Nets

    Authors: Isabella Huang, Yashraj Narang, Ruzena Bajcsy, Fabio Ramos, Tucker Hermans, Dieter Fox

    Abstract: Robotic grasping of 3D deformable objects is critical for real-world applications such as food handling and robotic surgery. Unlike rigid and articulated objects, 3D deformable objects have infinite degrees of freedom. Fully defining their state requires 3D deformation and stress fields, which are exceptionally difficult to analytically compute or experimentally measure. Thus, evaluating grasp can…

    Submitted 28 March, 2023; originally announced March 2023.

    Comments: To be published in the IEEE Conference on Robotics and Automation (ICRA), 2023

  48. arXiv:2303.15771  [pdf, other]

    cs.RO

    TerrainNet: Visual Modeling of Complex Terrain for High-speed, Off-road Navigation

    Authors: Xiangyun Meng, Nathan Hatch, Alexander Lambert, Anqi Li, Nolan Wagener, Matthew Schmittle, JoonHo Lee, Wentao Yuan, Zoey Chen, Samuel Deng, Greg Okopal, Dieter Fox, Byron Boots, Amirreza Shaban

    Abstract: Effective use of camera-based vision systems is essential for robust performance in autonomous off-road driving, particularly in the high-speed regime. Despite success in structured, on-road settings, current end-to-end approaches for scene prediction have yet to be successfully adapted for complex outdoor terrain. To this end, we present TerrainNet, a vision-based terrain perception system for se…

    Submitted 29 May, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

  49. arXiv:2303.14158  [pdf, other]

    cs.CV cs.AI cs.GR cs.RO

    BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects

    Authors: Bowen Wen, Jonathan Tremblay, Valts Blukis, Stephen Tyree, Thomas Muller, Alex Evans, Dieter Fox, Jan Kautz, Stan Birchfield

    Abstract: We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence, while simultaneously performing neural 3D reconstruction of the object. Our method works for arbitrary rigid objects, even when visual texture is largely absent. The object is assumed to be segmented in the first frame only. No additional information is required, and no assumption is ma…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  50. arXiv:2302.10745  [pdf, other]

    cs.RO

    Constrained Generative Sampling of 6-DoF Grasps

    Authors: Jens Lundell, Francesco Verdoja, Tran Nguyen Le, Arsalan Mousavian, Dieter Fox, Ville Kyrki

    Abstract: Most state-of-the-art data-driven grasp sampling methods propose stable and collision-free grasps uniformly on the target object. For bin-picking, executing any of those reachable grasps is sufficient. However, for completing specific tasks, such as squeezing out liquid from a bottle, we want the grasp to be on a specific part of the object's body while avoiding other locations, such as the cap. T…

    Submitted 17 August, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

    Comments: Accepted at the International Conference on Intelligent Robots and Systems (IROS 2023)