Showing 1–21 of 21 results for author: Blukis, V

  1. arXiv:2410.15536  [pdf, other]

    cs.RO cs.AI

    GRS: Generating Robotic Simulation Tasks from Real-World Images

    Authors: Alex Zook, Fan-Yun Sun, Josef Spjut, Valts Blukis, Stan Birchfield, Jonathan Tremblay

    Abstract: We introduce GRS (Generating Robotic Simulation tasks), a novel system to address the challenge of real-to-sim in robotics, computer vision, and AR/VR. GRS enables the creation of digital twin simulations from single real-world RGB-D observations, complete with diverse, solvable tasks for virtual agent training. We use state-of-the-art vision-language models (VLMs) to achieve a comprehensive real-…

    Submitted 20 October, 2024; originally announced October 2024.

  2. arXiv:2406.18158  [pdf, other]

    cs.RO cs.CV

    3D-MVP: 3D Multiview Pretraining for Robotic Manipulation

    Authors: Shengyi Qian, Kaichun Mo, Valts Blukis, David F. Fouhey, Dieter Fox, Ankit Goyal

    Abstract: Recent works have shown that visual pretraining on egocentric datasets using masked autoencoders (MAE) can improve generalization for downstream robotics tasks. However, these approaches pretrain only on 2D images, while many robotics applications require 3D scene understanding. In this work, we propose 3D-MVP, a novel approach for 3D multi-view pretraining using masked autoencoders. We leverage R…

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.10721  [pdf, other]

    cs.RO cs.AI cs.CV

    RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics

    Authors: Wentao Yuan, Jiafei Duan, Valts Blukis, Wilbert Pumacay, Ranjay Krishna, Adithyavairavan Murali, Arsalan Mousavian, Dieter Fox

    Abstract: From rearranging objects on a table to putting groceries into shelves, robots must plan precise action points to perform tasks accurately and reliably. In spite of the recent adoption of vision language models (VLMs) to control robot behavior, VLMs struggle to precisely articulate robot actions using language. We introduce an automatic synthetic data generation pipeline that instruction-tunes VLMs…

    Submitted 15 June, 2024; originally announced June 2024.

  4. arXiv:2406.08545  [pdf, other]

    cs.RO cs.AI cs.CV

    RVT-2: Learning Precise Manipulation from Few Demonstrations

    Authors: Ankit Goyal, Valts Blukis, Jie Xu, Yijie Guo, Yu-Wei Chao, Dieter Fox

    Abstract: In this work, we study how to build a robotic system that can solve multiple 3D manipulation tasks given language instructions. To be useful in industrial and household domains, such a system should be capable of learning new tasks with few demonstrations and solving them precisely. Prior works, like PerAct and RVT, have studied this problem, however, they often struggle with tasks requiring high…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to RSS 2024

  5. arXiv:2404.01440  [pdf, other]

    cs.CV cs.AI cs.GR cs.RO

    Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects

    Authors: Yijia Weng, Bowen Wen, Jonathan Tremblay, Valts Blukis, Dieter Fox, Leonidas Guibas, Stan Birchfield

    Abstract: We address the problem of building digital twins of unknown articulated objects from two RGBD scans of the object at different articulation states. We decompose the problem into two stages, each addressing distinct aspects. Our method first reconstructs object-level shape at each state, then recovers the underlying articulation model including part segmentation and joint articulations that associa…

    Submitted 6 June, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  6. arXiv:2403.20275  [pdf, other]

    cs.CV cs.RO

    Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces

    Authors: Mauro Comi, Alessio Tonioni, Max Yang, Jonathan Tremblay, Valts Blukis, Yijiong Lin, Nathan F. Lepora, Laurence Aitchison

    Abstract: Touch and vision go hand in hand, mutually enhancing our ability to understand the world. From a research perspective, the problem of mixing touch and vision is underexplored and presents interesting challenges. To this end, we propose Tactile-Informed 3DGS, a novel approach that incorporates touch data (local depth maps) with multi-view vision data to achieve surface reconstruction and novel view…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 17 pages

  7. arXiv:2310.17274  [pdf, other]

    cs.RO cs.AR cs.DC

    cuRobo: Parallelized Collision-Free Minimum-Jerk Robot Motion Generation

    Authors: Balakumar Sundaralingam, Siva Kumar Sastry Hari, Adam Fishman, Caelan Garrett, Karl Van Wyk, Valts Blukis, Alexander Millane, Helen Oleynikova, Ankur Handa, Fabio Ramos, Nathan Ratliff, Dieter Fox

    Abstract: This paper explores the problem of collision-free motion generation for manipulators by formulating it as a global motion optimization problem. We develop a parallel optimization technique to solve this problem and demonstrate its effectiveness on massively parallel GPUs. We show that combining simple optimization techniques with many parallel seeds leads to solving difficult motion generation pro…

    Submitted 3 November, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: revised technical report, 62 pages, Website: https://curobo.org

  8. arXiv:2310.00463  [pdf, other]

    cs.CV cs.RO

    Diff-DOPE: Differentiable Deep Object Pose Estimation

    Authors: Jonathan Tremblay, Bowen Wen, Valts Blukis, Balakumar Sundaralingam, Stephen Tyree, Stan Birchfield

    Abstract: We introduce Diff-DOPE, a 6-DoF pose refiner that takes as input an image, a 3D textured model of an object, and an initial pose of the object. The method uses differentiable rendering to update the object pose to minimize the visual error between the image and the projection of the model. We show that this simple, yet effective, idea is able to achieve state-of-the-art results on pose estimation…

    Submitted 30 September, 2023; originally announced October 2023.

    Comments: Submitted to ICRA 2023. Project page is at https://diffdope.github.io

  9. arXiv:2306.14896  [pdf, other]

    cs.RO cs.CV

    RVT: Robotic View Transformer for 3D Object Manipulation

    Authors: Ankit Goyal, Jie Xu, Yijie Guo, Valts Blukis, Yu-Wei Chao, Dieter Fox

    Abstract: For 3D object manipulation, methods that build an explicit 3D representation perform better than those relying only on camera images. But using explicit 3D representations like voxels comes at large computing cost, adversely affecting scalability. In this work, we propose RVT, a multi-view transformer for 3D manipulation that is both scalable and accurate. Some key features of RVT are an attention…

    Submitted 26 June, 2023; originally announced June 2023.

  10. arXiv:2304.00673  [pdf, other]

    cs.CV

    Partial-View Object View Synthesis via Filtered Inversion

    Authors: Fan-Yun Sun, Jonathan Tremblay, Valts Blukis, Kevin Lin, Danfei Xu, Boris Ivanovic, Peter Karkus, Stan Birchfield, Dieter Fox, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Marco Pavone, Nick Haber

    Abstract: We propose Filtering Inversion (FINV), a learning framework and optimization process that predicts a renderable 3D object representation from one or few partial views. FINV addresses the challenge of synthesizing novel views of objects from partial observations, spanning cases where the object is not entirely in view, is partially occluded, or is only observed from similar views. To achieve this,…

    Submitted 17 August, 2024; v1 submitted 2 April, 2023; originally announced April 2023.

    Comments: project website: http://cs.stanford.edu/~sunfanyun/finv

  11. arXiv:2303.16730  [pdf, other]

    cs.CV

    TTA-COPE: Test-Time Adaptation for Category-Level Object Pose Estimation

    Authors: Taeyeop Lee, Jonathan Tremblay, Valts Blukis, Bowen Wen, Byeong-Uk Lee, Inkyu Shin, Stan Birchfield, In So Kweon, Kuk-Jin Yoon

    Abstract: Test-time adaptation methods have been gaining attention recently as a practical solution for addressing source-to-target domain gaps by gradually updating the model without requiring labels on the target data. In this paper, we propose a method of test-time adaptation for category-level object pose estimation called TTA-COPE. We design a pose ensemble approach with a self-training loss using pose…

    Submitted 29 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023, Project page: https://taeyeop.com/ttacope

  12. arXiv:2303.14158  [pdf, other]

    cs.CV cs.AI cs.GR cs.RO

    BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects

    Authors: Bowen Wen, Jonathan Tremblay, Valts Blukis, Stephen Tyree, Thomas Muller, Alex Evans, Dieter Fox, Jan Kautz, Stan Birchfield

    Abstract: We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence, while simultaneously performing neural 3D reconstruction of the object. Our method works for arbitrary rigid objects, even when visual texture is largely absent. The object is assumed to be segmented in the first frame only. No additional information is required, and no assumption is ma…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  13. arXiv:2210.12126  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    One-Shot Neural Fields for 3D Object Understanding

    Authors: Valts Blukis, Taeyeop Lee, Jonathan Tremblay, Bowen Wen, In So Kweon, Kuk-Jin Yoon, Dieter Fox, Stan Birchfield

    Abstract: We present a unified and compact scene representation for robotics, where each object in the scene is depicted by a latent code capturing geometry and appearance. This representation can be decoded for various tasks such as novel view rendering, 3D reconstruction (e.g. recovering depth, point clouds, or voxel maps), collision checking, and stable grasp prediction. We build our representation from…

    Submitted 8 August, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

    Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop (CVPRW) on XRNeRF: Advances in NeRF for the Metaverse 2023

  14. arXiv:2209.11302  [pdf, other]

    cs.RO cs.AI cs.CL cs.LG

    ProgPrompt: Generating Situated Robot Task Plans using Large Language Models

    Authors: Ishika Singh, Valts Blukis, Arsalan Mousavian, Ankit Goyal, Danfei Xu, Jonathan Tremblay, Dieter Fox, Jesse Thomason, Animesh Garg

    Abstract: Task planning can require defining myriad domain knowledge about the world in which a robot needs to act. To ameliorate that effort, large language models (LLMs) can be used to score potential next actions during task planning, and even generate action sequences directly, given an instruction in natural language with no additional domain information. However, such methods either require enumeratin…

    Submitted 22 September, 2022; originally announced September 2022.

  15. arXiv:2204.05186  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV cs.LG

    Correcting Robot Plans with Natural Language Feedback

    Authors: Pratyusha Sharma, Balakumar Sundaralingam, Valts Blukis, Chris Paxton, Tucker Hermans, Antonio Torralba, Jacob Andreas, Dieter Fox

    Abstract: When humans design cost or goal specifications for robots, they often produce specifications that are ambiguous, underspecified, or beyond planners' ability to solve. In these cases, corrections provide a valuable tool for human-in-the-loop robot control. Corrections might take the form of new goal specifications, new constraints (e.g. to avoid specific objects), or hints for planning algorithms (…

    Submitted 11 April, 2022; originally announced April 2022.

    Comments: 10 pages, 13 figures

  16. arXiv:2107.05612  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV cs.LG

    A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution

    Authors: Valts Blukis, Chris Paxton, Dieter Fox, Animesh Garg, Yoav Artzi

    Abstract: Natural language provides an accessible and expressive interface to specify long-term tasks for robotic agents. However, non-experts are likely to specify such tasks with high-level instructions, which abstract over specific robot actions through several layers of abstraction. We propose that key to bridging this gap between language and robot actions over long execution horizons are persistent re…

    Submitted 28 November, 2021; v1 submitted 12 July, 2021; originally announced July 2021.

    Comments: Presented at CoRL 2021

  17. arXiv:2011.07384  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV cs.LG

    Few-shot Object Grounding and Mapping for Natural Language Robot Instruction Following

    Authors: Valts Blukis, Ross A. Knepper, Yoav Artzi

    Abstract: We study the problem of learning a robot policy to follow natural language instructions that can be easily extended to reason about new objects. We introduce a few-shot language-conditioned object grounding method trained from augmented reality data that uses exemplars to identify objects and align them to their mentions in instructions. We present a learned map representation that encodes object…

    Submitted 14 November, 2020; originally announced November 2020.

    Comments: 4th Conference on Robot Learning (CoRL 2020), Cambridge MA, USA

  18. arXiv:1910.09664  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV cs.LG

    Learning to Map Natural Language Instructions to Physical Quadcopter Control using Simulated Flight

    Authors: Valts Blukis, Yannick Terme, Eyvind Niklasson, Ross A. Knepper, Yoav Artzi

    Abstract: We propose a joint simulation and real-world learning framework for mapping navigation instructions and raw first-person observations to continuous control. Our model estimates the need for environment exploration, predicts the likelihood of visiting environment positions during execution, and controls the agent to both explore and visit high-likelihood positions. We introduce Supervised Reinforce…

    Submitted 21 October, 2019; originally announced October 2019.

    Comments: Conference on Robot Learning (CoRL) 2019

  19. arXiv:1811.04179  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV cs.LG

    Mapping Navigation Instructions to Continuous Control Actions with Position-Visitation Prediction

    Authors: Valts Blukis, Dipendra Misra, Ross A. Knepper, Yoav Artzi

    Abstract: We propose an approach for mapping natural language instructions and raw observations to continuous control of a quadcopter drone. Our model predicts interpretable position-visitation distributions indicating where the agent should go during execution and where it should stop, and uses the predicted distributions to select the actions to execute. This two-step model decomposition allows for simple…

    Submitted 10 December, 2018; v1 submitted 9 November, 2018; originally announced November 2018.

    Comments: Appeared in Conference on Robot Learning 2018

    Journal ref: In Conference on Robot Learning (pp. 505-518) (2018)

  20. arXiv:1809.00786  [pdf, other]

    cs.CL

    Mapping Instructions to Actions in 3D Environments with Visual Goal Prediction

    Authors: Dipendra Misra, Andrew Bennett, Valts Blukis, Eyvind Niklasson, Max Shatkhin, Yoav Artzi

    Abstract: We propose to decompose instruction execution to goal prediction and action generation. We design a model that maps raw visual observations to goals using LINGUNET, a language-conditioned image generation network, and then generates the actions required to complete them. Our model is trained from demonstration only without external resources. To evaluate our approach, we introduce two benchmarks f…

    Submitted 18 March, 2019; v1 submitted 3 September, 2018; originally announced September 2018.

    Comments: Accepted at EMNLP 2018

  21. arXiv:1806.00047  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG cs.RO

    Following High-level Navigation Instructions on a Simulated Quadcopter with Imitation Learning

    Authors: Valts Blukis, Nataly Brukhim, Andrew Bennett, Ross A. Knepper, Yoav Artzi

    Abstract: We introduce a method for following high-level navigation instructions by mapping directly from images, instructions and pose estimates to continuous low-level velocity commands for real-time control. The Grounded Semantic Mapping Network (GSMN) is a fully-differentiable neural network architecture that builds an explicit semantic map in the world reference frame by incorporating a pinhole camera…

    Submitted 31 May, 2018; originally announced June 2018.

    Comments: To appear in Robotics: Science and Systems (RSS), 2018