-
Impact of the Three-Child Policy and Delayed Retirement on the Transfer of Surplus Rural Labor under Xi Jinping's New Population Vision: A Re-examination of China's Lewis Turning Point
Authors:
Jun Dai,
Guanqing Shi,
Xiaoke Xie,
Aitong Xie
Abstract:
Chinese-style modernization involves the modernization of a large population, requiring top-level design in terms of scale and structure. The population perspective in Xi Jinping's Thought on Socialism with Chinese Characteristics for a New Era serves as the fundamental guide for population policies. The three-child policy and delayed retirement will affect the supply of labor in China and challenge the previous assessments of China's Lewis Turning Point. This study examines the rural surplus labor transfer from 2013 to 2022 based on urban and rural data. The results indicate that China's overall wage levels have continuously increased, the urban-rural income gap has narrowed, and the transfer of surplus rural labor has slowed. China has passed the first turning point and entered a transitional phase. Factors such as the level of agricultural mechanization, urbanization rate, and urban-rural income gap are more significant in influencing the transfer of surplus labor than the normal working-age population ratio. The delayed retirement policy has a more immediate impact on the supply and transfer of rural surplus labor than the three-child policy. Additionally, delayed retirement can offset the negative impact of the reduced relative surplus labor supply caused by the three-child policy, although the three-child policy could increase the future absolute surplus labor supply.
Submitted 17 October, 2024; v1 submitted 23 September, 2024;
originally announced September 2024.
-
Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling
Authors:
Yuejiang Liu,
Jubayer Ibn Hamid,
Annie Xie,
Yoonho Lee,
Maximilian Du,
Chelsea Finn
Abstract:
Predicting and executing a sequence of actions without intermediate replanning, known as action chunking, is increasingly used in robot learning from human demonstrations. Yet, its reported effects on the learned policy are inconsistent: some studies find it crucial for achieving strong results, while others observe decreased performance. In this paper, we first dissect how action chunking impacts the divergence between a learner and a demonstrator. We find that action chunking allows the learner to better capture the temporal dependencies in demonstrations but at the cost of reduced reactivity in stochastic environments. To address this tradeoff, we propose Bidirectional Decoding (BID), a test-time inference algorithm that bridges action chunking with closed-loop operations. BID samples multiple predictions at each time step and searches for the optimal one based on two criteria: (i) backward coherence, which favors samples that align with previous decisions; (ii) forward contrast, which seeks samples of high likelihood for future plans. By coupling decisions within and across action chunks, BID promotes consistency over time while maintaining reactivity to unexpected changes. Experimental results show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks. Code and videos are available at https://bid-robot.github.io.
Submitted 21 October, 2024; v1 submitted 30 August, 2024;
originally announced August 2024.
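The selection step described in the BID abstract above (sample several candidate action chunks, then pick one by backward coherence and forward contrast) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `select_chunk`, measuring backward coherence as negative squared distance on the overlapping time steps, passing the forward term in as precomputed scores, and the `weight` parameter. It is not the authors' implementation.

```python
import numpy as np


def select_chunk(candidates, prev_chunk, forward_scores, weight=1.0):
    """Pick one candidate action chunk (hypothetical helper, illustration only).

    candidates:     array of shape (N, H, D), N sampled chunks of H actions.
    prev_chunk:     array of shape (H, D), the chunk chosen one step earlier.
    forward_scores: array of shape (N,), higher is better; assumed to be
                    computed elsewhere (the abstract calls this forward contrast).
    """
    candidates = np.asarray(candidates, dtype=float)
    prev_chunk = np.asarray(prev_chunk, dtype=float)
    forward_scores = np.asarray(forward_scores, dtype=float)

    # One step has elapsed, so steps 1..H-1 of the previous chunk line up
    # with steps 0..H-2 of each new candidate.
    overlap = prev_chunk.shape[0] - 1
    diff = candidates[:, :overlap] - prev_chunk[1:]

    # Backward coherence (assumed form): negative squared distance on the overlap.
    backward = -np.sum(diff ** 2, axis=(1, 2))

    total = backward + weight * forward_scores
    return int(np.argmax(total))
```

With all forward scores set to zero, the function returns the candidate that best continues the previous chunk; a large forward score can override that preference.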
-
FlowRetrieval: Flow-Guided Data Retrieval for Few-Shot Imitation Learning
Authors:
Li-Heng Lin,
Yuchen Cui,
Amber Xie,
Tianyu Hua,
Dorsa Sadigh
Abstract:
Few-shot imitation learning relies on only a small number of task-specific demonstrations to efficiently adapt a policy for a given downstream task. Retrieval-based methods come with a promise of retrieving relevant past experiences to augment this target data when learning policies. However, existing data retrieval methods fall under two extremes: they either rely on the existence of exact behaviors with visually similar scenes in the prior data, which is impractical to assume; or they retrieve based on semantic similarity of high-level language descriptions of the task, which may say little about the low-level behaviors or motions shared across tasks, often a more important factor for retrieving relevant data for policy learning. In this work, we investigate how we can leverage motion similarity in the vast amount of cross-task data to improve few-shot imitation learning of the target task. Our key insight is that motion-similar data carries rich information about the effects of actions and object interactions that can be leveraged during few-shot adaptation. We propose FlowRetrieval, an approach that leverages optical flow representations both for extracting motions similar to the target tasks from prior data and for guiding the learning of a policy that can maximally benefit from such data. Our results show FlowRetrieval significantly outperforms prior methods across simulated and real-world domains, achieving on average a 27% higher success rate than the best retrieval-based prior method. In the Pen-in-Cup task with a real Franka Emika robot, FlowRetrieval achieves 3.7x the performance of the baseline imitation learning technique that learns from all prior and target data. Website: https://flow-retrieval.github.io
Submitted 11 October, 2024; v1 submitted 29 August, 2024;
originally announced August 2024.
-
Selective-injection GaN Heterojunction Bipolar Transistors with 275 kA/cm$^2$ Current Density
Authors:
Zhanbo Xia,
Chandan Joishi,
Shahadat H. Sohel,
Andy Xie,
Edward Beam,
Yu Cao,
Siddharth Rajan
Abstract:
We design and demonstrate selective injection GaN heterojunction bipolar transistors that utilize a patterned base for selective injection of electrons from the emitter. The design maneuvers minority carrier injection through a thin p-GaN base region, while the majority carrier holes for base current are injected from thick p-GaN regions adjacent to the thin p-GaN base. The design is realized using a regrowth emitter approach with SiO$_2$ as a spacer between the emitter layer and the thick p-GaN base contact regions. The fabricated device demonstrated a state-of-the-art output current density (I$_{C, max}$) of ~275 kA/cm$^2$ with a current gain ($β$) of 9, compared with a current gain of 17 for the planar HBT design (I$_{C, max}$ = 150 kA/cm$^2$). The reported results highlight the potential of the selective injection design to overcome the persistent GaN HBT design tradeoff between base resistance and current gain, paving the way for next-generation radio frequency and mm-Wave applications.
Submitted 26 August, 2024;
originally announced August 2024.
-
An Efficient Replay for Class-Incremental Learning with Pre-trained Models
Authors:
Weimin Yin,
Bin Chen,
Chunzhao Xie,
Zhenhao Tan
Abstract:
In general class-incremental learning, researchers typically use sample sets as a tool to avoid catastrophic forgetting during continuous learning. At the same time, researchers have also noted the differences between class-incremental learning and Oracle training and have attempted to make corrections. In recent years, researchers have begun to develop class-incremental learning algorithms utilizing pre-trained models, achieving significant results. This paper observes that in class-incremental learning, the steady state among the weights guided by each class center is disrupted, which is significantly correlated with catastrophic forgetting. Based on this, we propose a new method for overcoming forgetting. In some cases, by retaining only a single sample unit of each class in memory for replay and applying simple gradient constraints, very good results can be achieved. Experimental results indicate that, with pre-trained models, our method can achieve competitive performance at very low computational cost while using only the cross-entropy loss.
Submitted 15 August, 2024;
originally announced August 2024.
-
Data-driven approach to mixed-state multipartite entanglement characterisation
Authors:
Eric Brunner,
Aaron Xie,
Gabriel Dufour,
Andreas Buchleitner
Abstract:
We develop a statistical framework, based on a manifold learning embedding, to extract relevant features of multipartite entanglement structures of mixed quantum states from the measurable correlation data of a quantum computer. We show that the statistics of the measured correlators contains sufficient information to characterise the entanglement, and to quantify the mixedness of the state of the computer's register. The transition to the maximally mixed regime, in the embedding space, displays a sharp boundary between entangled and separable states. Away from this boundary, the multipartite entanglement structure is robust to finite noise.
Submitted 25 July, 2024;
originally announced July 2024.
-
Affordance-Guided Reinforcement Learning via Visual Prompting
Authors:
Olivia Y. Lee,
Annie Xie,
Kuan Fang,
Karl Pertsch,
Chelsea Finn
Abstract:
Robots equipped with reinforcement learning (RL) have the potential to learn a wide range of skills solely from a reward signal. However, obtaining a robust and dense reward signal for general manipulation tasks remains a challenge. Existing learning-based approaches require significant data, such as human demonstrations of success and failure, to learn task-specific reward functions. Recently, there is also a growing adoption of large multi-modal foundation models for robotics that can perform visual reasoning in physical contexts and generate coarse robot motions for manipulation tasks. Motivated by this range of capability, in this work, we present Keypoint-based Affordance Guidance for Improvements (KAGI), a method leveraging rewards shaped by vision-language models (VLMs) for autonomous RL. State-of-the-art VLMs have demonstrated impressive reasoning about affordances through keypoints in zero-shot, and we use these to define dense rewards that guide autonomous robotic learning. On real-world manipulation tasks specified by natural language descriptions, KAGI improves the sample efficiency of autonomous RL and enables successful task completion in 20K online fine-tuning steps. Additionally, we demonstrate the robustness of KAGI to reductions in the number of in-domain demonstrations used for pre-training, reaching similar performance in 35K online fine-tuning steps. Project website: https://sites.google.com/view/affordance-guided-rl
Submitted 1 October, 2024; v1 submitted 14 July, 2024;
originally announced July 2024.
-
Revision Matters: Generative Design Guided by Revision Edits
Authors:
Tao Li,
Chin-Yi Cheng,
Amber Xie,
Gang Li,
Yang Li
Abstract:
Layout design, such as user interface or graphical layout in general, is fundamentally an iterative revision process. Through revising a design repeatedly, the designer converges on an ideal layout. In this paper, we investigate how revision edits from human designers can benefit a multimodal generative model. To do so, we curate an expert dataset that traces how human designers iteratively edit and improve a layout generation with a prompted language goal. Based on such data, we explore various supervised fine-tuning task setups on top of a Gemini multimodal backbone, a large multimodal model. Our results show that human revision plays a critical role in iterative layout refinement. Although noisy, expert revision edits lead our model to a surprisingly strong design FID score of ~10, which is close to human performance (~6). In contrast, self-revisions that rely fully on the model's own judgement lead to an echo chamber that prevents iterative improvement and sometimes leads to generative degradation. Fortunately, we found that providing human guidance at an early stage plays a critical role in the final generation. In such a human-in-the-loop scenario, our work paves the way for iterative design revision based on pre-trained large multimodal models.
Submitted 27 May, 2024;
originally announced June 2024.
-
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
Authors:
Sean Welleck,
Amanda Bertsch,
Matthew Finlayson,
Hailey Schoelkopf,
Alex Xie,
Graham Neubig,
Ilia Kulikov,
Zaid Harchaoui
Abstract:
One of the most striking findings in modern research on large language models (LLMs) is that scaling up compute during training leads to better results. However, less attention has been given to the benefits of scaling compute during inference. This survey focuses on these inference-time approaches. We explore three areas under a unified mathematical formalism: token-level generation algorithms, meta-generation algorithms, and efficient generation. Token-level generation algorithms, often called decoding algorithms, operate by sampling a single token at a time or constructing a token-level search space and then selecting an output. These methods typically assume access to a language model's logits, next-token distributions, or probability scores. Meta-generation algorithms work on partial or full sequences, incorporating domain knowledge, enabling backtracking, and integrating external information. Efficient generation methods aim to reduce token costs and improve the speed of generation. Our survey unifies perspectives from three research communities: traditional natural language processing, modern LLMs, and machine learning systems.
Submitted 24 June, 2024;
originally announced June 2024.
-
STARD: A Chinese Statute Retrieval Dataset with Real Queries Issued by Non-professionals
Authors:
Weihang Su,
Yiran Hu,
Anzhe Xie,
Qingyao Ai,
Zibing Que,
Ning Zheng,
Yun Liu,
Weixing Shen,
Yiqun Liu
Abstract:
Statute retrieval aims to find relevant statutory articles for specific queries. This process is the basis of a wide range of legal applications such as legal advice, automated judicial decisions, legal document drafting, etc. Existing statute retrieval benchmarks focus on formal and professional queries from sources like bar exams and legal case documents, thereby neglecting non-professional queries from the general public, which often lack precise legal terminology and references. To address this gap, we introduce the STAtute Retrieval Dataset (STARD), a Chinese dataset comprising 1,543 query cases collected from real-world legal consultations and 55,348 candidate statutory articles. Unlike existing statute retrieval datasets, which primarily focus on professional legal queries, STARD captures the complexity and diversity of real queries from the general public. Through a comprehensive evaluation of various retrieval baselines, we reveal that existing retrieval approaches all fall short of these real queries issued by non-professional users. The best method only achieves a Recall@100 of 0.907, suggesting the necessity for further exploration and additional research in this area.
All code and datasets are available at: https://github.com/oneal2000/STARD/tree/main
Submitted 21 June, 2024;
originally announced June 2024.
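The STARD abstract above reports a Recall@100 figure. As a reminder of what that kind of metric measures, here is a minimal sketch of Recall@k using the standard definition (the share of a query's relevant articles that appear in the top k results, averaged over queries). The function names and the input layout are hypothetical, and the dataset's own evaluation script may differ in details.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant articles found among the top-k ranked results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("a query needs at least one relevant article")
    hits = len(relevant & set(ranked_ids[:k]))
    return hits / len(relevant)


def mean_recall_at_k(runs, k):
    """Average Recall@k over (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(recall_at_k(ranked, rel, k) for ranked, rel in runs) / len(runs)
```

For a query with relevant articles `b` and `d` and the ranking `a, b, c, d`, Recall@2 is 0.5 and Recall@4 is 1.0.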
-
Meta Reasoning for Large Language Models
Authors:
Peizhong Gao,
Ao Xie,
Shaoguang Mao,
Wenshan Wu,
Yan Xia,
Haipeng Mi,
Furu Wei
Abstract:
We introduce Meta-Reasoning Prompting (MRP), a novel and efficient system prompting method for large language models (LLMs) inspired by human meta-reasoning. Traditional in-context learning-based reasoning techniques, such as Tree-of-Thoughts, show promise but lack consistent state-of-the-art performance across diverse tasks due to their specialized nature. MRP addresses this limitation by guiding LLMs to dynamically select and apply different reasoning methods based on the specific requirements of each task, optimizing both performance and computational efficiency. With MRP, LLM reasoning operates in two phases. Initially, the LLM identifies the most appropriate reasoning method using task input cues and objective descriptions of available methods. Subsequently, it applies the chosen method to complete the task. This dynamic strategy mirrors human meta-reasoning, allowing the model to excel in a wide range of problem domains. We evaluate the effectiveness of MRP through comprehensive benchmarks. The results demonstrate that MRP achieves or approaches state-of-the-art performance across diverse tasks. MRP represents a significant advancement in enabling LLMs to identify cognitive challenges across problems and leverage benefits across different reasoning approaches, enhancing their ability to handle diverse and complex problem domains efficiently. Every LLM deserves a Meta-Reasoning Prompting to unlock its full potential and ensure adaptability in an ever-evolving landscape of challenges and applications.
Submitted 17 June, 2024;
originally announced June 2024.
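The two phases described in the MRP abstract above (first choose a reasoning method from descriptions of the available ones, then apply it) can be sketched as follows. This is an illustration under stated assumptions, not the authors' prompts or code: `llm` is a hypothetical callable that maps a prompt string to a reply string, the prompt wording is invented, and falling back to the first method on an unparsable reply is a design choice made here.

```python
def meta_reasoning_prompt(task, methods, llm):
    """Two-phase prompting sketch (hypothetical helper).

    task:    the problem statement as a string.
    methods: dict mapping a method name to a short objective description.
    llm:     hypothetical callable, prompt string -> reply string.
    Returns (chosen_method_name, answer).
    """
    names = list(methods)
    listing = "\n".join(
        f"{i}. {name}: {methods[name]}" for i, name in enumerate(names)
    )

    # Phase 1: ask the model to choose a method for this task.
    choice = llm(
        f"Task: {task}\nAvailable methods:\n{listing}\n"
        "Reply with the number of the most suitable method."
    )
    try:
        index = int(choice.strip())
    except ValueError:
        index = 0  # assumed fallback: use the first method
    if not 0 <= index < len(names):
        index = 0

    # Phase 2: apply the chosen method to the task.
    name = names[index]
    answer = llm(f"Solve the task using {name} ({methods[name]}).\nTask: {task}")
    return name, answer
```

Any chat-completion client can be wrapped to fit the `llm` signature; the sketch makes exactly two calls per task.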
-
Selective Undercut of Undoped Optical Membranes for Spin-Active Color Centers in 4H-SiC
Authors:
Jonathan R. Dietz,
Aaron M. Day,
Amberly Xie,
Evelyn L. Hu
Abstract:
Silicon carbide (SiC) is a semiconductor used in quantum information processing, microelectromechanical systems, photonics, power electronics, and harsh environment sensors. However, its high temperature stability, high breakdown voltage, wide bandgap, and high mechanical strength are accompanied by a chemical inertness which makes complex micromachining difficult. Photoelectrochemical etching is a simple, rapid means of wet processing SiC, including the use of dopant selective etch stops that take advantage of mature SiC homoepitaxy. However, dopant selective photoelectrochemical etching typically relies on highly doped material, which poses challenges for device applications such as quantum defects and photonics that benefit from low doping to produce robust emitter properties and high optical transparency. In this work, we develop a new, selective photoelectrochemical etching process that relies not on high doping but on the electrical depletion of a fabricated diode structure, allowing the selective etching of an n-doped substrate wafer versus an undoped epitaxial ($N_a = 1 \times 10^{14}\,\mathrm{cm}^{-3}$) device layer. We characterize the photo-response and photoelectrochemical etching behavior of the diode under bias and use those insights to suspend large ($>100\,\mu\mathrm{m}^2$) undoped membranes of SiC. We further characterize the compatibility of membranes with quantum emitters, performing comparative spin spectroscopy between undoped and highly doped membrane structures, finding the use of undoped material improves ensemble spin lifetime by $>3\times$. This work enables the fabrication of high-purity suspended thin films suitable for scalable photonics, mechanics, and quantum technologies in SiC.
Submitted 11 June, 2024;
originally announced June 2024.
-
Leveraging Human Revisions for Improving Text-to-Layout Models
Authors:
Amber Xie,
Chin-Yi Cheng,
Forrest Huang,
Yang Li
Abstract:
Learning from human feedback has shown success in aligning large, pretrained models with human values. Prior works have mostly focused on learning from high-level labels, such as preferences between pairs of model outputs. On the other hand, many domains could benefit from more involved, detailed feedback, such as revisions, explanations, and reasoning of human users. Our work proposes using nuanced feedback in the form of human revisions for stronger alignment. In this paper, we ask expert designers to fix layouts generated from a generative layout model that is pretrained on a large-scale dataset of mobile screens. Then, we train a reward model based on how human designers revise these generated layouts. With the learned reward model, we optimize our model with reinforcement learning from human feedback (RLHF). Our method, Revision-Aware Reward Models, allows a generative text-to-layout model to produce more modern, designer-aligned layouts, showing the potential for utilizing human revisions and stronger forms of feedback in improving generative models.
Submitted 15 May, 2024;
originally announced May 2024.
-
CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks
Authors:
Yiqing Xie,
Alex Xie,
Divyanshu Sheth,
Pengfei Liu,
Daniel Fried,
Carolyn Rose
Abstract:
To adequately test modern code generation systems, evaluation benchmarks must execute and test the code generated by the system. However, these execution and testing requirements have largely limited benchmarks to settings where code is easily executable or has human-written tests. To facilitate evaluation of code generation systems across diverse scenarios, we present CodeBenchGen, a framework to create scalable execution-based benchmarks from naturally occurring code sources. Specifically, we leverage a large language model (LLM) to sandbox arbitrary pieces of code into evaluation examples, including test cases for execution-based evaluation. We illustrate the usefulness of our framework by creating a dataset, Exec-CSN, which includes 1,931 examples involving 293 libraries converted from code in 367 GitHub repositories taken from the CodeSearchNet dataset. To demonstrate the solvability of examples in Exec-CSN, we present a human study demonstrating that 81.3% of the examples can be solved by humans and 61% are rated as "requires effort to solve". We conduct code generation experiments on open-source and proprietary models and analyze the performance of both humans and models. We provide code and data at: https://github.com/yiqingxyq/CodeBenchGen.
Submitted 2 October, 2024; v1 submitted 31 March, 2024;
originally announced April 2024.
-
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
Authors:
Alexander Khazatsky,
Karl Pertsch,
Suraj Nair,
Ashwin Balakrishna,
Sudeep Dasari,
Siddharth Karamcheti,
Soroush Nasiriany,
Mohan Kumar Srirama,
Lawrence Yunliang Chen,
Kirsty Ellis,
Peter David Fagan,
Joey Hejna,
Masha Itkina,
Marion Lepert,
Yecheng Jason Ma,
Patrick Tree Miller,
Jimmy Wu,
Suneel Belkhale,
Shivin Dass,
Huy Ha,
Arhan Jain,
Abraham Lee,
Youngwoon Lee,
Marius Memmel,
Sungjae Park
, et al. (74 additional authors not shown)
Abstract:
The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a result, even the most general robot manipulation policies today are mostly trained on data collected in a small number of environments with limited scene and task diversity. In this work, we introduce DROID (Distributed Robot Interaction Dataset), a diverse robot manipulation dataset with 76k demonstration trajectories or 350 hours of interaction data, collected across 564 scenes and 84 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. We demonstrate that training with DROID leads to policies with higher performance and improved generalization ability. We open source the full dataset, policy learning code, and a detailed guide for reproducing our robot hardware setup.
Submitted 19 March, 2024;
originally announced March 2024.
-
Efficient Data Collection for Robotic Manipulation via Compositional Generalization
Authors:
Jensen Gao,
Annie Xie,
Ted Xiao,
Chelsea Finn,
Dorsa Sadigh
Abstract:
Data collection has become an increasingly important problem in robotic manipulation, yet there is still little understanding of how to effectively collect data to facilitate broad generalization. Recent works on large-scale robotic data collection typically vary many environmental factors (e.g., object types, table textures) during data collection to cover a diverse range of scenarios. However, they do not explicitly account for the possible compositional abilities of policies trained on the data. If robot policies can compose environmental factors from their data to succeed when encountering unseen factor combinations, we can exploit this to avoid collecting data for situations that composition would address. To investigate this possibility, we conduct thorough empirical studies both in simulation and on a real robot that compare data collection strategies and assess whether visual imitation learning policies can compose environmental factors. We find that policies do exhibit composition, although leveraging prior robotic datasets is critical for this on a real robot. We use these insights to propose better in-domain data collection strategies that exploit composition, which can induce better generalization than naive approaches for the same amount of effort during data collection. We further demonstrate that a real robot policy trained on data from such a strategy achieves a success rate of 77.5% when transferred to entirely new environments that encompass unseen combinations of environmental factors, whereas policies trained using data collected without accounting for environmental variation fail to transfer effectively, with a success rate of only 2.5%. We provide videos at http://iliad.stanford.edu/robot-data-comp/.
Submitted 21 May, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Autonomous vehicle decision and control through reinforcement learning with traffic flow randomization
Authors:
Yuan Lin,
Antai Xie,
Xiao Liu
Abstract:
Most of the current studies on autonomous vehicle decision-making and control tasks based on reinforcement learning are conducted in simulated environments. The training and testing in these studies are carried out under rule-based microscopic traffic flow, with little consideration of migrating them to real or near-real environments to test their performance. This may lead to a degradation in performance when the trained model is tested in more realistic traffic scenes. In this study, we propose a method to randomize the driving style and behavior of surrounding vehicles by randomizing certain parameters of the car-following model and the lane-changing model of rule-based microscopic traffic flow in SUMO. We trained policies with deep reinforcement learning algorithms under the domain-randomized rule-based microscopic traffic flow in freeway and merging scenes, and then tested them separately in rule-based microscopic traffic flow and high-fidelity microscopic traffic flow. Results indicate that the policy trained under domain-randomized traffic flow achieves a significantly better success rate and calculative reward than the models trained under other microscopic traffic flows.
Submitted 19 April, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
A Communication-Efficient Stochastic Gradient Descent Algorithm for Distributed Nonconvex Optimization
Authors:
Antai Xie,
Xinlei Yi,
Xiaofan Wang,
Ming Cao,
Xiaoqiang Ren
Abstract:
This paper studies distributed nonconvex optimization problems with stochastic gradients for a multi-agent system, in which each agent aims to minimize the sum of all agents' cost functions by using local compressed information exchange. We propose a distributed stochastic gradient descent (SGD) algorithm, suitable for a general class of compressors. We show that the proposed algorithm achieves the linear speedup convergence rate $\mathcal{O}(1/\sqrt{nT})$ for smooth nonconvex functions, where $T$ and $n$ are the number of iterations and agents, respectively. If the global cost function additionally satisfies the Polyak--Łojasiewicz condition, the proposed algorithm can linearly converge to a neighborhood of the global optimum, regardless of whether the stochastic gradient is unbiased or not. Numerical experiments are carried out to verify the efficiency of our algorithm.
Submitted 2 March, 2024;
originally announced March 2024.
-
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Authors:
Soroush Nasiriany,
Fei Xia,
Wenhao Yu,
Ted Xiao,
Jacky Liang,
Ishita Dasgupta,
Annie Xie,
Danny Driess,
Ayzaan Wahid,
Zhuo Xu,
Quan Vuong,
Tingnan Zhang,
Tsang-Wei Edward Lee,
Kuang-Huei Lee,
Peng Xu,
Sean Kirmani,
Yuke Zhu,
Andy Zeng,
Karol Hausman,
Nicolas Heess,
Chelsea Finn,
Sergey Levine,
Brian Ichter
Abstract:
Vision language models (VLMs) have shown impressive capabilities across a variety of tasks, from logical reasoning to visual understanding. This opens the door to richer interaction with the world, for example robotic control. However, VLMs produce only textual outputs, while robotic control and other spatial tasks require outputting continuous coordinates, actions, or trajectories. How can we enable VLMs to handle such settings without fine-tuning on task-specific data?
In this paper, we propose a novel visual prompting approach for VLMs that we call Prompting with Iterative Visual Optimization (PIVOT), which casts tasks as iterative visual question answering. In each iteration, the image is annotated with a visual representation of proposals that the VLM can refer to (e.g., candidate robot actions, localizations, or trajectories). The VLM then selects the best ones for the task. These proposals are iteratively refined, allowing the VLM to eventually zero in on the best available answer. We investigate PIVOT on real-world robotic navigation, real-world manipulation from images, instruction following in simulation, and additional spatial inference tasks such as localization. We find, perhaps surprisingly, that our approach enables zero-shot control of robotic systems without any robot training data, navigation in a variety of environments, and other capabilities. Although current performance is far from perfect, our work highlights potentials and limitations of this new regime and shows a promising approach for Internet-Scale VLMs in robotic and spatial reasoning domains. Website: pivot-prompt.github.io and HuggingFace: https://huggingface.co/spaces/pivot-prompt/pivot-prompt-demo.
Submitted 12 February, 2024;
originally announced February 2024.
-
Deep Learning-Based Correction and Unmixing of Hyperspectral Images for Brain Tumor Surgery
Authors:
David Black,
Jaidev Gill,
Andrew Xie,
Benoit Liquet,
Antonio Di leva,
Walter Stummer,
Eric Suero Molina
Abstract:
Hyperspectral Imaging (HSI) for fluorescence-guided brain tumor resection enables visualization of differences between tissues that are not distinguishable to humans. This augmentation can maximize brain tumor resection, improving patient outcomes. However, much of the processing in HSI uses simplified linear methods that are unable to capture the non-linear, wavelength-dependent phenomena that must be modeled for accurate recovery of fluorophore abundances. We therefore propose two deep learning models for correction and unmixing, which can account for the nonlinear effects and produce more accurate estimates of abundances. Both models use an autoencoder-like architecture to process the captured spectra. One is trained with protoporphyrin IX (PpIX) concentration labels. The other undergoes semi-supervised training, first learning hyperspectral unmixing in a self-supervised manner and then learning to correct fluorescence emission spectra for heterogeneous optical and geometric properties using a reference white-light reflectance spectrum in a few-shot manner. The models were evaluated against phantom and pig brain data with known PpIX concentration; the supervised model achieved Pearson correlation coefficients (R values) between the known and computed PpIX concentrations of 0.997 and 0.990, respectively, whereas the classical approach achieved only 0.93 and 0.82. The semi-supervised approach's R values were 0.98 and 0.91, respectively. On human data, the semi-supervised model gives qualitatively more realistic results than the classical method, better removing bright spots of specular reflectance and reducing the variance in PpIX abundance over biopsies that should be relatively homogeneous. These results show promise for using deep learning to improve HSI in fluorescence-guided neurosurgery.
Submitted 6 February, 2024;
originally announced February 2024.
-
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Authors:
Open X-Embodiment Collaboration,
Abby O'Neill,
Abdul Rehman,
Abhinav Gupta,
Abhiram Maddukuri,
Abhishek Gupta,
Abhishek Padalkar,
Abraham Lee,
Acorn Pooley,
Agrim Gupta,
Ajay Mandlekar,
Ajinkya Jain,
Albert Tung,
Alex Bewley,
Alex Herzog,
Alex Irpan,
Alexander Khazatsky,
Anant Rai,
Anchit Gupta,
Andrew Wang,
Andrey Kolobov,
Anikait Singh,
Animesh Garg,
Aniruddha Kembhavi,
Annie Xie
, et al. (267 additional authors not shown)
Abstract:
Large, high-capacity models trained on diverse datasets have shown remarkable success in efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website https://robotics-transformer-x.github.io.
Submitted 1 June, 2024; v1 submitted 13 October, 2023;
originally announced October 2023.
-
DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training
Authors:
Dacheng Li,
Rulin Shao,
Anze Xie,
Eric P. Xing,
Xuezhe Ma,
Ion Stoica,
Joseph E. Gonzalez,
Hao Zhang
Abstract:
FlashAttention (Dao, 2023) effectively reduces the quadratic peak memory usage to linear in training transformer-based large language models (LLMs) on a single GPU. In this paper, we introduce DISTFLASHATTN, a distributed memory-efficient attention mechanism optimized for long-context LLM training. We propose three key techniques: token-level workload balancing, overlapping key-value communication, and a rematerialization-aware gradient checkpointing algorithm. We evaluate DISTFLASHATTN on Llama-7B and variants with sequence lengths from 32K to 512K. DISTFLASHATTN achieves 8x longer sequences and a 4.45 - 5.64x speedup compared to Ring Self-Attention, and 2 - 8x longer sequences and a 1.24 - 2.01x speedup compared to Megatron-LM with FlashAttention. It achieves 1.67x and 1.26 - 1.88x speedups compared to the recent Ring Attention and DeepSpeed-Ulysses, respectively. Code is available at https://github.com/RulinShao/LightSeq.
Submitted 31 March, 2024; v1 submitted 4 October, 2023;
originally announced October 2023.
-
It's MBR All the Way Down: Modern Generation Techniques Through the Lens of Minimum Bayes Risk
Authors:
Amanda Bertsch,
Alex Xie,
Graham Neubig,
Matthew R. Gormley
Abstract:
Minimum Bayes Risk (MBR) decoding is a method for choosing the outputs of a machine learning system based not on the output with the highest probability, but the output with the lowest risk (expected error) among multiple candidates. It is a simple but powerful method: for an additional cost at inference time, MBR provides reliable several-point improvements across metrics for a wide variety of tasks without any additional data or training. Despite this, MBR is not frequently applied in NLP works, and knowledge of the method itself is limited. We first provide an introduction to the method and the recent literature. We show that several recent methods that do not reference MBR can be written as special cases of MBR; this reformulation provides additional theoretical justification for the performance of these methods, explaining some results that were previously only empirical. We provide theoretical and empirical results about the effectiveness of various MBR variants and make concrete recommendations for the application of MBR in NLP models, including future directions in this area.
Submitted 2 October, 2023;
originally announced October 2023.
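The selection rule described in the abstract above (pick the candidate with the lowest expected error rather than the highest probability) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mbr_decode` and `token_f1`, the use of token-overlap F1 as the utility, and the choice to estimate expected utility by averaging against the other sampled candidates are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Sequence


def token_f1(hyp: str, ref: str) -> float:
    """Token-overlap F1 between two strings (an illustrative utility, assumed here)."""
    h, r = hyp.split(), ref.split()
    overlap = sum((Counter(h) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(h), overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def mbr_decode(candidates: Sequence[str],
               utility: Callable[[str, str], float]) -> str:
    """Return the candidate with the highest average utility against the others.

    Maximizing expected utility is equivalent to minimizing expected risk
    when risk is defined as (1 - utility). The other candidates serve as
    pseudo-references, i.e. a uniform Monte Carlo estimate of the expectation.
    """
    if not candidates:
        raise ValueError("need at least one candidate")

    def expected_utility(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        if not others:
            return 0.0
        return sum(utility(candidates[i], o) for o in others) / len(others)

    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]


samples = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a dog ran away",
]
choice = mbr_decode(samples, token_f1)
```

In this toy example the second sample is selected because it agrees most, on average, with the other samples, regardless of which sample the model scored as most probable. The quadratic number of utility calls is the "additional cost at inference time" the abstract refers to.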
-
Retrieve-Rewrite-Answer: A KG-to-Text Enhanced LLMs Framework for Knowledge Graph Question Answering
Authors:
Yike Wu,
Nan Hu,
Sheng Bi,
Guilin Qi,
Jie Ren,
Anhuan Xie,
Wei Song
Abstract:
Despite their competitive performance on knowledge-intensive tasks, large language models (LLMs) still have limitations in memorizing all world knowledge, especially long-tail knowledge. In this paper, we study the KG-augmented language model approach for solving the knowledge graph question answering (KGQA) task that requires rich world knowledge. Existing work has shown that retrieving KG knowledge to enhance LLM prompting can significantly improve LLM performance in KGQA. However, these approaches lack a well-formed verbalization of KG knowledge, i.e., they ignore the gap between KG representations and textual representations. To this end, we propose an answer-sensitive KG-to-Text approach that can transform KG knowledge into well-textualized statements most informative for KGQA. Based on this approach, we propose a KG-to-Text enhanced LLMs framework for solving the KGQA task. Experiments on several KGQA benchmarks show that the proposed KG-to-Text augmented LLM approach outperforms previous KG-augmented LLM approaches regarding answer accuracy and usefulness of knowledge statements.
Submitted 21 September, 2023; v1 submitted 20 September, 2023;
originally announced September 2023.
-
Language-Conditioned Path Planning
Authors:
Amber Xie,
Youngwoon Lee,
Pieter Abbeel,
Stephen James
Abstract:
Contact is at the core of robotic manipulation. At times, it is desired (e.g. manipulation and grasping), and at times, it is harmful (e.g. when avoiding obstacles). However, traditional path planning algorithms focus solely on collision-free paths, limiting their applicability in contact-rich tasks. To address this limitation, we propose the domain of Language-Conditioned Path Planning, where contact-awareness is incorporated into the path planning problem. As a first step in this domain, we propose Language-Conditioned Collision Functions (LACO), a novel approach that learns a collision function using only a single-view image, language prompt, and robot configuration. LACO predicts collisions between the robot and the environment, enabling flexible, conditional path planning without the need for manual object annotations, point cloud data, or ground-truth object meshes. In both simulation and the real world, we demonstrate that LACO can facilitate complex, nuanced path plans that allow for interaction with objects that are safe to collide with, rather than prohibiting any collision.
Submitted 31 August, 2023;
originally announced August 2023.
-
Language Reward Modulation for Pretraining Reinforcement Learning
Authors:
Ademi Adeniji,
Amber Xie,
Carmelo Sferrazza,
Younggyo Seo,
Stephen James,
Pieter Abbeel
Abstract:
Using learned reward functions (LRFs) as a means to solve sparse-reward reinforcement learning (RL) tasks has yielded some steady progress in task complexity through the years. In this work, we question whether today's LRFs are best-suited as a direct replacement for task rewards. Instead, we propose leveraging the capabilities of LRFs as a pretraining signal for RL. Concretely, we propose $\textbf{LA}$nguage Reward $\textbf{M}$odulated $\textbf{P}$retraining (LAMP), which leverages the zero-shot capabilities of Vision-Language Models (VLMs) as a $\textit{pretraining}$ utility for RL as opposed to a downstream task reward. LAMP uses a frozen, pretrained VLM to scalably generate noisy, albeit shaped, exploration rewards by computing the contrastive alignment between a highly diverse collection of language instructions and the image observations of an agent in its pretraining environment. LAMP optimizes these rewards in conjunction with standard novelty-seeking exploration rewards with reinforcement learning to acquire a language-conditioned, pretrained policy. Our VLM pretraining approach, which is a departure from previous attempts to use LRFs, can warmstart sample-efficient learning on robot manipulation tasks in RLBench.
Submitted 23 August, 2023;
originally announced August 2023.
-
Differentially Private and Communication-Efficient Distributed Nonconvex Optimization Algorithms
Authors:
Antai Xie,
Xinlei Yi,
Xiaofan Wang,
Ming Cao,
Xiaoqiang Ren
Abstract:
This paper studies the privacy-preserving distributed optimization problem under limited communication, where each agent aims to keep its cost function private while minimizing the sum of all agents' cost functions. To this end, we propose two differentially private distributed algorithms under compressed communication. We show that the proposed algorithms achieve sublinear convergence for smooth (possibly nonconvex) cost functions and linear convergence when the global cost function additionally satisfies the Polyak-Łojasiewicz condition, even for a general class of compressors with bounded relative compression error. Furthermore, we rigorously prove that the proposed algorithms ensure $ε$-differential privacy. Unlike methods in the literature, the analysis of privacy under the proposed algorithms does not rely on the specific forms of compressors. Simulations are presented to demonstrate the effectiveness of our proposed approach.
Submitted 1 May, 2024; v1 submitted 31 July, 2023;
originally announced July 2023.
-
Decomposing the Generalization Gap in Imitation Learning for Visual Robotic Manipulation
Authors:
Annie Xie,
Lisa Lee,
Ted Xiao,
Chelsea Finn
Abstract:
What makes generalization hard for imitation learning in visual robotic manipulation? This question is difficult to approach at face value, but the environment from the perspective of a robot can often be decomposed into enumerable factors of variation, such as the lighting conditions or the placement of the camera. Empirically, generalization to some of these factors has presented a greater obstacle than others, but existing work sheds little light on precisely how much each factor contributes to the generalization gap. Towards an answer to this question, we study imitation learning policies in simulation and on a real-robot language-conditioned manipulation task to quantify the difficulty of generalization to different (sets of) factors. We also design a new simulated benchmark of 19 tasks with 11 factors of variation to facilitate more controlled evaluations of generalization. From our study, we determine an ordering of factors based on generalization difficulty that is consistent across simulation and our real robot setup.
Submitted 7 July, 2023;
originally announced July 2023.
-
Supervised Pretraining Can Learn In-Context Reinforcement Learning
Authors:
Jonathan N. Lee,
Annie Xie,
Aldo Pacchiano,
Yash Chandak,
Chelsea Finn,
Ofir Nachum,
Emma Brunskill
Abstract:
Large transformer models trained on diverse datasets have shown a remarkable ability to learn in-context, achieving high few-shot performance on tasks they were not explicitly trained to solve. In this paper, we study the in-context learning capabilities of transformers in decision-making problems, i.e., reinforcement learning (RL) for bandits and Markov decision processes. To do so, we introduce and study Decision-Pretrained Transformer (DPT), a supervised pretraining method where the transformer predicts an optimal action given a query state and an in-context dataset of interactions, across a diverse set of tasks. This procedure, while simple, produces a model with several surprising capabilities. We find that the pretrained transformer can be used to solve a range of RL problems in-context, exhibiting both exploration online and conservatism offline, despite not being explicitly trained to do so. The model also generalizes beyond the pretraining distribution to new tasks and automatically adapts its decision-making strategies to unknown structure. Theoretically, we show DPT can be viewed as an efficient implementation of Bayesian posterior sampling, a provably sample-efficient RL algorithm. We further leverage this connection to provide guarantees on the regret of the in-context algorithm yielded by DPT, and prove that it can learn faster than algorithms used to generate the pretraining data. These results suggest a promising yet simple path towards instilling strong in-context decision-making abilities in transformers.
Submitted 26 June, 2023;
originally announced June 2023.
-
Compressed Differentially Private Distributed Optimization with Linear Convergence
Authors:
Antai Xie,
Xinlei Yi,
Xiaofan Wang,
Ming Cao,
Xiaoqiang Ren
Abstract:
This paper addresses the problem of differentially private distributed optimization under limited communication, where each agent aims to keep their cost function private while minimizing the sum of all agents' cost functions. In response, we propose a novel Compressed differentially Private distributed Gradient Tracking algorithm (CPGT). We demonstrate that CPGT achieves linear convergence for smooth and strongly convex cost functions, even with a class of biased but contractive compressors, and achieves the same accuracy as the idealized communication algorithm. Additionally, we rigorously prove that CPGT ensures differential privacy. Simulations are provided to validate the effectiveness of the proposed algorithm.
Submitted 4 April, 2023;
originally announced April 2023.
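As a hedged illustration of what a "biased but contractive" compressor can look like, the sketch below uses top-k sparsification, a standard example from the compressed-optimization literature. The abstract above does not say which compressors CPGT is evaluated with, so the choice of top-k and the names `top_k` and `sq_norm` are assumptions. Top-k keeps the k largest-magnitude coordinates of a d-dimensional vector and satisfies the deterministic bound ||C(x) - x||^2 <= (1 - k/d) ||x||^2.

```python
from typing import List


def top_k(x: List[float], k: int) -> List[float]:
    """Keep the k largest-magnitude entries of x and zero out the rest."""
    if not 0 < k <= len(x):
        raise ValueError("k must satisfy 0 < k <= len(x)")
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def sq_norm(x: List[float]) -> float:
    """Squared Euclidean norm."""
    return sum(v * v for v in x)


x = [3.0, -0.5, 0.1, -4.0, 1.0]
k = 2
cx = top_k(x, k)

# Compression error and the contraction bound (1 - k/d) * ||x||^2.
err = sq_norm([a - b for a, b in zip(cx, x)])
bound = (1 - k / len(x)) * sq_norm(x)
```

The compressor is biased because the expected output is not x itself, yet the error shrinks relative to ||x||^2, which is the kind of property that lets each agent transmit only k of d coordinates per round.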
-
Molecular Property Prediction by Semantic-invariant Contrastive Learning
Authors:
Ziqiao Zhang,
Ailin Xie,
Jihong Guan,
Shuigeng Zhou
Abstract:
Contrastive learning has been widely used as a pretext task for self-supervised pre-trained molecular representation learning models in AI-aided drug design and discovery. However, existing methods that generate molecular views by noise-adding operations for contrastive learning may face the semantic inconsistency problem, which leads to false positive pairs and consequently poor prediction performance. To address this problem, in this paper we first propose a semantic-invariant view generation method by properly breaking molecular graphs into fragment pairs. Then, we develop a Fragment-based Semantic-Invariant Contrastive Learning (FraSICL) model based on this view generation method for molecular property prediction. The FraSICL model consists of two branches to generate representations of views for contrastive learning; in addition, a multi-view fusion and an auxiliary similarity loss are introduced to make better use of the information contained in different fragment-pair views. Extensive experiments on various benchmark datasets show that with the least number of pre-training samples, FraSICL can achieve state-of-the-art performance, compared with major existing counterpart models.
Submitted 13 March, 2023;
originally announced March 2023.
-
Activity Cliff Prediction: Dataset and Benchmark
Authors:
Ziqiao Zhang,
Bangyi Zhao,
Ailin Xie,
Yatao Bian,
Shuigeng Zhou
Abstract:
Activity cliffs (ACs), which are generally defined as pairs of structurally similar molecules that are active against the same bio-target but significantly different in binding potency, are of great importance to drug discovery. To date, the AC prediction problem, i.e., to predict whether a pair of molecules exhibits the AC relationship, has not yet been fully explored. In this paper, we first introduce ACNet, a large-scale dataset for AC prediction. ACNet curates over 400K Matched Molecular Pairs (MMPs) against 190 targets, including over 20K MMP-cliffs and 380K non-AC MMPs, and provides five subsets for model development and evaluation. Then, we propose a baseline framework to benchmark the predictive performance of molecular representations encoded by deep neural networks for AC prediction, and 16 models are evaluated in experiments. Our experimental results show that deep learning models can achieve good performance when the models are trained on tasks with an adequate amount of data, while the imbalanced, low-data and out-of-distribution features of the ACNet dataset still make it challenging for deep neural networks to cope with. In addition, the traditional ECFP method shows a natural advantage on MMP-cliff prediction, and outperforms the deep learning models on most of the data subsets. To the best of our knowledge, our work constructs the first large-scale dataset for AC prediction, which may stimulate the study of AC prediction models and prompt further breakthroughs in AI-aided drug discovery. The code and dataset can be accessed at https://drugai.github.io/ACNet/.
Submitted 15 February, 2023;
originally announced February 2023.
-
VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models
Authors:
Ajay Jain,
Amber Xie,
Pieter Abbeel
Abstract:
Diffusion models have shown impressive results in text-to-image synthesis. Using massive datasets of captioned images, diffusion models learn to generate raster images of highly diverse objects and scenes. However, designers frequently use vector representations of images like Scalable Vector Graphics (SVGs) for digital icons or art. Vector graphics can be scaled to any size, and are compact. We show that a text-conditioned diffusion model trained on pixel representations of images can be used to generate SVG-exportable vector graphics. We do so without access to large datasets of captioned SVGs. By optimizing a differentiable vector graphics rasterizer, our method, VectorFusion, distills abstract semantic knowledge out of a pretrained diffusion model. Inspired by recent text-to-3D work, we learn an SVG consistent with a caption using Score Distillation Sampling. To accelerate generation and improve fidelity, VectorFusion also initializes from an image sample. Experiments show greater quality than prior work, and demonstrate a range of styles including pixel art and sketches. See our project webpage at https://ajayj.com/vectorfusion .
Submitted 21 November, 2022;
originally announced November 2022.
-
Sim-to-Real via Sim-to-Seg: End-to-end Off-road Autonomous Driving Without Real Data
Authors:
John So,
Amber Xie,
Sunggoo Jung,
Jeffrey Edlund,
Rohan Thakker,
Ali Agha-mohammadi,
Pieter Abbeel,
Stephen James
Abstract:
Autonomous driving is complex, requiring sophisticated 3D scene understanding, localization, mapping, and control. Rather than explicitly modelling and fusing each of these components, we instead consider an end-to-end approach via reinforcement learning (RL). However, collecting exploration driving data in the real world is impractical and dangerous. While training in simulation and deploying visual sim-to-real techniques has worked well for robot manipulation, deploying beyond controlled workspace viewpoints remains a challenge. In this paper, we address this challenge by presenting Sim2Seg, a re-imagining of RCAN that crosses the visual reality gap for off-road autonomous driving, without using any real-world data. This is done by learning to translate randomized simulation images into simulated segmentation and depth maps, subsequently enabling real-world images to also be translated. This allows us to train an end-to-end RL policy in simulation, and directly deploy it in the real world. Our approach, which can be trained in 48 hours on 1 GPU, can perform as well as a classical perception and control stack that took thousands of engineering hours over several months to build. We hope this work motivates future end-to-end autonomous driving research.
Submitted 25 October, 2022;
originally announced October 2022.
-
Flying Trot Control Method for Quadruped Robot Based on Trajectory Planning
Authors:
Hongge Wang,
Hui Chai,
Bin Chen,
Aizhen Xie,
Rui Song,
Bo Su
Abstract:
An intuitive control method for the flying trot, which combines offline trajectory planning with real-time balance control, is presented. The motion features of running animals in the vertical direction were analysed using the spring-loaded inverted pendulum (SLIP) model, and the foot trajectory of the robot was planned so that the robot could run like an animal capable of vertical flight, according to the given height and speed of the trunk. To improve the robustness of running, a posture control method based on foot acceleration adjustment is proposed. A novel kinematics-based CoM observation method and a CoM regulation method are presented to enhance the stability of locomotion. To reduce the impact force when the robot interacts with the environment, the virtual model control method is used in the control of the foot trajectory to achieve active compliance. By selecting proper parameters for the virtual model, the oscillation motion of the virtual model and the planned motion of the support foot are synchronized to avoid the large disturbance caused by the oscillation motion of the virtual model in relation to the robot motion. Simulations and experiments using the quadruped robot Billy are reported. In the experiment, the maximum speed of the robot reached 4.73 body lengths per second, which verified the feasibility of the control method.
Submitted 24 October, 2022;
originally announced October 2022.
-
When to Ask for Help: Proactive Interventions in Autonomous Reinforcement Learning
Authors:
Annie Xie,
Fahim Tajwar,
Archit Sharma,
Chelsea Finn
Abstract:
A long-term goal of reinforcement learning is to design agents that can autonomously interact and learn in the world. A critical challenge to such autonomy is the presence of irreversible states which require external assistance to recover from, such as when a robot arm has pushed an object off of a table. While standard agents require constant monitoring to decide when to intervene, we aim to design proactive agents that can request human intervention only when needed. To this end, we propose an algorithm that efficiently learns to detect and avoid states that are irreversible, and proactively asks for help in case the agent does enter them. On a suite of continuous control environments with unknown irreversible states, we find that our algorithm exhibits better sample- and intervention-efficiency compared to existing methods. Our code is publicly available at https://sites.google.com/view/proactive-interventions
Submitted 19 October, 2022;
originally announced October 2022.
-
Skill-Based Reinforcement Learning with Intrinsic Reward Matching
Authors:
Ademi Adeniji,
Amber Xie,
Pieter Abbeel
Abstract:
While unsupervised skill discovery has shown promise in autonomously acquiring behavioral primitives, there is still a large methodological disconnect between task-agnostic skill pretraining and downstream, task-aware finetuning. We present Intrinsic Reward Matching (IRM), which unifies these two phases of learning via the $\textit{skill discriminator}$, a pretraining model component often discarded during finetuning. Conventional approaches finetune pretrained agents directly at the policy level, often relying on expensive environment rollouts to empirically determine the optimal skill. However, often the most concise yet complete description of a task is the reward function itself, and skill learning methods learn an $\textit{intrinsic}$ reward function via the discriminator that corresponds to the skill policy. We propose to leverage the skill discriminator to $\textit{match}$ the intrinsic and downstream task rewards and determine the optimal skill for an unseen task without environment samples, consequently finetuning with greater sample-efficiency. Furthermore, we generalize IRM to sequence skills for complex, long-horizon tasks and demonstrate that IRM enables us to utilize pretrained skills far more effectively than previous skill selection methods on both the Fetch tabletop and Franka Kitchen robot manipulation benchmarks.
Submitted 25 May, 2023; v1 submitted 13 October, 2022;
originally announced October 2022.
-
Can Pre-trained Models Really Learn Better Molecular Representations for AI-aided Drug Discovery?
Authors:
Ziqiao Zhang,
Yatao Bian,
Ailin Xie,
Pengju Han,
Long-Kai Huang,
Shuigeng Zhou
Abstract:
Self-supervised pre-training is becoming increasingly popular in AI-aided drug discovery, leading to more and more pre-trained models with the promise that they can extract better feature representations for molecules. Yet, the quality of the learned representations has not been fully explored. In this work, inspired by the two phenomena of Activity Cliffs (ACs) and Scaffold Hopping (SH) in traditional Quantitative Structure-Activity Relationship (QSAR) analysis, we propose a method named Representation-Property Relationship Analysis (RePRA) to evaluate the quality of the representations extracted by the pre-trained model and visualize the relationship between the representations and properties. The concepts of ACs and SH are generalized from the structure-activity context to the representation-property context, and the underlying principles of RePRA are analyzed theoretically. Two scores are designed to measure the generalized ACs and SH detected by RePRA, and therefore the quality of representations can be evaluated. In experiments, representations of molecules from 10 target tasks generated by 7 pre-trained models are analyzed. The results indicate that the state-of-the-art pre-trained models can overcome some shortcomings of canonical Extended-Connectivity FingerPrints (ECFP), while the correlation between the basis of the representation space and specific molecular substructures is not explicit. Thus, some representations could be even worse than the canonical fingerprints. Our method enables researchers to evaluate the quality of molecular representations generated by their proposed self-supervised pre-trained models, and our findings can guide the community to develop better pre-training techniques to regularize the occurrence of ACs and SH.
Submitted 21 August, 2022;
originally announced September 2022.
-
Sensitivity Analysis on Transferred Neural Architectures of BERT and GPT-2 for Financial Sentiment Analysis
Authors:
Tracy Qian,
Andy Xie,
Camille Bruckmann
Abstract:
The explosion in novel NLP word embedding and deep learning techniques has induced significant endeavors into potential applications. One of these directions is in the financial sector. Although there is a lot of work done in state-of-the-art models like GPT and BERT, there are relatively few works on how well these methods perform through fine-tuning after being pre-trained, as well as info on how sensitive their parameters are. We investigate the performance and sensitivity of transferred neural architectures from pre-trained GPT-2 and BERT models. We test the fine-tuning performance based on freezing transformer layers, batch size, and learning rate. We find the parameters of BERT are hypersensitive to stochasticity in fine-tuning and that GPT-2 is more stable in such practice. It is also clear that the earlier layers of GPT-2 and BERT contain essential word pattern information that should be maintained.
Submitted 6 July, 2022;
originally announced July 2022.
-
Nonparaxiality-triggered Landau-Zener transition in topological photonic waveguides
Authors:
An Xie,
Shaodong Zhou,
Kelei Xi,
Li Ding,
Yiming Pan,
Yongguan Ke,
Huaiqiang Wang,
Songlin Zhuang,
Qingqing Cheng
Abstract:
Photonic lattices have been widely used for simulating quantum physics, owing to the similar evolutions of paraxial waves and quantum particles. However, nonparaxial wave propagations in photonic lattices break the paradigm of the quantum-optical analogy. Here, we reveal that nonparaxiality exerts stretched and compressed forces on the energy spectrum in the celebrated Aubry-Andre-Harper model. By exploring the mini-gaps induced by the finite size of the different effects of nonparaxiality, we experimentally present that the expansion of one band gap supports the adiabatic transfer of boundary states while Landau-Zener transition occurs at the narrowing of the other gap, whereas identical transport behaviors are expected for the two gaps under paraxial approximation. Our results not only serve as a foundation of future studies of dynamic state transfer but also inspire applications leveraging nonparaxial transitions as a new degree of freedom.
Submitted 7 May, 2022;
originally announced May 2022.
-
Robust Policy Learning over Multiple Uncertainty Sets
Authors:
Annie Xie,
Shagun Sodhani,
Chelsea Finn,
Joelle Pineau,
Amy Zhang
Abstract:
Reinforcement learning (RL) agents need to be robust to variations in safety-critical environments. While system identification methods provide a way to infer the variation from online experience, they can fail in settings where fast identification is not possible. Another dominant approach is robust RL which produces a policy that can handle worst-case scenarios, but these methods are generally designed to achieve robustness to a single uncertainty set that must be specified at train time. Towards a more general solution, we formulate the multi-set robustness problem to learn a policy robust to different perturbation sets. We then design an algorithm that enjoys the benefits of both system identification and robust RL: it reduces uncertainty where possible given a few interactions, but can still act robustly with respect to the remaining uncertainty. On a diverse set of control tasks, our approach demonstrates improved worst-case performance on new environments compared to prior methods based on system identification and on robust RL alone.
Submitted 4 March, 2022; v1 submitted 14 February, 2022;
originally announced February 2022.
-
NoFADE: Analyzing Diminishing Returns on CO2 Investment
Authors:
Andre Fu,
Justin Tran,
Andy Xie,
Jonathan Spraggett,
Elisa Ding,
Chang-Won Lee,
Kanav Singla,
Mahdi S. Hosseini,
Konstantinos N. Plataniotis
Abstract:
Climate change continues to be a pressing issue that currently affects society at large. It is important that we as a society, including the Computer Vision (CV) community, take steps to limit our impact on the environment. In this paper, we (a) analyze the effect of diminishing returns on CV methods, and (b) propose \textit{``NoFADE''}: a novel entropy-based metric to quantify model--dataset--complexity relationships. We show that some CV tasks are reaching saturation, while others are almost fully saturated. In this light, NoFADE allows the CV community to compare models and datasets on a similar basis, establishing an agnostic platform.
Submitted 28 November, 2021;
originally announced November 2021.
-
Influencing Towards Stable Multi-Agent Interactions
Authors:
Woodrow Z. Wang,
Andy Shih,
Annie Xie,
Dorsa Sadigh
Abstract:
Learning in multi-agent environments is difficult due to the non-stationarity introduced by an opponent's or partner's changing behaviors. Instead of reactively adapting to the other agent's (opponent or partner) behavior, we propose an algorithm to proactively influence the other agent's strategy to stabilize -- which can restrain the non-stationarity caused by the other agent. We learn a low-dimensional latent representation of the other agent's strategy and the dynamics of how the latent strategy evolves with respect to our robot's behavior. With this learned dynamics model, we can define an unsupervised stability reward to train our robot to deliberately influence the other agent to stabilize towards a single strategy. We demonstrate the effectiveness of stabilizing in improving efficiency of maximizing the task reward in a variety of simulated environments, including autonomous driving, emergent communication, and robotic manipulation. We show qualitative results on our website: https://sites.google.com/view/stable-marl/.
Submitted 5 October, 2021;
originally announced October 2021.
-
Lifelong Robotic Reinforcement Learning by Retaining Experiences
Authors:
Annie Xie,
Chelsea Finn
Abstract:
Multi-task learning ideally allows robots to acquire a diverse repertoire of useful skills. However, many multi-task reinforcement learning efforts assume the robot can collect data from all tasks at all times. In reality, the tasks that the robot learns arrive sequentially, depending on the user and the robot's current environment. In this work, we study a sequential multi-task RL problem that is motivated by the practical constraints of physical robotic systems, and derive an approach that effectively leverages the data and policies learned for previous tasks to cumulatively grow the robot's skill-set. In a series of simulated robotic manipulation experiments, our approach requires fewer than half the samples needed to learn each task from scratch, while avoiding impractical round-robin data collection. On a Franka Emika Panda robot arm, our approach incrementally learns ten challenging tasks, including bottle capping and block insertion.
Submitted 6 April, 2022; v1 submitted 19 September, 2021;
originally announced September 2021.
-
Uncertainty Quantified Deep Learning for Predicting Dice Coefficient of Digital Histopathology Image Segmentation
Authors:
Sambuddha Ghosal,
Audrey Xie,
Pratik Shah
Abstract:
Deep learning models (DLMs) can achieve state-of-the-art performance in medical image segmentation and classification tasks. However, DLMs that do not provide feedback for their predictions, such as Dice coefficients (Dice), have limited deployment potential in real-world clinical settings. Uncertainty estimates can increase the trust of these automated systems by identifying predictions that need further review but remain computationally prohibitive to deploy. In this study, we use a DLM with randomly initialized weights and Monte Carlo dropout (MCD) to segment tumors from microscopic Hematoxylin and Eosin (H&E) dye stained prostate core biopsy RGB images. We devise a novel approach that uses multiple clinical region based uncertainties from a single image (instead of the entire image) to predict Dice of the DLM model output by linear models. Image level uncertainty maps were generated and showed correspondence between imperfect model segmentation and high levels of uncertainty associated with specific prostate tissue regions with or without tumors. Results from this study suggest that linear models can learn coefficients of uncertainty quantified deep learning and correlations (Spearman's correlation, p<0.05) to predict Dice scores of specific regions of medical images.
Submitted 31 August, 2021;
originally announced September 2021.
-
Investigation of Edge States in Artificial Graphene Nano-Flakes
Authors:
Qiushi Zhang,
Tszchun Wu,
Guowen Kuang,
Ayu Xie,
Nian Lin
Abstract:
Graphene nano-flakes (GNFs) are predicted to host spin-polarized metallic edge states, which are envisioned for exploration of spintronics at the nanometer scale. To date, experimental realization of GNFs is still in its infancy because of the limitations of precise cutting or synthesizing methods at the nanometer scale. Here, we use a low-temperature scanning tunneling microscope (STM) to manipulate coronene molecules on a Cu(111) surface to build artificial triangular and hexagonal GNFs with either zigzag or armchair type of edges. We observe that the metallic edge states only exist in the GNFs with zigzag edge and localize at the most outside one type of the sublattice. The experimental results agree well with the tight-binding calculations. To our knowledge, our work renders the first systematic experimental confirmation of the predicted electronic properties of the GNFs.
Submitted 12 November, 2020;
originally announced November 2020.
-
Learning Latent Representations to Influence Multi-Agent Interaction
Authors:
Annie Xie,
Dylan P. Losey,
Ryan Tolsma,
Chelsea Finn,
Dorsa Sadigh
Abstract:
Seamlessly interacting with humans or robots is hard because these agents are non-stationary. They update their policy in response to the ego agent's behavior, and the ego agent must anticipate these changes to co-adapt. Inspired by humans, we recognize that robots do not need to explicitly model every low-level action another agent will make; instead, we can capture the latent strategy of other agents through high-level representations. We propose a reinforcement learning-based framework for learning latent representations of an agent's policy, where the ego agent identifies the relationship between its behavior and the other agent's future strategy. The ego agent then leverages these latent dynamics to influence the other agent, purposely guiding them towards policies suitable for co-adaptation. Across several simulated domains and a real-world air hockey game, our approach outperforms the alternatives and learns to influence the other agent.
Submitted 12 November, 2020;
originally announced November 2020.
-
Efficient Learning of Control Policies for Robust Quadruped Bounding using Pretrained Neural Networks
Authors:
Zhicheng Wang,
Anqiao Li,
Yixiao Zheng,
Anhuan Xie,
Zhibin Li,
Jun Wu,
Qiuguo Zhu
Abstract:
Bounding is one of the important gaits in quadrupedal locomotion for negotiating obstacles. The authors proposed an effective approach that can learn robust bounding gaits more efficiently despite the large variation in dynamic body movements. The authors first pretrained the neural network (NN) based on data from a robot operated by conventional model-based controllers, and then further optimised the pretrained NN via deep reinforcement learning (DRL). In particular, the authors designed a reward function considering contact points and phases to enforce the gait symmetry and periodicity, which improved the bounding performance. The NN-based feedback controller was learned in simulation and directly deployed on the real quadruped robot Jueying Mini successfully. The authors' approach is demonstrated in a variety of indoor and outdoor environments. It shows efficient computation and good locomotion results, with the Jueying Mini quadrupedal robot bounding over uneven terrain.
Submitted 29 October, 2023; v1 submitted 1 November, 2020;
originally announced November 2020.
-
Deep Reinforcement Learning amidst Lifelong Non-Stationarity
Authors:
Annie Xie,
James Harrison,
Chelsea Finn
Abstract:
As humans, our goals and our environment are persistently changing throughout our lifetime based on our experiences, actions, and internal and external drives. In contrast, typical reinforcement learning problem set-ups consider decision processes that are stationary across episodes. Can we develop reinforcement learning algorithms that can cope with the persistent change in the former, more realistic problem settings? While on-policy algorithms such as policy gradients in principle can be extended to non-stationary settings, the same cannot be said for more efficient off-policy algorithms that replay past experiences when learning. In this work, we formalize this problem setting, and draw upon ideas from the online learning and probabilistic inference literature to derive an off-policy RL algorithm that can reason about and tackle such lifelong non-stationarity. Our method leverages latent variable models to learn a representation of the environment from current and past experiences, and performs off-policy RL with this representation. We further introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
Submitted 18 June, 2020;
originally announced June 2020.
-
Learning Predictive Models From Observation and Interaction
Authors:
Karl Schmeckpeper,
Annie Xie,
Oleh Rybkin,
Stephen Tian,
Kostas Daniilidis,
Sergey Levine,
Chelsea Finn
Abstract:
Learning predictive models from interaction with the world allows an agent, such as a robot, to learn about how the world works, and then use this learned model to plan coordinated sequences of actions to bring about desired outcomes. However, learning a model that captures the dynamics of complex skills represents a major challenge: if the agent needs a good model to perform these skills, it might never be able to collect the experience on its own that is required to learn these delicate and complex behaviors. Instead, we can imagine augmenting the training set with observational data of other agents, such as humans. Such data is likely more plentiful, but represents a different embodiment. For example, videos of humans might show a robot how to use a tool, but (i) are not annotated with suitable robot actions, and (ii) contain a systematic distributional shift due to the embodiment differences between humans and robots. We address the first challenge by formulating the corresponding graphical model and treating the action as an observed variable for the interaction data and an unobserved variable for the observation data, and the second challenge by using a domain-dependent prior. In addition to interaction data, our method is able to leverage videos of passive observations in a driving dataset and a dataset of robotic manipulation videos. A robotic planning agent equipped with our method can learn to use tools in a tabletop robotic manipulation setting by observing humans without ever seeing a robotic video of tool use.
Submitted 29 December, 2019;
originally announced December 2019.