Allen Z. Ren

Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners

Jul 04, 2023
Allen Z. Ren, Anushri Dixit, Alexandra Bodrova, Sumeet Singh, Stephen Tu, Noah Brown, Peng Xu, Leila Takayama, Fei Xia, Jake Varley, Zhenjia Xu, Dorsa Sadigh, Andy Zeng, Anirudha Majumdar

Large language models (LLMs) exhibit a wide range of promising capabilities -- from step-by-step planning to commonsense reasoning -- that may provide utility for robots, but they remain prone to confidently hallucinated predictions. In this work, we present KnowNo, a framework for measuring and aligning the uncertainty of LLM-based planners such that they know when they don't know, and ask for help when needed. KnowNo builds on the theory of conformal prediction to provide statistical guarantees on task completion while minimizing human help in complex multi-step planning settings. Experiments across a variety of simulated and real robot setups involving tasks with different modes of ambiguity (e.g., from spatial to numeric uncertainties, from human preferences to Winograd schemas) show that KnowNo performs favorably over modern baselines (which may involve ensembles or extensive prompt tuning) in improving efficiency and autonomy, while providing formal assurances. KnowNo can be used with LLMs out of the box without model finetuning, and suggests a promising lightweight approach to modeling uncertainty that can complement and scale with the growing capabilities of foundation models. Website: https://robot-help.github.io
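The split-conformal machinery behind KnowNo can be illustrated with a minimal sketch. This is not the paper's implementation: the calibration scores, the option probabilities, and the nonconformity choice (one minus the probability assigned to an option) are hypothetical stand-ins for the LLM-scored multiple-choice options described above.

```python
import math

def conformal_quantile(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration
    nonconformity scores, as in split conformal prediction."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank with the +1 correction
    return sorted(cal_scores)[min(k, n) - 1]

def prediction_set(option_probs, qhat):
    """Keep every option whose nonconformity (1 - prob) is <= qhat."""
    return [opt for opt, p in option_probs.items() if 1.0 - p <= qhat]

# Calibration: nonconformity = 1 - probability assigned to the true option
# on held-out tasks (hypothetical numbers).
cal_scores = [0.3, 0.5, 0.2, 0.6, 0.4, 0.55, 0.35, 0.45, 0.7, 0.25]
qhat = conformal_quantile(cal_scores, alpha=0.2)

# At test time: if more than one option survives, the robot asks for help.
options = {"A) pick up the sponge": 0.48,
           "B) pick up the brush": 0.42,
           "C) do nothing": 0.10}
pset = prediction_set(options, qhat)
ask_for_help = len(pset) > 1
```

Under the exchangeability assumption of conformal prediction, the true option lands in the prediction set with probability at least 1 - alpha, which is what turns "ask for help when the set is ambiguous" into a statistical guarantee on task completion.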

* Under review 

Learning a Universal Human Prior for Dexterous Manipulation from Human Preference

Apr 10, 2023
Zihan Ding, Yuanpei Chen, Allen Z. Ren, Shixiang Shane Gu, Hao Dong, Chi Jin

Generating human-like behavior on robots is a great challenge, especially in dexterous manipulation tasks with robotic hands. Even in simulation with no sample constraints, scripting controllers is intractable due to the high degrees of freedom, and manual reward engineering can also be hard and lead to unrealistic motions. Leveraging recent progress on Reinforcement Learning from Human Feedback (RLHF), we propose a framework to learn a universal human prior from direct human preference feedback over videos, for efficiently tuning RL policies on 20 dual-hand robot manipulation tasks in simulation, without a single human demonstration. A single task-agnostic reward model is trained by iteratively generating diverse policies and collecting human preferences over the resulting trajectories; it is then applied to regularize the behavior of policies in the fine-tuning stage. Our method empirically demonstrates more human-like behaviors on robot hands across diverse tasks, including unseen ones, indicating its generalization capability.
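Reward learning from pairwise preferences of this kind is typically fit with a Bradley-Terry model; the toy loss below sketches that idea. The function name and the scalar rewards are illustrative assumptions, not taken from the paper.

```python
import math

def preference_loss(r_a, r_b, pref_a):
    """Bradley-Terry preference loss (sketch): the probability that
    trajectory A is preferred over B is sigmoid(r_a - r_b); training
    minimizes the negative log-likelihood of the human's label."""
    p_a = 1.0 / (1.0 + math.exp(r_b - r_a))
    return -(pref_a * math.log(p_a) + (1 - pref_a) * math.log(1 - p_a))
```

A reward model that already scores trajectory A above B incurs a small loss when the human prefers A and a large loss when the human prefers B, which is what drives the reward model toward the human prior; the learned reward then regularizes policy fine-tuning.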


AdaptSim: Task-Driven Simulation Adaptation for Sim-to-Real Transfer

Feb 09, 2023
Allen Z. Ren, Hongkai Dai, Benjamin Burchfiel, Anirudha Majumdar

Simulation parameter settings such as contact models and object geometry approximations are critical to training robust robotic policies capable of transferring from simulation to real-world deployment. Previous approaches typically handcraft distributions over such parameters (domain randomization), or identify parameters that best match the dynamics of the real environment (system identification). However, there is often an irreducible gap between simulation and reality: attempting to match the dynamics between simulation and reality across all states and tasks may be infeasible and may not lead to policies that perform well in reality for a specific task. Addressing this issue, we propose AdaptSim, a new task-driven adaptation framework for sim-to-real transfer that aims to optimize task performance in target (real) environments -- instead of matching dynamics between simulation and reality. First, we meta-learn an adaptation policy in simulation using reinforcement learning for adjusting the simulation parameter distribution based on the current policy's performance in a target environment. We then perform iterative real-world adaptation by inferring new simulation parameter distributions for policy training, using a small amount of real data. We perform experiments in three robotic tasks: (1) swing-up of a linearized double pendulum, (2) dynamic table-top pushing of a bottle, and (3) dynamic scooping of food pieces with a spatula. Our extensive simulation and hardware experiments demonstrate that AdaptSim achieves 1-3x asymptotic performance and $\sim$2x real-data efficiency when adapting to different environments, compared to methods based on Sys-ID and to directly training the task policy in target environments.
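The iterative adaptation loop can be caricatured in a few lines: propose simulation parameters around the current distribution, evaluate task performance in the target environment, and re-center the distribution on what performed best. Everything here (the 1-D parameter, the reward shape, the candidate grid, the shrink factor) is a hypothetical stand-in for the meta-learned adaptation policy described above.

```python
def task_reward(sim_param, real_param=2.0):
    # Hypothetical stand-in for real-world task performance: reward peaks
    # when the parameter used to train in simulation matches reality.
    return -(sim_param - real_param) ** 2

def adapt_once(mean, std):
    """One toy AdaptSim-style iteration: propose candidate simulation
    parameters, evaluate the policies they produce in the target
    environment, and re-center the parameter distribution on the best."""
    candidates = [mean + std * k for k in (-2, -1, -0.5, 0, 0.5, 1, 2)]
    best = max(candidates, key=task_reward)
    return best, std * 0.9  # narrow the search as adaptation proceeds

mean, std = 0.0, 1.5
for _ in range(10):
    mean, std = adapt_once(mean, std)
```

Note the objective: the loop never tries to match real dynamics directly; it only asks which simulation parameters yield policies that score well on the task in the target environment.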

* Under review 

FlowDrone: Wind Estimation and Gust Rejection on UAVs Using Fast-Response Hot-Wire Flow Sensors

Oct 12, 2022
Nathaniel Simon, Allen Z. Ren, Alexander Piqué, David Snyder, Daphne Barretto, Marcus Hultmark, Anirudha Majumdar

Unmanned aerial vehicles (UAVs) are finding use in applications that place increasing emphasis on robustness to external disturbances including extreme wind. However, traditional multirotor UAV platforms do not directly sense wind; conventional flow sensors are too slow, insensitive, or bulky for widespread integration on UAVs. Instead, drones typically observe the effects of wind indirectly through accumulated errors in position or trajectory tracking. In this work, we integrate a novel flow sensor based on micro-electro-mechanical systems (MEMS) hot-wire technology developed in our prior work onto a multirotor UAV for wind estimation. These sensors are omnidirectional, lightweight, fast, and accurate. In order to achieve superior tracking performance in windy conditions, we train a "wind-aware" residual-based controller via reinforcement learning using simulated wind gusts and their aerodynamic effects on the drone. In extensive hardware experiments, we demonstrate that the wind-aware controller outperforms two strong "wind-unaware" baseline controllers in challenging windy conditions.
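The residual architecture amounts to adding a learned correction on top of a nominal controller. The sketch below uses a scalar PD loop and a linear stand-in for the RL-trained residual; the gains and the wind model are illustrative assumptions, not the paper's controller.

```python
def nominal_control(pos_error, vel_error, kp=2.0, kd=0.5):
    # Baseline PD controller (wind-unaware), scalar for illustration.
    return kp * pos_error + kd * vel_error

def residual_control(wind_estimate, gain=0.8):
    # Hypothetical learned residual: in the paper this is an RL policy
    # conditioned on the hot-wire wind estimate; here a linear stand-in
    # that counteracts the sensed wind.
    return -gain * wind_estimate

def wind_aware_control(pos_error, vel_error, wind_estimate):
    # Wind-aware command = nominal controller + residual correction.
    return nominal_control(pos_error, vel_error) + residual_control(wind_estimate)
```

With zero estimated wind the residual vanishes and the controller reduces to the nominal baseline, so the correction only activates when the fast flow sensor actually reports a disturbance.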

* submitted to ICRA 2023 

Leveraging Language for Accelerated Learning of Tool Manipulation

Jun 27, 2022
Allen Z. Ren, Bharat Govil, Tsung-Yen Yang, Karthik Narasimhan, Anirudha Majumdar

Robust and generalized tool manipulation requires an understanding of the properties and affordances of different tools. We investigate whether linguistic information about a tool (e.g., its geometry, common uses) can help control policies adapt faster to new tools for a given task. We obtain diverse descriptions of various tools in natural language and use pre-trained language models to generate their feature representations. We then perform language-conditioned meta-learning to learn policies that can efficiently adapt to new tools given their corresponding text descriptions. Our results demonstrate that combining linguistic information and meta-learning significantly accelerates tool learning in several manipulation tasks including pushing, lifting, sweeping, and hammering.
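The conditioning step can be sketched very simply: the pre-trained language-model embedding of a tool's description is concatenated with the observation before the policy head, so a meta-learned policy can adapt to a new tool from its text alone. The linear head, observations, and embeddings below are hypothetical stand-ins.

```python
def language_conditioned_policy(obs, text_embedding, weights):
    """Language-conditioned policy (sketch): feature concatenation of the
    observation with the tool-description embedding, followed by a linear
    head standing in for the full network."""
    x = obs + text_embedding  # list concatenation = feature concatenation
    return sum(w * v for w, v in zip(weights, x))

# Two hypothetical tools share an observation but differ in their
# description embeddings, so the same policy produces different actions.
obs = [1.0, 2.0]
hammer_emb, spatula_emb = [0.9], [0.1]
weights = [0.5, 0.5, 1.0]
a_hammer = language_conditioned_policy(obs, hammer_emb, weights)
a_spatula = language_conditioned_policy(obs, spatula_emb, weights)
```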


Failure Prediction with Statistical Guarantees for Vision-Based Robot Control

Feb 11, 2022
Alec Farid, David Snyder, Allen Z. Ren, Anirudha Majumdar

We are motivated by the problem of performing failure prediction for safety-critical robotic systems with high-dimensional sensor observations (e.g., vision). Given access to a blackbox control policy (e.g., in the form of a neural network) and a dataset of training environments, we present an approach for synthesizing a failure predictor with guaranteed bounds on false-positive and false-negative errors. In order to achieve this, we utilize techniques from Probably Approximately Correct (PAC)-Bayes generalization theory. In addition, we present novel class-conditional bounds that allow us to trade off the relative rates of false-positive vs. false-negative errors. We propose algorithms that train failure predictors (which take as input the history of sensor observations) by minimizing our theoretical error bounds. We demonstrate the resulting approach using extensive simulation and hardware experiments for vision-based navigation with a drone and grasping objects with a robotic manipulator equipped with a wrist-mounted RGB-D camera. These experiments illustrate the ability of our approach to (1) provide strong bounds on failure prediction error rates (that closely match empirical error rates), and (2) improve safety by predicting failures.
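At its core, a failure predictor of this kind maps a score to a binary alarm via a threshold, and the false-positive/false-negative trade-off comes from where that threshold sits. A toy empirical version is below; the scores and labels are hypothetical, and the paper's predictors are learned and certified with PAC-Bayes bounds rather than plain empirical rates.

```python
def error_rates(scores, labels, threshold):
    """Empirical false-positive / false-negative rates of a thresholded
    failure predictor (predict 'failure' when score > threshold)."""
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s <= threshold and y == 1)
    return fp / labels.count(0), fn / labels.count(1)

# Hypothetical held-out rollouts: score = predicted chance of failure,
# label = 1 if the rollout actually failed.
scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,    0,   0,   0]

# Sweeping the threshold trades false positives against false negatives,
# which is the trade-off the class-conditional bounds control.
fpr_low, fnr_low = error_rates(scores, labels, threshold=0.25)
fpr_high, fnr_high = error_rates(scores, labels, threshold=0.75)
```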


Sim-to-Lab-to-Real: Safe Reinforcement Learning with Shielding and Generalization Guarantees

Feb 10, 2022
Kai-Chieh Hsu, Allen Z. Ren, Duy Phuong Nguyen, Anirudha Majumdar, Jaime F. Fisac

Safety is a critical component of autonomous systems and remains a challenge for deploying learning-based policies in the real world. In particular, policies learned using reinforcement learning often fail to generalize to novel environments due to unsafe behavior. In this paper, we propose Sim-to-Lab-to-Real to safely close the reality gap. To improve safety, we apply a dual policy setup where a performance policy is trained using the cumulative task reward and a backup (safety) policy is trained by solving the reach-avoid Bellman equation based on Hamilton-Jacobi reachability analysis. In Sim-to-Lab transfer, we apply a supervisory control scheme to shield unsafe actions during exploration; in Lab-to-Real transfer, we leverage the Probably Approximately Correct (PAC)-Bayes framework to provide lower bounds on the expected performance and safety of policies in unseen environments. We empirically study the proposed framework for ego-vision navigation in two types of indoor environments including a photo-realistic one. We also demonstrate strong generalization performance through hardware experiments in real indoor spaces with a quadrupedal robot. See https://sites.google.com/princeton.edu/sim-to-lab-to-real for supplementary material.
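The supervisory (shielding) scheme can be sketched as a least-restrictive filter: keep the performance policy's action unless a safety critic predicts a violation, in which case the backup policy takes over. The 1-D state, critic, and threshold below are hypothetical stand-ins for the learned reach-avoid value function.

```python
def shield(state, task_action, safety_value, backup_policy, threshold=0.0):
    """Least-restrictive supervisory shield (sketch): intervene only when
    the safety critic flags the proposed action as unsafe."""
    if safety_value(state, task_action) > threshold:  # predicted violation
        return backup_policy(state), True
    return task_action, False

# Hypothetical 1-D example: state is the distance to an obstacle; the
# critic flags actions that would close the gap below a 0.5 margin.
safety_value = lambda d, a: 0.5 - (d + a)  # > 0 means too close
backup = lambda d: 1.0                     # back away

safe_action, safe_override = shield(2.0, -0.5, safety_value, backup)
risky_action, risky_override = shield(0.6, -0.5, safety_value, backup)
```

Because the shield only overrides on predicted violations, the performance policy keeps exploring freely in the safe set, which is what makes the scheme least-restrictive.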

* Preprint submitted to Special Issue on Risk-aware Autonomous Systems: Theory and Practice, Artificial Intelligence Journal 

Stronger Generalization Guarantees for Robot Learning by Combining Generative Models and Real-World Data

Nov 16, 2021
Abhinav Agarwal, Sushant Veer, Allen Z. Ren, Anirudha Majumdar

We are motivated by the problem of learning policies for robotic systems with rich sensory inputs (e.g., vision) in a manner that allows us to guarantee generalization to environments unseen during training. We provide a framework for providing such generalization guarantees by leveraging a finite dataset of real-world environments in combination with a (potentially inaccurate) generative model of environments. The key idea behind our approach is to utilize the generative model in order to implicitly specify a prior over policies. This prior is updated using the real-world dataset of environments by minimizing an upper bound on the expected cost across novel environments derived via Probably Approximately Correct (PAC)-Bayes generalization theory. We demonstrate our approach on two simulated systems with nonlinear/hybrid dynamics and rich sensing modalities: (i) quadrotor navigation with an onboard vision sensor, and (ii) grasping objects using a depth sensor. Comparisons with prior work demonstrate the ability of our approach to obtain stronger generalization guarantees by utilizing generative models. We also present hardware experiments for validating our bounds for the grasping task.
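For reference, the kind of upper bound being minimized is the standard PAC-Bayes-kl form (notation assumed here: $\hat{C}_S(P)$ is the empirical cost of the policy distribution $P$ on the $N$ real-world training environments, $C_{\mathcal{D}}(P)$ its true expected cost on novel environments, and $P_0$ the prior implicitly specified by the generative model; the exact variant used in the paper may differ):

```latex
% With probability at least 1 - \delta over the draw of the N environments:
\mathrm{kl}\!\left(\hat{C}_S(P) \,\middle\|\, C_{\mathcal{D}}(P)\right)
  \;\le\; \frac{\mathrm{KL}(P \,\|\, P_0) + \ln\frac{2\sqrt{N}}{\delta}}{N}
```

Inverting the binary kl yields a numerical upper bound on $C_{\mathcal{D}}(P)$, which serves as the generalization certificate; a prior $P_0$ informed by the generative model keeps the $\mathrm{KL}(P \,\|\, P_0)$ term small and hence the guarantee strong.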


Distributionally Robust Policy Learning via Adversarial Environment Generation

Jul 13, 2021
Allen Z. Ren, Anirudha Majumdar

Our goal is to train control policies that generalize well to unseen environments. Inspired by the Distributionally Robust Optimization (DRO) framework, we propose DRAGEN - Distributionally Robust policy learning via Adversarial Generation of ENvironments - for iteratively improving the robustness of policies to realistic distribution shifts by generating adversarial environments. The key idea is to learn a generative model for environments whose latent variables capture cost-predictive and realistic variations in environments. We perform DRO with respect to a Wasserstein ball around the empirical distribution of environments by generating realistic adversarial environments via gradient ascent on the latent space. We demonstrate strong Out-of-Distribution (OoD) generalization in simulation for (i) swinging up a pendulum with onboard vision and (ii) grasping realistic 2D/3D objects. Grasping experiments on hardware demonstrate better sim2real performance compared to domain randomization.
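The adversarial generation step can be sketched as projected gradient ascent in the generator's latent space. The 1-D latent, the constant cost gradient, and the norm-ball projection (a crude stand-in for the Wasserstein-ball constraint) are all illustrative assumptions.

```python
def adversarial_latent(z, cost_grad, radius, step=0.1, iters=20):
    """Adversarial environment generation (sketch): ascend the policy's
    cost in latent space, projecting back onto a ball around the original
    latent code so the generated environment stays realistic."""
    z0 = z
    for _ in range(iters):
        z = z + step * cost_grad(z)          # gradient ascent on cost
        if abs(z - z0) > radius:             # project onto the ball
            z = z0 + radius * (1 if z > z0 else -1)
    return z

# Hypothetical 1-D cost landscape: cost increases with z, so the
# adversary pushes the latent code to the ball boundary.
cost_grad = lambda z: 1.0
z_adv = adversarial_latent(0.0, cost_grad, radius=0.5)
```

Decoding `z_adv` through the learned generator would yield a hard-but-realistic environment; retraining the policy on such environments is the DRO inner/outer loop described above.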
