Youngwoon Lee

Language-Conditioned Path Planning

Aug 31, 2023
Amber Xie, Youngwoon Lee, Pieter Abbeel, Stephen James

Contact is at the core of robotic manipulation. At times, it is desired (e.g. manipulation and grasping), and at times, it is harmful (e.g. when avoiding obstacles). However, traditional path planning algorithms focus solely on collision-free paths, limiting their applicability in contact-rich tasks. To address this limitation, we propose the domain of Language-Conditioned Path Planning, where contact-awareness is incorporated into the path planning problem. As a first step in this domain, we propose Language-Conditioned Collision Functions (LACO), a novel approach that learns a collision function using only a single-view image, language prompt, and robot configuration. LACO predicts collisions between the robot and the environment, enabling flexible, conditional path planning without the need for manual object annotations, point cloud data, or ground-truth object meshes. In both simulation and the real world, we demonstrate that LACO can facilitate complex, nuanced path plans that allow interaction with objects that are safe to collide with, rather than prohibiting all collisions.
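
The abstract describes the interface of the learned collision function rather than its implementation. The sketch below is an assumption-laden illustration, not the authors' code: a classifier mapping (single-view image, language embedding, robot configuration) to a collision probability, wrapped in a validity check that a sampling-based planner could call; the class and helper names, encoders, and dimensions are all hypothetical.

```python
# Hypothetical sketch of a language-conditioned collision classifier and how a
# sampling-based planner might query it. Not the authors' implementation.
import torch
import torch.nn as nn

class LanguageConditionedCollisionFn(nn.Module):
    def __init__(self, img_dim=256, text_dim=256, q_dim=7, hidden=256):
        super().__init__()
        # Placeholder encoders; a real system would use pretrained vision and
        # language backbones instead of these toy projections.
        self.img_enc = nn.Sequential(nn.Flatten(), nn.LazyLinear(img_dim), nn.ReLU())
        self.txt_enc = nn.Sequential(nn.LazyLinear(text_dim), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(img_dim + text_dim + q_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, image, text_emb, q):
        # image: (B, C, H, W) single-view image, text_emb: (B, D) language
        # embedding of the prompt, q: (B, q_dim) robot joint configuration.
        z = torch.cat([self.img_enc(image), self.txt_enc(text_emb), q], dim=-1)
        return torch.sigmoid(self.head(z))  # probability of a prohibited collision

def is_valid_config(model, image, text_emb, q, threshold=0.5):
    """Per-sample validity check a planner such as RRT could call."""
    with torch.no_grad():
        return model(image, text_emb, q).item() < threshold
```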

* Conference on Robot Learning, 2023 

Video Prediction Models as Rewards for Reinforcement Learning

May 23, 2023
Alejandro Escontrela, Ademi Adeniji, Wilson Yan, Ajay Jain, Xue Bin Peng, Ken Goldberg, Youngwoon Lee, Danijar Hafner, Pieter Abbeel

Specifying reward signals that allow agents to learn complex behaviors is a long-standing challenge in reinforcement learning. A promising approach is to extract preferences for behaviors from unlabeled videos, which are widely available on the internet. We present Video Prediction Rewards (VIPER), an algorithm that leverages pretrained video prediction models as action-free reward signals for reinforcement learning. Specifically, we first train an autoregressive transformer on expert videos and then use the video prediction likelihoods as reward signals for a reinforcement learning agent. VIPER enables expert-level control without programmatic task rewards across a wide range of DMC, Atari, and RLBench tasks. Moreover, generalization of the video prediction model allows us to derive rewards for an out-of-distribution environment where no expert data is available, enabling cross-embodiment generalization for tabletop manipulation. We see our work as a starting point for scalable reward specification from unlabeled videos that will benefit from the rapid advances in generative modeling. Source code and datasets are available on the project website: https://escontrela.me
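
As a rough illustration of the reward described above (not the released VIPER implementation), the sketch below scores each step of a trajectory by the log-likelihood an autoregressive video model assigns to the next frame's tokens; `video_model`, the tokenized-frame format, and the averaging over tokens are assumptions, and the paper's exploration bonus is omitted.

```python
# Hedged sketch: per-step rewards from the log-likelihood of an autoregressive
# video model, as the abstract describes. `video_model` is a stand-in that maps
# the frames observed so far to logits over the next frame's discrete tokens.
import torch
import torch.nn.functional as F

def video_prediction_rewards(video_model, frame_tokens):
    """frame_tokens: (T, K) int64 tensor, K tokens per frame.
    Returns (T-1,) rewards r_t = mean_k log p(token_{t+1,k} | frames <= t)."""
    rewards = []
    with torch.no_grad():
        for t in range(frame_tokens.shape[0] - 1):
            context = frame_tokens[: t + 1]           # frames observed so far
            logits = video_model(context)             # (K, vocab) next-frame logits
            logp = F.log_softmax(logits, dim=-1)
            target = frame_tokens[t + 1]              # ground-truth next-frame tokens
            rewards.append(logp.gather(-1, target.unsqueeze(-1)).mean())
    return torch.stack(rewards)
```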

* 20 pages, 15 figures, 4 tables. Under review 

FurnitureBench: Reproducible Real-World Benchmark for Long-Horizon Complex Manipulation

May 22, 2023
Minho Heo, Youngwoon Lee, Doohyun Lee, Joseph J. Lim

Reinforcement learning (RL), imitation learning (IL), and task and motion planning (TAMP) have demonstrated impressive performance across various robotic manipulation tasks. However, these approaches have been limited to learning simple behaviors in current real-world manipulation benchmarks, such as pushing or pick-and-place. To enable more complex, long-horizon behaviors of an autonomous robot, we propose to focus on real-world furniture assembly, a complex, long-horizon robot manipulation task that requires addressing many current robotic manipulation challenges. We present FurnitureBench, a reproducible real-world furniture assembly benchmark designed to provide a low barrier to entry and easy reproducibility, so that researchers across the world can reliably test their algorithms and compare them against prior work. For ease of use, we provide 200+ hours of pre-collected data (5000+ demonstrations), 3D printable furniture models, a robotic environment setup guide, and systematic task initialization. Furthermore, we provide FurnitureSim, a fast and realistic simulator of FurnitureBench. We benchmark the performance of offline RL and IL algorithms on our assembly tasks and demonstrate the need to improve such algorithms before they can solve our tasks in the real world, providing ample opportunities for future research.

* Robotics: Science and Systems (RSS) 2023. Website: https://clvrai.com/furniture-bench 

Controllability-Aware Unsupervised Skill Discovery

Feb 13, 2023
Seohong Park, Kimin Lee, Youngwoon Lee, Pieter Abbeel

One of the key capabilities of intelligent agents is the ability to discover useful skills without external supervision. However, current unsupervised skill discovery methods are often limited to acquiring simple, easy-to-learn skills due to the lack of incentives to discover more complex, challenging behaviors. We introduce a novel unsupervised skill discovery method, Controllability-aware Skill Discovery (CSD), which actively seeks complex, hard-to-control skills without supervision. The key component of CSD is a controllability-aware distance function, which assigns larger values to state transitions that are harder to achieve with the current skills. Combined with distance-maximizing skill discovery, CSD progressively learns more challenging skills over the course of training as our jointly trained distance function reduces rewards for easy-to-achieve skills. Our experimental results in six robotic manipulation and locomotion environments demonstrate that CSD can discover diverse, complex skills, including object manipulation and locomotion, with no supervision, significantly outperforming prior unsupervised skill discovery methods. Videos and code are available at https://seohong.me/projects/csd/
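
A minimal sketch of the central idea follows, under stated assumptions rather than the authors' exact formulation: a learned transition density stands in for how controllable a transition currently is, and its negative log-likelihood serves as the controllability-aware distance, so transitions that are hard to achieve with the current skills look "long" to a distance-maximizing skill objective.

```python
# Hedged sketch of a controllability-aware distance (assumed form): a learned
# density over transitions makes frequently achieved transitions "cheap" and
# rarely achieved ones "long", steering a distance-maximizing skill objective
# toward harder-to-control behaviors.
import torch
import torch.nn as nn

class TransitionDensity(nn.Module):
    """Diagonal Gaussian model of the state change s' - s."""
    def __init__(self, state_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * state_dim))

    def log_prob(self, s, s_next):
        mean, log_std = self.net(s).chunk(2, dim=-1)
        dist = torch.distributions.Normal(mean, log_std.exp())
        return dist.log_prob(s_next - s).sum(-1)

def controllability_aware_distance(density, s, s_next):
    # Low-likelihood (hard-to-achieve) transitions get a large distance.
    return -density.log_prob(s, s_next)
```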

PATO: Policy Assisted TeleOperation for Scalable Robot Data Collection

Dec 09, 2022
Shivin Dass, Karl Pertsch, Hejia Zhang, Youngwoon Lee, Joseph J. Lim, Stefanos Nikolaidis

Large-scale data is an essential component of machine learning, as demonstrated by recent advances in natural language processing and computer vision research. However, collecting large-scale robotic data is much more expensive and slower, as each operator can control only a single robot at a time. To make this costly data collection process efficient and scalable, we propose Policy Assisted TeleOperation (PATO), a system that automates part of the demonstration collection process using a learned assistive policy. PATO autonomously executes repetitive behaviors in data collection and asks for human input only when it is uncertain about which subtask or behavior to execute. We conduct teleoperation user studies both with a real robot and a simulated robot fleet and demonstrate that our assisted teleoperation system reduces human operators' mental load while improving data collection efficiency. Further, it enables a single operator to control multiple robots in parallel, which is a first step toward scalable robotic data collection. For code and video results, see https://clvrai.com/pato
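
The handoff logic described above can be summarized in a short loop. This is a hedged sketch with hypothetical interfaces (a gym-style `env`, an `assistive_policy` returning an action plus an uncertainty estimate such as ensemble disagreement, and a `human_interface` for teleoperation input), not the actual PATO system.

```python
# Hedged sketch of the uncertainty-gated handoff loop; env, policy, and human
# interfaces are hypothetical stand-ins.
def collect_demonstration(env, assistive_policy, human_interface,
                          uncertainty_threshold=0.5, max_steps=500):
    obs = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action, uncertainty = assistive_policy(obs)
        if uncertainty > uncertainty_threshold:
            # The policy is unsure which subtask or behavior to execute, so
            # control is handed back to the human operator for this step.
            action = human_interface.get_action(obs)
        next_obs, _, done, _ = env.step(action)
        trajectory.append((obs, action))
        obs = next_obs
        if done:
            break
    return trajectory
```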

* Website: https://clvrai.com/pato 

Skill-based Model-based Reinforcement Learning

Jul 15, 2022
Lucy Xiaoyang Shi, Joseph J. Lim, Youngwoon Lee

Model-based reinforcement learning (RL) is a sample-efficient way of learning complex behaviors by leveraging a learned single-step dynamics model to plan actions in imagination. However, planning every action for long-horizon tasks is not practical, akin to a human planning out every muscle movement. Instead, humans efficiently plan with high-level skills to solve complex tasks. From this intuition, we propose a Skill-based Model-based RL framework (SkiMo) that enables planning in the skill space using a skill dynamics model, which directly predicts the skill outcomes rather than predicting every small detail of the intermediate states, step by step. For accurate and efficient long-term planning, we jointly learn the skill dynamics model and a skill repertoire from prior experience. We then harness the learned skill dynamics model to accurately simulate and plan over long horizons in the skill space, which enables efficient downstream learning of long-horizon, sparse-reward tasks. Experimental results in navigation and manipulation domains show that SkiMo extends the temporal horizon of model-based approaches and improves the sample efficiency for both model-based RL and skill-based RL. Code and videos are available at https://clvrai.com/skimo
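
To make the skill-space planning idea concrete, here is a minimal sketch (illustrative shapes and names, not the released SkiMo code) of cross-entropy-method planning over sequences of skill latents, where one call to the skill dynamics model jumps directly to the state after an entire skill and a learned reward model scores each skill.

```python
# Hedged sketch: CEM planning over skill latents, where skill_dynamics(s, z)
# jumps to the state after a whole skill and reward_model(s, z) returns a
# per-sample score for executing skill z from state s.
import torch

def plan_in_skill_space(skill_dynamics, reward_model, state, horizon=5, z_dim=10,
                        n_samples=256, n_elites=32, n_iters=5):
    mean = torch.zeros(horizon, z_dim)
    std = torch.ones(horizon, z_dim)
    for _ in range(n_iters):
        z_seqs = mean + std * torch.randn(n_samples, horizon, z_dim)
        returns = torch.zeros(n_samples)
        s = state.unsqueeze(0).expand(n_samples, -1)
        for t in range(horizon):
            returns = returns + reward_model(s, z_seqs[:, t])  # (n_samples,)
            s = skill_dynamics(s, z_seqs[:, t])                # one step = one skill
        elites = z_seqs[returns.topk(n_elites).indices]
        mean, std = elites.mean(0), elites.std(0) + 1e-6
    return mean[0]  # first skill latent to execute before replanning
```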

* Website: https://clvrai.com/skimo 

Adversarial Skill Chaining for Long-Horizon Robot Manipulation via Terminal State Regularization

Nov 15, 2021
Youngwoon Lee, Joseph J. Lim, Anima Anandkumar, Yuke Zhu

Skill chaining is a promising approach for synthesizing complex behaviors by sequentially combining previously learned skills. Yet, a naive composition of skills fails when a policy encounters a starting state never seen during its training. For successful skill chaining, prior approaches attempt to widen the policy's starting state distribution. However, these approaches require larger state distributions to be covered as more policies are sequenced, and thus are limited to short skill sequences. In this paper, we propose to chain multiple policies without excessively large initial state distributions by regularizing the terminal state distributions in an adversarial learning framework. We evaluate our approach on two complex long-horizon manipulation tasks of furniture assembly. Our results show that our method is the first model-free reinforcement learning algorithm to solve these tasks, whereas prior skill chaining approaches fail. The code and videos are available at https://clvrai.com/skill-chaining
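
One way to picture the terminal state regularization described above is the following hedged sketch (an assumed form, not the authors' code): a discriminator is trained to tell the current skill's terminal states apart from states the next skill was trained to start from, and the skill policy receives an extra terminal reward for ending in states the discriminator accepts.

```python
# Hedged sketch (assumed form): a discriminator separates the next skill's
# valid initial states from the current skill's terminal states, and the
# current skill's policy gets an extra terminal reward for fooling it.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TerminalStateDiscriminator(nn.Module):
    def __init__(self, state_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, s):
        return self.net(s)  # logit: "valid initial state for the next skill"

def discriminator_loss(disc, next_skill_init_states, current_terminal_states):
    real = F.binary_cross_entropy_with_logits(
        disc(next_skill_init_states), torch.ones(len(next_skill_init_states), 1))
    fake = F.binary_cross_entropy_with_logits(
        disc(current_terminal_states), torch.zeros(len(current_terminal_states), 1))
    return real + fake

def terminal_state_regularization(disc, terminal_state, coeff=1.0):
    # Added to the task reward at the end of the current skill's rollout.
    return coeff * torch.log(torch.sigmoid(disc(terminal_state)) + 1e-8)
```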

* Published at the Conference on Robot Learning (CoRL) 2021 

Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation

Nov 11, 2021
I-Chun Arthur Liu, Shagun Uppal, Gaurav S. Sukhatme, Joseph J. Lim, Peter Englert, Youngwoon Lee

Learning complex manipulation tasks in realistic, obstructed environments is a challenging problem due to hard exploration in the presence of obstacles and high-dimensional visual observations. Prior work tackles the exploration problem by integrating motion planning and reinforcement learning. However, the motion planner augmented policy requires access to state information, which is often not available in real-world settings. To address this, we propose to distill a state-based motion planner augmented policy into a visual control policy via (1) visual behavioral cloning to remove the motion planner dependency along with its jittery motion, and (2) vision-based reinforcement learning with the guidance of the smoothed trajectories from the behavioral cloning agent. We evaluate our method on three manipulation tasks in obstructed environments and compare it against various reinforcement learning and imitation learning baselines. The results demonstrate that our framework is highly sample-efficient and outperforms state-of-the-art algorithms. Moreover, coupled with domain randomization, our policy is capable of zero-shot transfer to unseen environment settings with distractors. Code and videos are available at https://clvrai.com/mopa-pd
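
The first distillation stage can be sketched as plain behavioral cloning from the state-based teacher to an image-based student; the interfaces below (dictionary observations with "image" and "state" keys, tensor-valued actions) are hypothetical, and the trajectory smoothing and the second, vision-based RL stage are omitted.

```python
# Hedged sketch of the first stage only: behavioral cloning of an image-based
# student from a state-based, motion-planner-augmented teacher policy.
import torch
import torch.nn as nn

def collect_teacher_data(env, teacher_policy, n_episodes=100):
    data = []
    for _ in range(n_episodes):
        obs, done = env.reset(), False
        while not done:
            action = teacher_policy(obs["state"])   # teacher uses privileged state
            data.append((obs["image"], action))     # student will only see images
            obs, _, done, _ = env.step(action)
    return data

def behavior_cloning(student, data, epochs=10, lr=3e-4):
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for image, action in data:
            loss = nn.functional.mse_loss(student(image.unsqueeze(0)),
                                          action.unsqueeze(0))
            opt.zero_grad(); loss.backward(); opt.step()
    return student
```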

* Published at the Conference on Robot Learning (CoRL) 2021 

Demonstration-Guided Reinforcement Learning with Learned Skills

Jul 21, 2021
Karl Pertsch, Youngwoon Lee, Yue Wu, Joseph J. Lim

Demonstration-guided reinforcement learning (RL) is a promising approach for learning complex behaviors by leveraging both reward feedback and a set of target task demonstrations. Prior approaches for demonstration-guided RL treat every new task as an independent learning problem and attempt to follow the provided demonstrations step-by-step, akin to a human trying to imitate a completely unseen behavior by following the demonstrator's exact muscle movements. Naturally, such learning will be slow, but often new behaviors are not completely unseen: they share subtasks with behaviors we have previously learned. In this work, we aim to exploit this shared subtask structure to increase the efficiency of demonstration-guided RL. We first learn a set of reusable skills from large offline datasets of prior experience collected across many tasks. We then propose Skill-based Learning with Demonstrations (SkiLD), an algorithm for demonstration-guided RL that efficiently leverages the provided demonstrations by following the demonstrated skills instead of the primitive actions, resulting in substantial performance improvements over prior demonstration-guided RL approaches. We validate the effectiveness of our approach on long-horizon maze navigation and complex robot manipulation tasks.
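
A minimal sketch of the idea of following demonstrated skills instead of primitive actions, in an assumed form rather than the authors' exact objective: the high-level policy's skill distribution is pulled toward a demonstration-derived skill distribution when the current state resembles the demonstrations, and toward a task-agnostic skill prior otherwise.

```python
# Hedged sketch (assumed form): a KL penalty pulls the policy's skill
# distribution toward demonstration-derived skills inside demo support and
# toward a task-agnostic prior outside it.
import torch

def skill_regularization(q_policy, q_demo, p_prior, in_demo_support, alpha=1.0):
    """q_policy, q_demo, p_prior: torch.distributions over the skill latent at
    the current state; returns the penalty added to the RL objective."""
    target = q_demo if in_demo_support else p_prior
    return alpha * torch.distributions.kl_divergence(q_policy, target)
```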

Policy Transfer across Visual and Dynamics Domain Gaps via Iterative Grounding

Jul 01, 2021
Grace Zhang, Linghan Zhong, Youngwoon Lee, Joseph J. Lim

The ability to transfer a policy from one environment to another is a promising avenue for efficient robot learning in realistic settings where task supervision is not available. This can allow us to take advantage of environments well suited for training, such as simulators or laboratories, to learn a policy for a real robot in a home or office. To succeed, such policy transfer must overcome both the visual domain gap (e.g. different illumination or background) and the dynamics domain gap (e.g. different robot calibration or modelling error) between source and target environments. However, prior policy transfer approaches either cannot handle a large domain gap or can only address one type of domain gap at a time. In this paper, we propose IDAPT, a novel policy transfer method with iterative "environment grounding" that alternates between (1) directly minimizing both visual and dynamics domain gaps by grounding the source environment in the target environment domains, and (2) training a policy on the grounded source environment. This iterative training progressively aligns the domains between the two environments and adapts the policy to the target environment. Once trained, the policy can be directly executed on the target environment. Empirical results on locomotion and robotic manipulation tasks demonstrate that our approach can effectively transfer a policy across visual and dynamics domain gaps with minimal supervision and interaction with the target environment. Videos and code are available at https://clvrai.com/idapt
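
The alternation described above fits in a few lines. The sketch below is a schematic outer loop with hypothetical callables passed in (`collect_rollouts`, `ground_source_env`, `train_policy`); it is not the released IDAPT code and leaves the grounding and policy-training internals unspecified.

```python
# Hedged sketch of the alternating loop; collect_rollouts, ground_source_env,
# and train_policy are hypothetical callables supplied by the user.
def iterative_grounding(source_env, target_env, policy,
                        collect_rollouts, ground_source_env, train_policy,
                        n_iterations=5, episodes_per_iter=10):
    for _ in range(n_iterations):
        # (1) Ground: collect a little target-environment data with the current
        #     policy and fit visual + dynamics grounding of the source env.
        target_data = collect_rollouts(target_env, policy, episodes_per_iter)
        grounded_env = ground_source_env(source_env, target_data)
        # (2) Train: update the policy with cheap interaction in the grounded
        #     source environment.
        policy = train_policy(grounded_env, policy)
    return policy
```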

* Robotics: Science and Systems (RSS), 2021 