Laura Smith

Adapt On-the-Go: Behavior Modulation for Single-Life Robot Deployment

Nov 02, 2023
Annie S. Chen, Govind Chada, Laura Smith, Archit Sharma, Zipeng Fu, Sergey Levine, Chelsea Finn

To succeed in the real world, robots must cope with situations that differ from those seen during training. We study the problem of adapting on-the-fly to such novel scenarios during deployment, by drawing upon a diverse repertoire of previously learned behaviors. Our approach, RObust Autonomous Modulation (ROAM), introduces a mechanism based on the perceived value of pre-trained behaviors to select and adapt them to the situation at hand. Crucially, this adaptation process happens entirely within a single episode at test time, without any human supervision. We provide a theoretical analysis of our selection mechanism and demonstrate that ROAM enables a robot to adapt rapidly to changes in dynamics both in simulation and on a real Go1 quadruped, even successfully moving forward with roller skates on its feet. Our approach adapts over 2x more efficiently than existing methods when facing a variety of out-of-distribution situations during deployment, by effectively choosing and adapting relevant behaviors on-the-fly.

* 19 pages, 6 figures 
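
The selection side of this idea can be illustrated with a minimal sketch in Python (not the authors' implementation; the Behavior class and the toy policies and value functions are hypothetical stand-ins): each pre-trained behavior is scored by its own value estimate at the current state, and the highest-scoring one is executed.

    import numpy as np

    class Behavior:
        """A pre-trained behavior: a policy paired with its own value estimate."""
        def __init__(self, policy, value_fn):
            self.policy = policy      # maps state -> action
            self.value_fn = value_fn  # maps state -> scalar value estimate

    def select_and_act(behaviors, state):
        """Execute the behavior whose value function rates the current state highest."""
        scores = np.array([b.value_fn(state) for b in behaviors])
        best = int(np.argmax(scores))
        return behaviors[best].policy(state), best

    # Toy usage with stand-in policies and value functions.
    rng = np.random.default_rng(0)
    behaviors = [
        Behavior(policy=lambda s, k=k: -0.1 * k * s,
                 value_fn=lambda s, k=k: -float(np.abs(s - k).sum()))
        for k in range(3)
    ]
    action, chosen = select_and_act(behaviors, rng.normal(size=4))
    print("selected behavior:", chosen)

ROAM additionally adapts the selected behavior online; the sketch covers only the value-based selection step.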

Grow Your Limits: Continuous Improvement with Real-World RL for Robotic Locomotion

Oct 26, 2023
Laura Smith, Yunhao Cao, Sergey Levine

Deep reinforcement learning (RL) can enable robots to autonomously acquire complex behaviors, such as legged locomotion. However, RL in the real world is complicated by constraints on efficiency, safety, and overall training stability, limiting its practical applicability. We present APRL, a policy regularization framework that modulates the robot's exploration over the course of training, striking a balance between flexible improvement potential and focused, efficient exploration. APRL enables a quadrupedal robot to efficiently learn to walk entirely in the real world within minutes and to keep improving with more training where prior work saturates in performance. We demonstrate that continued training with APRL results in a policy that is substantially more capable of navigating challenging situations and is able to adapt to changes in dynamics as training continues.

* First two authors contributed equally. Project website: https://sites.google.com/berkeley.edu/aprl 
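
A minimal sketch of the general idea, assuming a simple adaptive action bound rather than APRL's actual regularizer: actions are clipped to a range that widens while the robot keeps improving and tightens after regressions, so exploration stays focused early on and expands as training progresses.

    import numpy as np

    class AdaptiveActionBound:
        """Clip actions to a bound that tracks recent training progress."""
        def __init__(self, init=0.3, max_bound=1.0, grow=1.05, shrink=0.9):
            self.bound = init
            self.max_bound = max_bound
            self.grow, self.shrink = grow, shrink

        def update(self, recent_return, best_return_so_far):
            # Widen the allowed range when performance improves,
            # tighten it when performance regresses (e.g. after a fall).
            if recent_return >= best_return_so_far:
                self.bound = min(self.bound * self.grow, self.max_bound)
            else:
                self.bound = max(self.bound * self.shrink, 0.05)

        def clip(self, action):
            return np.clip(action, -self.bound, self.bound)

    limiter = AdaptiveActionBound()
    print(limiter.clip(np.array([0.8, -0.5, 0.2])))   # constrained early in training
    limiter.update(recent_return=12.0, best_return_so_far=10.0)
    print(limiter.bound)                              # bound widens after improvement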

Learning and Adapting Agile Locomotion Skills by Transferring Experience

Apr 19, 2023
Laura Smith, J. Chase Kew, Tianyu Li, Linda Luu, Xue Bin Peng, Sehoon Ha, Jie Tan, Sergey Levine

Legged robots have enormous potential in their range of capabilities, from navigating unstructured terrains to high-speed running. However, designing robust controllers for highly agile dynamic motions remains a substantial challenge for roboticists. Reinforcement learning (RL) offers a promising data-driven approach for automatically training such controllers. However, exploration in these high-dimensional, underactuated systems remains a significant hurdle for enabling legged robots to learn performant, naturalistic, and versatile agility skills. We propose a framework for training complex robotic skills by transferring experience from existing controllers to jumpstart learning new tasks. To leverage controllers we can acquire in practice, we design this framework to be flexible in terms of their source -- that is, the controllers may have been optimized for a different objective under different dynamics, or may require different knowledge of the surroundings -- and thus may be highly suboptimal for the target task. We show that our method enables learning complex agile jumping behaviors, navigating to goal locations while walking on hind legs, and adapting to new environments. We also demonstrate that the agile behaviors learned in this way are graceful and safe enough to deploy in the real world.

* Project website: https://sites.google.com/berkeley.edu/twirl 
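
A minimal sketch of the transfer idea in its most generic form (toy data and a simple mixing rule, not the paper's exact recipe): transitions collected by an existing source controller are blended into the learner's training batches, so the new policy starts from informative experience rather than exploring from scratch.

    import random

    def mixed_batch(source_data, learner_data, batch_size=8, source_frac=0.5):
        """Sample a training batch that blends source-controller and learner transitions."""
        n_src = min(int(batch_size * source_frac), len(source_data))
        batch = random.sample(source_data, n_src)
        n_new = min(batch_size - n_src, len(learner_data))
        return batch + random.sample(learner_data, n_new)

    # Toy transitions: (state, action, reward, next_state) tuples.
    source_data = [(("s", i), ("a", i), 1.0, ("s", i + 1)) for i in range(100)]
    learner_data = [(("s", i), ("a", i), 0.0, ("s", i + 1)) for i in range(10)]
    print(len(mixed_batch(source_data, learner_data)))  # 8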

RoboPianist: A Benchmark for High-Dimensional Robot Control

Apr 09, 2023
Kevin Zakka, Laura Smith, Nimrod Gileadi, Taylor Howell, Xue Bin Peng, Sumeet Singh, Yuval Tassa, Pete Florence, Andy Zeng, Pieter Abbeel

We introduce a new benchmarking suite for high-dimensional control, targeted at testing high spatial and temporal precision, coordination, and planning, all with an underactuated system that frequently makes and breaks contacts. The proposed challenge is mastering the piano through bi-manual dexterity, using a pair of simulated anthropomorphic robot hands. We call it RoboPianist, and the initial version covers a broad set of 150 variable-difficulty songs. We investigate both model-free and model-based methods on the benchmark, characterizing their performance envelopes. We observe that while some existing methods, when well-tuned, can achieve impressive levels of performance in certain aspects, there is significant room for improvement. RoboPianist provides a rich quantitative benchmarking environment with human-interpretable results, is easily extended by augmenting the repertoire with new songs, and offers opportunities for further research, including in multi-task learning, zero-shot generalization, multimodal (sound, vision, touch) learning, and imitation. Supplementary information, including videos of our control policies, can be found at https://kzakka.com/robopianist/
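
To make "mastering" a song concrete, the sketch below scores a policy's key presses against a reference song with a precision/recall/F1-style metric; the function and the toy data are illustrative and are not the benchmark's actual API.

    import numpy as np

    def keypress_f1(pressed, reference):
        """pressed, reference: boolean arrays of shape (timesteps, 88 piano keys)."""
        tp = np.logical_and(pressed, reference).sum()
        fp = np.logical_and(pressed, ~reference).sum()
        fn = np.logical_and(~pressed, reference).sum()
        precision = tp / max(tp + fp, 1)
        recall = tp / max(tp + fn, 1)
        f1 = 2 * precision * recall / max(precision + recall, 1e-8)
        return precision, recall, f1

    rng = np.random.default_rng(0)
    reference = rng.random((100, 88)) < 0.05               # toy reference song
    pressed = reference ^ (rng.random((100, 88)) < 0.02)   # noisy "policy" output
    print(keypress_f1(pressed, reference))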


Efficient Online Reinforcement Learning with Offline Data

Feb 15, 2023
Philip J. Ball, Laura Smith, Ilya Kostrikov, Sergey Levine

Sample efficiency and exploration remain major challenges in online reinforcement learning (RL). A powerful approach that can be applied to address these issues is the inclusion of offline data, such as prior trajectories from a human expert or a sub-optimal exploration policy. Previous methods have relied on extensive modifications and additional complexity to ensure the effective use of this data. Instead, we ask: can we simply apply existing off-policy methods to leverage offline data when learning online? In this work, we demonstrate that the answer is yes; however, a set of minimal but important changes to existing off-policy RL algorithms is required to achieve reliable performance. We extensively ablate these design choices, demonstrating the key factors that most affect performance, and arrive at a set of recommendations that practitioners can readily apply, whether their data comprise a small number of expert demonstrations or large volumes of sub-optimal trajectories. We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches across a diverse set of competitive benchmarks, with no additional computational overhead.
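
One illustrative change in this spirit, sketched under generic assumptions (dictionary-of-arrays buffers rather than any particular library), is symmetric sampling: each training batch is drawn half from the static offline dataset and half from the online replay buffer.

    import numpy as np

    def symmetric_batch(offline, online, batch_size, rng):
        """Draw half of each batch from offline data and half from online data."""
        half = batch_size // 2
        idx_off = rng.integers(len(offline["obs"]), size=half)
        idx_on = rng.integers(len(online["obs"]), size=batch_size - half)
        return {k: np.concatenate([offline[k][idx_off], online[k][idx_on]])
                for k in offline}

    rng = np.random.default_rng(0)
    offline = {"obs": rng.normal(size=(5000, 4)), "act": rng.normal(size=(5000, 2)),
               "rew": rng.normal(size=(5000,))}
    online = {"obs": rng.normal(size=(200, 4)), "act": rng.normal(size=(200, 2)),
              "rew": rng.normal(size=(200,))}
    print(symmetric_batch(offline, online, batch_size=256, rng=rng)["obs"].shape)  # (256, 4)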


A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning

Aug 16, 2022
Laura Smith, Ilya Kostrikov, Sergey Levine

Deep reinforcement learning is a promising approach for learning policies in uncontrolled environments without requiring domain knowledge. Unfortunately, due to sample inefficiency, deep RL applications have primarily focused on simulated environments. In this work, we demonstrate that recent advancements in machine learning algorithms and libraries, combined with a carefully tuned robot controller, lead to learning quadruped locomotion in only 20 minutes in the real world. We evaluate our approach on several indoor and outdoor terrains which are known to be challenging for classical model-based controllers. We observe that the robot consistently learns a walking gait on all of these terrains. Finally, we evaluate our design decisions in a simulated environment.

* First two authors contributed equally. Project website: https://sites.google.com/berkeley.edu/walk-in-the-park 
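
A schematic training loop (placeholder agent, environment, and buffer interfaces, not the released code) illustrating one ingredient of this kind of sample efficiency: taking many gradient updates per environment step, so each expensive real-world transition is used for more learning.

    def train(env, agent, replay_buffer, total_env_steps=20_000, utd_ratio=20):
        """Placeholder loop: agent/env/buffer are assumed interfaces, not real APIs."""
        obs = env.reset()
        for _ in range(total_env_steps):
            action = agent.sample_action(obs)
            next_obs, reward, done = env.step(action)
            replay_buffer.add(obs, action, reward, next_obs, done)
            obs = env.reset() if done else next_obs
            # A high update-to-data ratio: several gradient steps per transition.
            for _ in range(utd_ratio):
                agent.update(replay_buffer.sample(batch_size=256))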

B-Pref: Benchmarking Preference-Based Reinforcement Learning

Nov 04, 2021
Kimin Lee, Laura Smith, Anca Dragan, Pieter Abbeel

Reinforcement learning (RL) requires access to a reward function that incentivizes the right behavior, but such reward functions are notoriously hard to specify for complex tasks. Preference-based RL provides an alternative: learning policies using a teacher's preferences without pre-defined rewards, thus overcoming concerns associated with reward engineering. However, it is difficult to quantify progress in preference-based RL due to the lack of a commonly adopted benchmark. In this paper, we introduce B-Pref: a benchmark specially designed for preference-based RL. A key challenge with such a benchmark is providing the ability to evaluate candidate algorithms quickly, which makes relying on real human input for evaluation prohibitive. At the same time, simulating human input as giving perfect preferences for the ground-truth reward function is unrealistic. B-Pref alleviates this by simulating teachers with a wide array of irrationalities, and proposes metrics that measure not only performance but also robustness to these potential irrationalities. We showcase the utility of B-Pref by using it to analyze algorithmic design choices, such as selecting informative queries, for state-of-the-art preference-based RL algorithms. We hope that B-Pref can serve as a common starting point to study preference-based RL more systematically. Source code is available at https://github.com/rll-research/B-Pref.

* NeurIPS Datasets and Benchmarks Track 2021. Code is available at https://github.com/rll-research/B-Pref 
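
A minimal sketch, assuming the common Bradley-Terry preference model, of how a simulated teacher with irrationalities might look: a rationality temperature makes choices between two trajectory segments stochastic, and a small mistake probability occasionally flips the label outright. The parameter names here are illustrative, not B-Pref's configuration.

    import numpy as np

    def simulated_preference(return_a, return_b, beta=1.0, mistake_prob=0.1, rng=None):
        """Return 0 if segment A is preferred, 1 if segment B is preferred."""
        rng = rng or np.random.default_rng()
        # Boltzmann-rational choice based on ground-truth segment returns.
        p_prefer_a = 1.0 / (1.0 + np.exp(-beta * (return_a - return_b)))
        choice = 0 if rng.random() < p_prefer_a else 1
        # Occasionally give an outright wrong label.
        if rng.random() < mistake_prob:
            choice = 1 - choice
        return choice

    rng = np.random.default_rng(0)
    print([simulated_preference(5.0, 3.0, rng=rng) for _ in range(10)])  # mostly 0, some noise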

Legged Robots that Keep on Learning: Fine-Tuning Locomotion Policies in the Real World

Oct 11, 2021
Laura Smith, J. Chase Kew, Xue Bin Peng, Sehoon Ha, Jie Tan, Sergey Levine

Legged robots are physically capable of traversing a wide range of challenging environments, but designing controllers that are sufficiently robust to handle this diversity has been a long-standing challenge in robotics. Reinforcement learning presents an appealing approach for automating the controller design process and has been able to produce remarkably robust controllers when trained in a suitable range of environments. However, it is difficult to predict all likely conditions the robot will encounter during deployment and enumerate them at training time. What if instead of training controllers that are robust enough to handle any eventuality, we enable the robot to continually learn in any setting it finds itself in? This kind of real-world reinforcement learning poses a number of challenges, including efficiency, safety, and autonomy. To address these challenges, we propose a practical robot reinforcement learning system for fine-tuning locomotion policies in the real world. We demonstrate that a modest amount of real-world training can substantially improve performance during deployment, and this enables a real A1 quadrupedal robot to autonomously fine-tune multiple locomotion skills in a range of environments, including an outdoor lawn and a variety of indoor terrains.

* Project website: https://sites.google.com/berkeley.edu/fine-tuning-locomotion 
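
A schematic sketch of real-world fine-tuning (placeholder components and an assumed pre-training buffer, not the paper's system): the agent is initialized from a policy trained elsewhere, optionally keeps its pre-training data in the replay buffer, and continues off-policy updates on transitions collected during deployment.

    def fine_tune(env, agent, pretrained_params, pretraining_data, replay_buffer, steps=10_000):
        """Placeholder loop: agent/env/buffer are assumed interfaces, not real APIs."""
        agent.load_params(pretrained_params)     # start from the pre-trained policy
        for transition in pretraining_data:      # optionally retain old experience
            replay_buffer.add(*transition)
        obs = env.reset()
        for _ in range(steps):
            action = agent.sample_action(obs)
            next_obs, reward, done = env.step(action)
            replay_buffer.add(obs, action, reward, next_obs, done)
            obs = env.reset() if done else next_obs
            agent.update(replay_buffer.sample(batch_size=256))
        return agent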

Offline Meta-Reinforcement Learning with Online Self-Supervision

Jul 19, 2021
Vitchyr H. Pong, Ashvin Nair, Laura Smith, Catherine Huang, Sergey Levine

Meta-reinforcement learning (RL) can meta-train policies that adapt to new tasks with orders of magnitude less data than standard RL, but meta-training itself is costly and time-consuming. If we can meta-train on offline data, then we can reuse the same static dataset, labeled once with rewards for different tasks, to meta-train policies that adapt to a variety of new tasks at meta-test time. Although this capability would make meta-RL a practical tool for real-world use, offline meta-RL presents additional challenges beyond online meta-RL or standard offline RL settings. Meta-RL learns an exploration strategy that collects data for adapting, and also meta-trains a policy that quickly adapts to data from a new task. Since this policy was meta-trained on a fixed, offline dataset, it might behave unpredictably when adapting to data collected by the learned exploration strategy, which differs systematically from the offline data and thus induces distributional shift. We do not want to remove this distributional shift by simply adopting a conservative exploration strategy, because learning an exploration strategy enables an agent to collect better data for faster adaptation. Instead, we propose a hybrid offline meta-RL algorithm, which uses offline data with rewards to meta-train an adaptive policy, and then collects additional unsupervised online data, without any reward labels, to bridge this distribution shift. Because no reward labels are required for online collection, this data can be much cheaper to collect. We compare our method to prior work on offline meta-RL on simulated robot locomotion and manipulation tasks and find that using additional unsupervised online data collection leads to a dramatic improvement in the adaptive capabilities of the meta-trained policies, matching the performance of fully online meta-RL on a range of challenging domains that require generalization to new tasks.

* 10 pages, 6 figures 
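
A high-level sketch of the two-phase structure described above, with placeholder components and one plausible way (a learned reward model) to make reward-free online data usable for training; this is an illustration, not the paper's exact algorithm.

    def offline_phase(offline_tasks, agent, reward_model):
        """Meta-train on static, reward-labeled data and fit a reward predictor."""
        for task_data in offline_tasks:        # each task: states, actions, rewards
            reward_model.fit(task_data)
            agent.meta_update(task_data)

    def online_phase(task_envs, agent, reward_model, episodes_per_task=10):
        """Collect reward-free online data and label it with the learned reward model."""
        for env in task_envs:
            for _ in range(episodes_per_task):
                trajectory = agent.explore(env)                        # no reward labels
                trajectory.rewards = reward_model.predict(trajectory)  # self-generated labels
                agent.meta_update(trajectory)                          # bridge the shift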