Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xue Bin Peng

Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control

Jan 30, 2024

Zhongyu Li, Xue Bin Peng, Pieter Abbeel, Sergey Levine, Glen Berseth, Koushil Sreenath

Abstract:This paper presents a comprehensive study on using deep reinforcement learning (RL) to create dynamic locomotion controllers for bipedal robots. Going beyond focusing on a single locomotion skill, we develop a general control solution that can be used for a range of dynamic bipedal skills, from periodic walking and running to aperiodic jumping and standing. Our RL-based controller incorporates a novel dual-history architecture, utilizing both a long-term and short-term input/output (I/O) history of the robot. This control architecture, when trained through the proposed end-to-end RL approach, consistently outperforms other methods across a diverse range of skills in both simulation and the real world.The study also delves into the adaptivity and robustness introduced by the proposed RL system in developing locomotion controllers. We demonstrate that the proposed architecture can adapt to both time-invariant dynamics shifts and time-variant changes, such as contact events, by effectively using the robot's I/O history. Additionally, we identify task randomization as another key source of robustness, fostering better task generalization and compliance to disturbances. The resulting control policies can be successfully deployed on Cassie, a torque-controlled human-sized bipedal robot. This work pushes the limits of agility for bipedal robots through extensive real-world experiments. We demonstrate a diverse range of locomotion skills, including: robust standing, versatile walking, fast running with a demonstration of a 400-meter dash, and a diverse set of jumping skills, such as standing long jumps and high jumps.

Via

Access Paper or Ask Questions

Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation

Jan 16, 2024

Mathis Petrovich, Or Litany, Umar Iqbal, Michael J. Black, Gül Varol, Xue Bin Peng, Davis Rempe

Figure 1 for Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation

Figure 2 for Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation

Figure 3 for Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation

Figure 4 for Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation

Abstract:Recent advances in generative modeling have led to promising progress on synthesizing 3D human motion from text, with methods that can generate character animations from short prompts and specified durations. However, using a single text prompt as input lacks the fine-grained control needed by animators, such as composing multiple actions and defining precise durations for parts of the motion. To address this, we introduce the new problem of timeline control for text-driven motion synthesis, which provides an intuitive, yet fine-grained, input interface for users. Instead of a single prompt, users can specify a multi-track timeline of multiple prompts organized in temporal intervals that may overlap. This enables specifying the exact timings of each action and composing multiple actions in sequence or at overlapping intervals. To generate composite animations from a multi-track timeline, we propose a new test-time denoising method. This method can be integrated with any pre-trained motion diffusion model to synthesize realistic motions that accurately reflect the timeline. At every step of denoising, our method processes each timeline interval (text prompt) individually, subsequently aggregating the predictions with consideration for the specific body parts engaged in each action. Experimental comparisons and ablations validate that our method produces realistic motions that respect the semantics and timing of given text prompts. Our code and models are publicly available at https://mathis.petrovich.fr/stmc.

* Project page: https://mathis.petrovich.fr/stmc

Via

Access Paper or Ask Questions

Trajeglish: Learning the Language of Driving Scenarios

Dec 07, 2023

Jonah Philion, Xue Bin Peng, Sanja Fidler

Figure 1 for Trajeglish: Learning the Language of Driving Scenarios

Figure 2 for Trajeglish: Learning the Language of Driving Scenarios

Figure 3 for Trajeglish: Learning the Language of Driving Scenarios

Figure 4 for Trajeglish: Learning the Language of Driving Scenarios

Abstract:A longstanding challenge for self-driving development is simulating dynamic driving scenarios seeded from recorded driving logs. In pursuit of this functionality, we apply tools from discrete sequence modeling to model how vehicles, pedestrians and cyclists interact in driving scenarios. Using a simple data-driven tokenization scheme, we discretize trajectories to centimeter-level resolution using a small vocabulary. We then model the multi-agent sequence of motion tokens with a GPT-like encoder-decoder that is autoregressive in time and takes into account intra-timestep interaction between agents. Scenarios sampled from our model exhibit state-of-the-art realism; our model tops the Waymo Sim Agents Benchmark, surpassing prior work along the realism meta metric by 3.3% and along the interaction metric by 9.9%. We ablate our modeling choices in full autonomy and partial autonomy settings, and show that the representations learned by our model can quickly be adapted to improve performance on nuScenes. We additionally evaluate the scalability of our model with respect to parameter count and dataset size, and use density estimates from our model to quantify the saliency of context length and intra-timestep interaction for the traffic modeling task.

* Preprint

Via

Access Paper or Ask Questions

Video Prediction Models as Rewards for Reinforcement Learning

May 23, 2023

Alejandro Escontrela, Ademi Adeniji, Wilson Yan, Ajay Jain, Xue Bin Peng, Ken Goldberg, Youngwoon Lee, Danijar Hafner, Pieter Abbeel

Figure 1 for Video Prediction Models as Rewards for Reinforcement Learning

Figure 2 for Video Prediction Models as Rewards for Reinforcement Learning

Figure 3 for Video Prediction Models as Rewards for Reinforcement Learning

Figure 4 for Video Prediction Models as Rewards for Reinforcement Learning

Abstract:Specifying reward signals that allow agents to learn complex behaviors is a long-standing challenge in reinforcement learning. A promising approach is to extract preferences for behaviors from unlabeled videos, which are widely available on the internet. We present Video Prediction Rewards (VIPER), an algorithm that leverages pretrained video prediction models as action-free reward signals for reinforcement learning. Specifically, we first train an autoregressive transformer on expert videos and then use the video prediction likelihoods as reward signals for a reinforcement learning agent. VIPER enables expert-level control without programmatic task rewards across a wide range of DMC, Atari, and RLBench tasks. Moreover, generalization of the video prediction model allows us to derive rewards for an out-of-distribution environment where no expert data is available, enabling cross-embodiment generalization for tabletop manipulation. We see our work as starting point for scalable reward specification from unlabeled videos that will benefit from the rapid advances in generative modeling. Source code and datasets are available on the project website: https://escontrela.me

* 20 pages, 15 figures, 4 tables. under review

Via

Access Paper or Ask Questions

CALM: Conditional Adversarial Latent Models for Directable Virtual Characters

May 02, 2023

Chen Tessler, Yoni Kasten, Yunrong Guo, Shie Mannor, Gal Chechik, Xue Bin Peng

Figure 1 for CALM: Conditional Adversarial Latent Models for Directable Virtual Characters

Figure 2 for CALM: Conditional Adversarial Latent Models for Directable Virtual Characters

Figure 3 for CALM: Conditional Adversarial Latent Models for Directable Virtual Characters

Figure 4 for CALM: Conditional Adversarial Latent Models for Directable Virtual Characters

Abstract:In this work, we present Conditional Adversarial Latent Models (CALM), an approach for generating diverse and directable behaviors for user-controlled interactive virtual characters. Using imitation learning, CALM learns a representation of movement that captures the complexity and diversity of human motion, and enables direct control over character movements. The approach jointly learns a control policy and a motion encoder that reconstructs key characteristics of a given motion without merely replicating it. The results show that CALM learns a semantic motion representation, enabling control over the generated motions and style-conditioning for higher-level task training. Once trained, the character can be controlled using intuitive interfaces, akin to those found in video games.

* Accepted to SIGGRAPH 2023

Via

Access Paper or Ask Questions

Learning and Adapting Agile Locomotion Skills by Transferring Experience

Apr 19, 2023

Laura Smith, J. Chase Kew, Tianyu Li, Linda Luu, Xue Bin Peng, Sehoon Ha, Jie Tan, Sergey Levine

Figure 1 for Learning and Adapting Agile Locomotion Skills by Transferring Experience

Figure 2 for Learning and Adapting Agile Locomotion Skills by Transferring Experience

Figure 3 for Learning and Adapting Agile Locomotion Skills by Transferring Experience

Figure 4 for Learning and Adapting Agile Locomotion Skills by Transferring Experience

Abstract:Legged robots have enormous potential in their range of capabilities, from navigating unstructured terrains to high-speed running. However, designing robust controllers for highly agile dynamic motions remains a substantial challenge for roboticists. Reinforcement learning (RL) offers a promising data-driven approach for automatically training such controllers. However, exploration in these high-dimensional, underactuated systems remains a significant hurdle for enabling legged robots to learn performant, naturalistic, and versatile agility skills. We propose a framework for training complex robotic skills by transferring experience from existing controllers to jumpstart learning new tasks. To leverage controllers we can acquire in practice, we design this framework to be flexible in terms of their source -- that is, the controllers may have been optimized for a different objective under different dynamics, or may require different knowledge of the surroundings -- and thus may be highly suboptimal for the target task. We show that our method enables learning complex agile jumping behaviors, navigating to goal locations while walking on hind legs, and adapting to new environments. We also demonstrate that the agile behaviors learned in this way are graceful and safe enough to deploy in the real world.

* Project website: https://sites.google.com/berkeley.edu/twirl

Via

Access Paper or Ask Questions

RoboPianist: A Benchmark for High-Dimensional Robot Control

Apr 09, 2023

Kevin Zakka, Laura Smith, Nimrod Gileadi, Taylor Howell, Xue Bin Peng, Sumeet Singh, Yuval Tassa, Pete Florence, Andy Zeng, Pieter Abbeel

Figure 1 for RoboPianist: A Benchmark for High-Dimensional Robot Control

Figure 2 for RoboPianist: A Benchmark for High-Dimensional Robot Control

Figure 3 for RoboPianist: A Benchmark for High-Dimensional Robot Control

Figure 4 for RoboPianist: A Benchmark for High-Dimensional Robot Control

Abstract:We introduce a new benchmarking suite for high-dimensional control, targeted at testing high spatial and temporal precision, coordination, and planning, all with an underactuated system frequently making-and-breaking contacts. The proposed challenge is mastering the piano through bi-manual dexterity, using a pair of simulated anthropomorphic robot hands. We call it RoboPianist, and the initial version covers a broad set of 150 variable-difficulty songs. We investigate both model-free and model-based methods on the benchmark, characterizing their performance envelopes. We observe that while certain existing methods, when well-tuned, can achieve impressive levels of performance in certain aspects, there is significant room for improvement. RoboPianist provides a rich quantitative benchmarking environment, with human-interpretable results, high ease of expansion by simply augmenting the repertoire with new songs, and opportunities for further research, including in multi-task learning, zero-shot generalization, multimodal (sound, vision, touch) learning, and imitation. Supplementary information, including videos of our control policies, can be found at https://kzakka.com/robopianist/

Via

Access Paper or Ask Questions

Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion

Apr 04, 2023

Davis Rempe, Zhengyi Luo, Xue Bin Peng, Ye Yuan, Kris Kitani, Karsten Kreis, Sanja Fidler, Or Litany

Figure 1 for Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion

Figure 2 for Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion

Figure 3 for Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion

Figure 4 for Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion

Abstract:We introduce a method for generating realistic pedestrian trajectories and full-body animations that can be controlled to meet user-defined goals. We draw on recent advances in guided diffusion modeling to achieve test-time controllability of trajectories, which is normally only associated with rule-based systems. Our guided diffusion model allows users to constrain trajectories through target waypoints, speed, and specified social groups while accounting for the surrounding environment context. This trajectory diffusion model is integrated with a novel physics-based humanoid controller to form a closed-loop, full-body pedestrian animation system capable of placing large crowds in a simulated environment with varying terrains. We further propose utilizing the value function learned during RL training of the animation controller to guide diffusion to produce trajectories better suited for particular scenarios such as collision avoidance and traversing uneven terrain. Video results are available on the project page at https://nv-tlabs.github.io/trace-pace .

* Conference on Computer Vision and Pattern Recognition (CVPR) 2023

Via

Access Paper or Ask Questions

Robust and Versatile Bipedal Jumping Control through Multi-Task Reinforcement Learning

Feb 19, 2023

Zhongyu Li, Xue Bin Peng, Pieter Abbeel, Sergey Levine, Glen Berseth, Koushil Sreenath

Figure 1 for Robust and Versatile Bipedal Jumping Control through Multi-Task Reinforcement Learning

Figure 2 for Robust and Versatile Bipedal Jumping Control through Multi-Task Reinforcement Learning

Figure 3 for Robust and Versatile Bipedal Jumping Control through Multi-Task Reinforcement Learning

Figure 4 for Robust and Versatile Bipedal Jumping Control through Multi-Task Reinforcement Learning

Abstract:This work aims to push the limits of agility for bipedal robots by enabling a torque-controlled bipedal robot to perform robust and versatile dynamic jumps in the real world. We present a multi-task reinforcement learning framework to train the robot to accomplish a large variety of jumping tasks, such as jumping to different locations and directions. To improve performance on these challenging tasks, we develop a new policy structure that encodes the robot's long-term input/output (I/O) history while also providing direct access to its short-term I/O history. In order to train a versatile multi-task policy, we utilize a multi-stage training scheme that includes different training stages for different objectives. After multi-stage training, the multi-task policy can be directly transferred to Cassie, a physical bipedal robot. Training on different tasks and exploring more diverse scenarios leads to highly robust policies that can exploit the diverse set of learned skills to recover from perturbations or poor landings during real-world deployment. Such robustness in the proposed multi-task policy enables Cassie to succeed in completing a variety of challenging jump tasks in the real world, such as standing long jumps, jumping onto elevated platforms, and multi-axis jumps.

* Accompanying video is at https://youtu.be/aAPSZ2QFB-E

Via

Access Paper or Ask Questions

Synthesizing Physical Character-Scene Interactions

Feb 02, 2023

Mohamed Hassan, Yunrong Guo, Tingwu Wang, Michael Black, Sanja Fidler, Xue Bin Peng

Abstract:Movement is how people interact with and affect their environment. For realistic character animation, it is necessary to synthesize such interactions between virtual characters and their surroundings. Despite recent progress in character animation using machine learning, most systems focus on controlling an agent's movements in fairly simple and homogeneous environments, with limited interactions with other objects. Furthermore, many previous approaches that synthesize human-scene interactions require significant manual labeling of the training data. In contrast, we present a system that uses adversarial imitation learning and reinforcement learning to train physically-simulated characters that perform scene interaction tasks in a natural and life-like manner. Our method learns scene interaction behaviors from large unstructured motion datasets, without manual annotation of the motion data. These scene interactions are learned using an adversarial discriminator that evaluates the realism of a motion within the context of a scene. The key novelty involves conditioning both the discriminator and the policy networks on scene context. We demonstrate the effectiveness of our approach through three challenging scene interaction tasks: carrying, sitting, and lying down, which require coordination of a character's movements in relation to objects in the environment. Our policies learn to seamlessly transition between different behaviors like idling, walking, and sitting. By randomizing the properties of the objects and their placements during training, our method is able to generalize beyond the objects and scenarios depicted in the training dataset, producing natural character-scene interactions for a wide variety of object shapes and placements. The approach takes physics-based character motion generation a step closer to broad applicability.

Via

Access Paper or Ask Questions