Zipeng Fu

Adapt On-the-Go: Behavior Modulation for Single-Life Robot Deployment

Nov 02, 2023
Annie S. Chen, Govind Chada, Laura Smith, Archit Sharma, Zipeng Fu, Sergey Levine, Chelsea Finn

To succeed in the real world, robots must cope with situations that differ from those seen during training. We study the problem of adapting on-the-fly to such novel scenarios during deployment by drawing upon a diverse repertoire of previously learned behaviors. Our approach, RObust Autonomous Modulation (ROAM), introduces a mechanism, based on the perceived value of pre-trained behaviors, for selecting and adapting those behaviors to the situation at hand. Crucially, this entire adaptation process happens within a single episode at test time, without any human supervision. We provide a theoretical analysis of our selection mechanism and demonstrate that ROAM enables a robot to adapt rapidly to changes in dynamics both in simulation and on a real Go1 quadruped, even successfully moving forward with roller skates on its feet. By effectively choosing and adapting relevant behaviors on-the-fly, our approach adapts over 2x more efficiently than existing methods when facing a variety of out-of-distribution situations during deployment.
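
A minimal sketch of the value-based selection step described above, with a hypothetical (policy_fn, value_fn) interface rather than the authors' implementation:

    def select_behavior(state, behaviors):
        # Score each pre-trained behavior by the value its critic assigns to
        # the current state, then act with the highest-scoring one; ROAM
        # additionally fine-tunes the chosen behavior online.
        # `behaviors`: hypothetical list of (policy_fn, value_fn) pairs.
        policy_fn, _ = max(behaviors, key=lambda b: b[1](state))
        return policy_fn(state)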

* 19 pages, 6 figures 

Robot Parkour Learning

Sep 12, 2023
Ziwen Zhuang, Zipeng Fu, Jianren Wang, Christopher Atkeson, Soeren Schwertfeger, Chelsea Finn, Hang Zhao

Parkour is a grand challenge for legged locomotion that requires robots to overcome various obstacles rapidly in complex environments. Existing methods can generate either diverse but blind locomotion skills or vision-based but specialized skills, by using reference animal data or complex rewards. However, autonomous parkour requires robots to learn generalizable skills that are both vision-based and diverse, so they can perceive and react to various scenarios. In this work, we propose a system for learning a single end-to-end vision-based parkour policy covering diverse parkour skills, using a simple reward and no reference motion data. We develop a reinforcement learning method inspired by direct collocation to generate parkour skills, including climbing over high obstacles, leaping over large gaps, crawling beneath low barriers, squeezing through thin slits, and running. We distill these skills into a single vision-based parkour policy and transfer it to a quadrupedal robot using its egocentric depth camera. We demonstrate that our system can empower two different low-cost robots to autonomously select and execute appropriate parkour skills to traverse challenging real-world environments.
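
As a toy illustration of the distillation step (nothing here is the paper's setup: linear policies and random data stand in for the depth-based student and the RL-trained skill teachers):

    import numpy as np

    rng = np.random.default_rng(0)
    obs_dim, act_dim, n = 32, 12, 256

    # One hypothetical linear "teacher" per skill; the student is a single
    # policy regressed onto whichever teacher is active, DAgger-style.
    teachers = [rng.normal(size=(obs_dim, act_dim)) for _ in range(5)]
    student = np.zeros((obs_dim, act_dim))

    for _ in range(500):
        obs = rng.normal(size=(n, obs_dim))          # stand-in for depth + proprioception
        skill = rng.integers(len(teachers), size=n)  # skill an oracle would pick
        target = np.einsum('ij,ijk->ik', obs, np.stack([teachers[s] for s in skill]))
        grad = obs.T @ (obs @ student - target) / n  # mean-squared-error gradient
        student -= 0.05 * grad
    # In the real system the egocentric depth image disambiguates which skill
    # applies; this toy student can only learn the average teacher.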

* CoRL 2023 (Oral). Project website at https://robot-parkour.github.io 

Deep Whole-Body Control: Learning a Unified Policy for Manipulation and Locomotion

Oct 18, 2022
Zipeng Fu, Xuxin Cheng, Deepak Pathak

An attached arm can significantly increase the applicability of legged robots to many mobile manipulation tasks that are not possible for their wheeled or tracked counterparts. The standard hierarchical control pipeline for such legged manipulators decouples the controller into separate manipulation and locomotion modules. However, this is ineffective: it requires immense engineering to support coordination between the arm and legs, and errors can propagate across modules, causing non-smooth, unnatural motions. It is also biologically implausible, given the evidence for strong motor synergies across limbs. In this work, we propose learning a unified policy for whole-body control of a legged manipulator using reinforcement learning. We propose Regularized Online Adaptation to bridge the Sim2Real gap for high-DoF control, and Advantage Mixing, which exploits the causal dependency in the action space to overcome local minima when training the whole-body system. We also present a simple design for a low-cost legged manipulator, and find that our unified policy demonstrates dynamic and agile behaviors across several task setups. Videos are at https://maniploco.github.io
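
A sketch of how an Advantage Mixing objective could look on torch tensors; the exact weighting and annealing schedule are assumptions, not the paper's equations:

    import torch

    def advantage_mixing_loss(logp_arm, logp_leg, adv_manip, adv_loco, beta):
        # Early in training (beta near 0), arm actions are credited mainly
        # with the manipulation advantage and leg actions with the locomotion
        # advantage; annealing beta toward 1 recovers the fully coupled
        # whole-body objective.
        arm_term = logp_arm * (adv_manip + beta * adv_loco)
        leg_term = logp_leg * (adv_loco + beta * adv_manip)
        return -(arm_term + leg_term).mean()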

* CoRL 2022 (Oral). Project website at https://maniploco.github.io 

Coupling Vision and Proprioception for Navigation of Legged Robots

Dec 03, 2021
Zipeng Fu, Ashish Kumar, Ananye Agarwal, Haozhi Qi, Jitendra Malik, Deepak Pathak

We exploit the complementary strengths of vision and proprioception to achieve point-goal navigation on a legged robot. Legged systems can traverse more complex terrain than wheeled robots, but to fully exploit this capability, the high-level path planner in the navigation system must be aware of the low-level locomotion policy's walking capabilities on varying terrains. We achieve this by using proprioceptive feedback to estimate the safe operating limits of the walking policy, and to sense unexpected obstacles and terrain properties, such as smoothness or softness of the ground, that may be missed by vision. The navigation system uses onboard cameras to generate an occupancy map and a corresponding cost map for reaching the goal. A Fast Marching Method (FMM) planner then generates a target path. A velocity command generator takes this path as input, together with constraints from the safety advisor on unexpected obstacles and terrain-determined speed limits, to produce the desired velocity for the locomotion policy. We show superior performance compared to wheeled robot (LoCoBot) baselines and to baselines with disjoint high-level planning and low-level control. We also demonstrate real-world deployment of our system on a quadruped robot with onboard sensors and compute. Videos at https://navigation-locomotion.github.io/camera-ready
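
A sketch of the velocity command generator's role (function and argument names are illustrative, not the paper's API):

    import numpy as np

    def velocity_command(waypoint, robot_xy, terrain_speed_limit, v_max=1.0):
        # Head toward the next FMM waypoint, capping speed by the
        # proprioceptively estimated terrain speed limit from the safety
        # advisor.
        heading = waypoint - robot_xy
        dist = np.linalg.norm(heading)
        if dist < 1e-6:
            return np.zeros(2)
        speed = min(v_max, terrain_speed_limit, dist)
        return speed * heading / dist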

* Website and videos at https://navigation-locomotion.github.io/camera-ready 

Minimizing Energy Consumption Leads to the Emergence of Gaits in Legged Robots

Oct 25, 2021
Zipeng Fu, Ashish Kumar, Jitendra Malik, Deepak Pathak

Legged locomotion is commonly studied and expressed as a discrete set of gait patterns, such as walk, trot, and gallop, which are usually treated as given and pre-programmed in legged robots for efficient locomotion at different speeds. However, fixing a set of pre-programmed gaits limits the generality of locomotion. Recent animal motor studies show that these conventional gaits are prevalent only in ideal flat-terrain conditions, while real-world locomotion is unstructured, resembling bouts of intermittent steps. What principles could lead to both structured and unstructured patterns across mammals, and how can we synthesize them in robots? In this work, we take an analysis-by-synthesis approach and learn to move by minimizing mechanical energy. We demonstrate that learning to minimize energy consumption plays a key role in the emergence of natural locomotion gaits at different speeds in real quadruped robots. The emergent gaits are structured on ideal terrains, resembling those of horses and sheep, while the same approach leads to unstructured gaits on rough terrains, consistent with findings in animal motor control. We validate our hypothesis in both simulation and real hardware across natural terrains. Videos at https://energy-locomotion.github.io
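
In spirit, the training signal can be as simple as a tracking term minus a mechanical-energy penalty; the sketch below is an assumed form, not the paper's exact reward:

    import numpy as np

    def energy_reward(torques, joint_vels, v_actual, v_target, alpha=0.04):
        # Mechanical power expended across joints: |tau * q_dot|, summed.
        energy = np.sum(np.abs(torques * joint_vels))
        tracking = -abs(v_actual - v_target)   # follow the commanded speed
        return tracking - alpha * energy       # alpha is a made-up coefficient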

* CoRL 2021. Website at https://energy-locomotion.github.io 

Emergence of Theory of Mind Collaboration in Multiagent Systems

Sep 30, 2021
Luyao Yuan, Zipeng Fu, Linqi Zhou, Kexin Yang, Song-Chun Zhu

Currently, the study of multiagent systems usually ignores the intentions of agents. Nonetheless, as pointed out by Theory of Mind (ToM), people regularly reason about others' mental states, including beliefs, goals, and intentions, to gain a performance advantage in competition, cooperation, or coalition. However, due to its intrinsic recursion and the intractability of modeling distributions over beliefs, integrating ToM into multiagent planning and decision making remains a challenge. In this paper, we incorporate ToM into a multiagent partially observable Markov decision process (POMDP) and propose an adaptive training algorithm to develop effective collaboration between agents with ToM. We evaluate our algorithm on two games, where it surpasses all previous decentralized-execution algorithms that do not model ToM.

* Emergent Communication Workshop, 33rd Conference on Neural Information Processing Systems (NeurIPS 2019)  

RMA: Rapid Motor Adaptation for Legged Robots

Jul 08, 2021
Ashish Kumar, Zipeng Fu, Deepak Pathak, Jitendra Malik

Successful real-world deployment of legged robots requires them to adapt in real time to unseen scenarios such as changing terrains, changing payloads, and wear and tear. This paper presents the Rapid Motor Adaptation (RMA) algorithm to solve this problem of real-time online adaptation in quadruped robots. RMA consists of two components, a base policy and an adaptation module, whose combination enables the robot to adapt to novel situations in fractions of a second. RMA is trained completely in simulation, without domain knowledge such as reference trajectories or predefined foot trajectory generators, and is deployed on the A1 robot without any fine-tuning. We train RMA on a varied terrain generator using bioenergetics-inspired rewards and deploy it on a variety of difficult terrains, including rocky, slippery, and deformable surfaces, in environments with grass, long vegetation, concrete, pebbles, stairs, and sand. RMA shows state-of-the-art performance across diverse real-world and simulation experiments. Video results at https://ashish-kmr.github.io/rma-legged-robots/
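
A sketch of the adaptation-module idea (layer sizes, history length, and the MLP architecture are placeholders, not the paper's design): in simulation, a privileged encoder maps environment factors to a latent vector z; the module below is trained to regress z from recent state-action history, so the deployed robot can estimate it online.

    import torch
    import torch.nn as nn

    class AdaptationModule(nn.Module):
        def __init__(self, hist_len=50, obs_act_dim=42, z_dim=8):
            super().__init__()
            self.net = nn.Sequential(
                nn.Flatten(),
                nn.Linear(hist_len * obs_act_dim, 256), nn.ReLU(),
                nn.Linear(256, z_dim),
            )

        def forward(self, history):   # (batch, hist_len, obs_act_dim)
            return self.net(history)  # estimated extrinsics z_hat for the base policy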

* RSS 2021. Webpage at https://ashish-kmr.github.io/rma-legged-robots/ 

Emergence of Pragmatics from Referential Game between Theory of Mind Agents

Jan 21, 2020
Luyao Yuan, Zipeng Fu, Jingyue Shen, Lu Xu, Junhong Shen, Song-Chun Zhu

Pragmatics studies how context contributes to language meaning [1]. In human communication, language is never interpreted out of context, and sentences can usually convey more information than their literal meanings [2]. However, this mechanism is missing in most multi-agent systems [3, 4, 5, 6], restricting communication efficiency and the capability of human-agent interaction. In this paper, we propose an algorithm with which agents can spontaneously learn to "read between the lines" without any explicit hand-designed rules. We integrate theory of mind (ToM) [7, 8] into a cooperative multi-agent pedagogical situation and propose an adaptive reinforcement learning (RL) algorithm to develop a communication protocol. ToM is a profound cognitive-science concept, holding that people regularly reason about others' mental states, including beliefs, goals, and intentions, to gain a performance advantage in competition, cooperation, or coalition. With this ability, agents treat language not only as messages but also as rational acts reflecting others' hidden states. Our experiments demonstrate the advantage of pragmatic protocols over non-pragmatic ones. We also show that the teaching complexity under the pragmatic protocol empirically approximates the recursive teaching dimension (RTD).
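
The "read between the lines" computation can be illustrated with one round of recursive listener/speaker reasoning in the style of rational speech acts; this is an illustrative sketch, not the paper's adaptive RL algorithm:

    import numpy as np

    def pragmatic_listener(msg, literal, prior, depth=1):
        # literal[m, o]: probability that message m literally applies to
        # object o; prior[o]: prior over objects.
        l = literal * prior                        # L0: literal listener
        l = l / l.sum(axis=1, keepdims=True)
        for _ in range(depth):
            s = l / l.sum(axis=0, keepdims=True)   # speaker: message choice per object
            l = s * prior
            l = l / l.sum(axis=1, keepdims=True)   # pragmatic listener
        return l[msg]                              # posterior over objects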
