Chongjie Zhang

Leveraging Hyperbolic Embeddings for Coarse-to-Fine Robot Design

Nov 02, 2023

Heng Dong, Junyu Zhang, Chongjie Zhang

Abstract: Multi-cellular robot design aims to create robots composed of numerous cells that can be efficiently controlled to perform diverse tasks. Previous research has demonstrated the ability to generate robots for various tasks, but these approaches often optimize robots directly in the vast design space, resulting in robots with complicated morphologies that are hard to control. In response, this paper presents a novel coarse-to-fine method for designing multi-cellular robots. This strategy first searches for optimal coarse-grained robots and then progressively refines them. To mitigate the challenge of determining the precise point at which to refine during the coarse-to-fine transition, we introduce the Hyperbolic Embeddings for Robot Design (HERD) framework. HERD unifies robots of various granularities within a shared hyperbolic space and leverages a refined Cross-Entropy Method for optimization. This framework enables our method to autonomously identify areas of exploration in hyperbolic space and concentrate on regions demonstrating promise. Finally, extensive empirical studies on various challenging tasks from EvoGym show our approach's superior efficiency and generalization capability.
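The Cross-Entropy Method mentioned above is a generic sampling-based optimizer. The sketch below is a plain Euclidean CEM over a continuous vector, shown only to illustrate the sample, select-elites, refit loop; HERD's refined CEM, its hyperbolic embedding space, and its robot encoding are not reproduced, and every name and hyperparameter here is an illustrative assumption.

```python
import random
import statistics
from typing import Callable, List


def cross_entropy_method(
    objective: Callable[[List[float]], float],
    dim: int,
    iterations: int = 50,
    population: int = 64,
    elite_frac: float = 0.2,
    init_std: float = 2.0,
    seed: int = 0,
) -> List[float]:
    """Plain Euclidean CEM (illustrative): maximize `objective` with a diagonal Gaussian."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    n_elite = max(2, int(population * elite_frac))
    for _ in range(iterations):
        # Sample candidates from the current search distribution.
        samples = [
            [rng.gauss(mean[d], std[d]) for d in range(dim)]
            for _ in range(population)
        ]
        # Keep the highest-scoring candidates (the "elites").
        elites = sorted(samples, key=objective, reverse=True)[:n_elite]
        # Refit the Gaussian to the elites; a small floor keeps std > 0.
        mean = [statistics.fmean(e[d] for e in elites) for d in range(dim)]
        std = [statistics.pstdev([e[d] for e in elites]) + 1e-3 for d in range(dim)]
    return mean


# Toy objective with its maximum at x = (3, -1).
best = cross_entropy_method(lambda x: -((x[0] - 3) ** 2 + (x[1] + 1) ** 2), dim=2)
print([round(v, 2) for v in best])
```

Refitting only to the elites shrinks the search distribution around promising regions, which is the "concentrate on regions demonstrating promise" behavior in its simplest form.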

Unsupervised Behavior Extraction via Random Intent Priors

Oct 28, 2023

Hao Hu, Yiqin Yang, Jianing Ye, Ziqing Mai, Chongjie Zhang

Abstract: Reward-free data is abundant and contains rich prior knowledge of human behaviors, but it is not well exploited by offline reinforcement learning (RL) algorithms. In this paper, we propose UBER, an unsupervised approach to extract useful behaviors from offline reward-free datasets via diversified rewards. UBER assigns different pseudo-rewards sampled from a given prior distribution to different agents to extract a diverse set of behaviors, and reuses them as candidate policies to facilitate the learning of new tasks. Perhaps surprisingly, we show that rewards generated from random neural networks are sufficient to extract diverse and useful behaviors, some even close to expert ones. We provide both empirical and theoretical evidence to justify the use of random priors for the reward function. Experiments on multiple benchmarks showcase UBER's ability to learn effective and diverse behavior sets that enhance sample efficiency for online RL, outperforming existing baselines. By reducing reliance on human supervision, UBER broadens the applicability of RL to real-world scenarios with abundant reward-free data.

* Thirty-seventh Conference on Neural Information Processing Systems
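A minimal sketch of the random-prior idea in the UBER abstract: each agent gets its own frozen, randomly initialized network as a pseudo-reward, and the reward-free dataset is relabeled once per agent. The network shape, tanh activation, initialization scales, and helper names (`RandomRewardNet`, `relabel`) are assumptions for illustration; the offline RL algorithm that would then be trained on each relabeled copy is not shown.

```python
import math
import random
from typing import List, Sequence, Tuple

Transition = Tuple[List[float], List[float], List[float]]  # (state, action, next_state)


class RandomRewardNet:
    """Hypothetical frozen, randomly initialized 2-layer MLP used as a pseudo-reward r(s, a)."""

    def __init__(self, input_dim: int, hidden: int = 16, seed: int = 0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(input_dim)
        self.w1 = [[rng.gauss(0.0, scale) for _ in range(input_dim)] for _ in range(hidden)]
        self.b1 = [rng.gauss(0.0, 0.1) for _ in range(hidden)]
        self.w2 = [rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]

    def __call__(self, state: Sequence[float], action: Sequence[float]) -> float:
        x = list(state) + list(action)
        # Hidden layer: tanh(W1 x + b1); output: w2 . h (weights are never trained).
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        return sum(w * hi for w, hi in zip(self.w2, h))


def relabel(dataset: List[Transition], reward_fn: RandomRewardNet):
    """Attach pseudo-rewards to a reward-free dataset: (s, a, r, s')."""
    return [(s, a, reward_fn(s, a), s2) for s, a, s2 in dataset]


# Toy reward-free dataset: 2-D states, 1-D actions.
rng = random.Random(42)
data = [([rng.random(), rng.random()], [rng.random()], [rng.random(), rng.random()]) for _ in range(5)]

# One random prior per agent: different seeds give different pseudo-reward functions.
relabeled = [relabel(data, RandomRewardNet(input_dim=3, seed=k)) for k in range(3)]
print([round(t[2], 3) for t in relabeled[0]])
```

Each relabeled copy would then be handed to an ordinary offline RL learner, yielding one candidate behavior per random prior.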

Towards Robust Offline Reinforcement Learning under Diverse Data Corruption

Oct 19, 2023

Rui Yang, Han Zhong, Jiawei Xu, Amy Zhang, Chongjie Zhang, Lei Han, Tong Zhang

Abstract: Offline reinforcement learning (RL) presents a promising approach for learning reinforced policies from offline datasets without the need for costly or unsafe interactions with the environment. However, datasets collected by humans in real-world environments are often noisy and may even be maliciously corrupted, which can significantly degrade the performance of offline RL. In this work, we first investigate the performance of current offline RL algorithms under comprehensive data corruption, including states, actions, rewards, and dynamics. Our extensive experiments reveal that implicit Q-learning (IQL) demonstrates remarkable resilience to data corruption among various offline RL algorithms. Furthermore, we conduct both empirical and theoretical analyses to understand IQL's robust performance, identifying its supervised policy learning scheme as the key factor. Despite its relative robustness, IQL still suffers from heavy-tailed Q-function targets under dynamics corruption. To tackle this challenge, we draw inspiration from robust statistics: we employ the Huber loss to handle the heavy tails and utilize quantile estimators to balance penalization for corrupted data against learning stability. By incorporating these simple yet effective modifications into IQL, we propose a more robust offline RL approach named Robust IQL (RIQL). Extensive experiments demonstrate that RIQL exhibits highly robust performance when subjected to diverse data corruption scenarios.

* 31 pages, 17 figures
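The two robust-statistics ingredients named in the RIQL abstract can be sketched in isolation: a Huber loss, which grows only linearly for large residuals, and a linearly interpolated quantile over a hypothetical ensemble of Q-value estimates. How RIQL wires these into IQL's value and Q updates is not shown; `delta`, the quantile level, and the numbers are illustrative assumptions.

```python
from typing import Sequence


def huber(x: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic for |x| <= delta, linear beyond (robust to heavy tails)."""
    ax = abs(x)
    return 0.5 * x * x if ax <= delta else delta * (ax - 0.5 * delta)


def quantile(values: Sequence[float], q: float) -> float:
    """q-quantile (0 <= q <= 1) with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


q_ensemble = [10.2, 9.8, 10.5, 55.0]  # hypothetical; one member inflated by corruption
print(quantile(q_ensemble, 0.25), sum(q_ensemble) / len(q_ensemble))
residual = 45.0
print(0.5 * residual ** 2, huber(residual))  # squared vs. Huber penalty
```

A low quantile is pulled far less by the corrupted member than the mean is, and the Huber penalty keeps a single heavy-tailed residual from dominating the loss.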

Imitation Learning from Observation with Automatic Discount Scheduling

Oct 12, 2023

Yuyang Liu, Weijun Dong, Yingdong Hu, Chuan Wen, Zhao-Heng Yin, Chongjie Zhang, Yang Gao

Abstract: Humans often acquire new skills through observation and imitation. For robotic agents, learning from the plethora of unlabeled video demonstration data available on the Internet necessitates imitating the expert without access to its actions, presenting a challenge known as Imitation Learning from Observations (ILfO). A common approach to tackling ILfO problems is to convert them into inverse reinforcement learning problems, utilizing a proxy reward computed from the agent's and the expert's observations. Nonetheless, we identify that tasks characterized by a progress dependency property pose significant challenges for such approaches; in these tasks, the agent needs to learn the expert's preceding behaviors before mastering the subsequent ones. Our investigation reveals that the main cause is the reward signals assigned to later steps, which hinder the learning of initial behaviors. To address this challenge, we present a novel ILfO framework that enables the agent to master earlier behaviors before advancing to later ones. We introduce an Automatic Discount Scheduling (ADS) mechanism that adaptively alters the discount factor in reinforcement learning during the training phase, prioritizing earlier rewards initially and gradually engaging later rewards only when the earlier behaviors have been mastered. Our experiments, conducted on nine Meta-World tasks, demonstrate that our method significantly outperforms state-of-the-art methods across all tasks, including tasks those methods fail to solve.
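The abstract does not specify how ADS measures mastery or maps it to a discount factor, so the sketch below is a hypothetical scheduler. It assumes a scalar `progress` signal (for example, how many leading steps of the expert demonstration the agent currently matches) and picks gamma so that the effective horizon 1 / (1 - gamma) covers that prefix plus a margin. Only the general behavior, a discount that starts small and grows monotonically, follows the description above; `DiscountScheduler`, `margin`, and the bounds are illustrative.

```python
from typing import Sequence


class DiscountScheduler:
    """Hypothetical ADS-style schedule: gamma starts small and only grows."""

    def __init__(self, gamma_min: float = 0.5, gamma_max: float = 0.99, margin: float = 2.0):
        self.gamma_min = gamma_min
        self.gamma_max = gamma_max
        self.margin = margin
        self.gamma = gamma_min

    def update(self, progress: float) -> float:
        """`progress`: assumed count of leading expert steps the agent already matches."""
        horizon = progress + self.margin      # effective horizon we want to cover
        proposed = 1.0 - 1.0 / horizon        # since horizon = 1 / (1 - gamma)
        # Monotone: never shrink gamma once later rewards have been engaged.
        self.gamma = min(self.gamma_max, max(self.gamma, self.gamma_min, proposed))
        return self.gamma


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma^t * r_t, accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


sched = DiscountScheduler()
print([round(sched.update(p), 3) for p in [0, 3, 8, 5, 1000]])
# A reward that only arrives at step 3 counts little while gamma is small.
print(discounted_return([0, 0, 0, 1], 0.5), discounted_return([0, 0, 0, 1], 0.99))
```

With a small gamma, rewards tied to later steps contribute little to the return, so early training is driven mainly by the earlier behaviors.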

Never Explore Repeatedly in Multi-Agent Reinforcement Learning

Aug 19, 2023

Chenghao Li, Tonghan Wang, Chongjie Zhang, Qianchuan Zhao

Abstract: In the realm of multi-agent reinforcement learning, intrinsic motivations have emerged as a pivotal tool for exploration. While the computation of many intrinsic rewards relies on estimating variational posteriors using neural network approximators, a notable challenge has surfaced due to the limited expressive capability of these neural statistics approximators. We pinpoint this challenge as the "revisitation" issue, where agents repeatedly explore confined areas of the task space. To combat this, we propose a dynamic reward scaling approach. This method is crafted to stabilize the significant fluctuations in intrinsic rewards in previously explored areas and promote broader exploration, effectively curbing the revisitation phenomenon. Our experimental findings underscore the efficacy of our approach, showcasing enhanced performance in demanding environments like Google Research Football and StarCraft II micromanagement tasks, especially in sparse reward settings.

IOB: Integrating Optimization Transfer and Behavior Transfer for Multi-Policy Reuse

Aug 14, 2023

Siyuan Li, Hao Li, Jin Zhang, Zhen Wang, Peng Liu, Chongjie Zhang

Abstract: Humans have the ability to reuse previously learned policies to solve new tasks quickly, and reinforcement learning (RL) agents can do the same by transferring knowledge from source policies to a related target task. Transfer RL methods can reshape the policy optimization objective (optimization transfer) or influence the behavior policy (behavior transfer) using source policies. However, selecting the appropriate source policy with limited samples to guide target policy learning has been a challenge. Previous methods introduce additional components, such as hierarchical policies or estimations of source policies' value functions, which can lead to non-stationary policy optimization or heavy sampling costs, diminishing transfer effectiveness. To address this challenge, we propose a novel transfer RL method that selects the source policy without training extra components. Our method utilizes the Q function in the actor-critic framework to guide policy selection, choosing the source policy with the largest one-step improvement over the current target policy. We integrate optimization transfer and behavior transfer (IOB) by regularizing the learned policy to mimic the guidance policy and combining them as the behavior policy. This integration significantly enhances transfer effectiveness: IOB surpasses state-of-the-art transfer RL baselines on benchmark tasks and improves final performance and knowledge transferability in continual learning scenarios. Additionally, we show that our optimization transfer technique is guaranteed to improve target policy learning.

* 26 pages, 9 figures
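The selection rule in the IOB abstract (choose the source policy whose action gives the largest one-step improvement over the current target policy, as judged by the critic) can be sketched for a single state. Here `q_fn`, the policies, and the toy quadratic critic are hypothetical stand-ins for the learned actor-critic components; IOB's policy regularization and behavior-policy mixing are not shown.

```python
from typing import Callable, Dict, Sequence

State = Sequence[float]
Action = float
Policy = Callable[[State], Action]


def select_guidance_policy(
    state: State,
    q_fn: Callable[[State, Action], float],
    target_policy: Policy,
    source_policies: Dict[str, Policy],
) -> str:
    """Return the name of the policy with the largest one-step improvement under Q.

    The current target policy is the baseline, so a source policy is chosen only
    when Q(s, pi_src(s)) strictly exceeds Q(s, pi_target(s)).
    """
    baseline = q_fn(state, target_policy(state))
    best_name, best_gain = "target", 0.0
    for name, policy in source_policies.items():
        gain = q_fn(state, policy(state)) - baseline
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name


# Toy critic: the best action equals the first state coordinate.
q = lambda s, a: -(a - s[0]) ** 2
sources = {"left": lambda s: -1.0, "copy": lambda s: s[0]}
print(select_guidance_policy([0.8], q, lambda s: 0.0, sources))
print(select_guidance_policy([0.0], q, lambda s: 0.0, sources))
```

Because the rule only queries the existing critic, no extra value function per source policy has to be trained, which is the point the abstract emphasizes.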

Learning to Solve Tasks with Exploring Prior Behaviours

Jul 06, 2023

Ruiqi Zhu, Siyuan Li, Tianhong Dai, Chongjie Zhang, Oya Celiktutan

Abstract: Demonstrations are widely used in Deep Reinforcement Learning (DRL) to facilitate solving tasks with sparse rewards. However, real-world tasks often have initial conditions that differ from those in the demonstration, which would require additional prior behaviours. For example, suppose we are given a demonstration of the task of "picking up an object from an open drawer", but the drawer is closed during training. Without acquiring the prior behaviour of opening the drawer, the robot is unlikely to solve the task. To address this, we propose Intrinsic Rewards Driven Example-based Control (IRDEC). Our method endows agents with the ability to explore and acquire the required prior behaviours and then connect them to the task-specific behaviours in the demonstration to solve sparse-reward tasks, without requiring additional demonstrations of the prior behaviours. Our method outperforms other baselines on three navigation tasks and one robotic manipulation task with sparse rewards. Code is available at https://github.com/Ricky-Zhu/IRDEC.

What is Essential for Unseen Goal Generalization of Offline Goal-conditioned RL?

Jun 02, 2023

Rui Yang, Yong Lin, Xiaoteng Ma, Hao Hu, Chongjie Zhang, Tong Zhang

Abstract: Offline goal-conditioned RL (GCRL) offers a way to train general-purpose agents from fully offline datasets. In addition to being conservative within the dataset, the ability to generalize to unseen goals is another fundamental challenge for offline GCRL. However, to the best of our knowledge, this problem has not yet been well studied. In this paper, we study out-of-distribution (OOD) generalization of offline GCRL both theoretically and empirically to identify the factors that matter. In a number of experiments, we observe that weighted imitation learning enjoys better generalization than pessimism-based offline RL methods. Based on this insight, we derive a theory for OOD generalization, which characterizes several important design choices. We then propose a new offline GCRL method, Generalizable Offline goAl-condiTioned RL (GOAT), by combining the findings from our theoretical and empirical studies. On a new benchmark containing 9 independent and identically distributed (IID) tasks and 17 OOD tasks, GOAT outperforms current state-of-the-art methods by a large margin.

* Accepted at the International Conference on Machine Learning (ICML), 2023
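Weighted imitation learning, which the GOAT abstract reports generalizes better than pessimism-based offline RL, typically scales a behavior-cloning loss by a function of the advantage. The sketch below uses the common exponentiated-advantage form exp(A / beta) with clipping; it is a generic instance, not GOAT's exact weighting, and `beta`, `max_weight`, and the inputs are assumptions (in practice the goal-conditioned advantages would come from a learned critic and the log-probabilities from the policy).

```python
import math
from typing import List, Sequence


def imitation_weights(advantages: Sequence[float], beta: float = 1.0, max_weight: float = 20.0) -> List[float]:
    """Exponentiated-advantage weights exp(A / beta), clipped at max_weight.

    Clipping the exponent (rather than the result) also avoids math.exp overflow.
    """
    cap = math.log(max_weight)
    return [math.exp(min(a / beta, cap)) for a in advantages]


def weighted_bc_loss(log_probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted behaviour cloning: mean of -w_i * log pi(a_i | s_i, g_i)."""
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / len(log_probs)


advantages = [-1.0, 0.0, 2.0, 500.0]   # hypothetical A(s, a, g) from a critic
log_probs = [-0.7, -1.2, -0.3, -2.0]   # hypothetical log pi(a | s, g)
w = imitation_weights(advantages)
print([round(x, 3) for x in w], round(weighted_bc_loss(log_probs, w), 3))
```

Transitions with higher advantage toward their goal are imitated more strongly, while the clip keeps a single overestimated advantage from dominating the loss.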

Offline Meta Reinforcement Learning with In-Distribution Online Adaptation

Jun 01, 2023

Jianhao Wang, Jin Zhang, Haozhe Jiang, Junyu Zhang, Liwei Wang, Chongjie Zhang

Abstract: Recent offline meta-reinforcement learning (meta-RL) methods typically utilize task-dependent behavior policies (e.g., training RL agents on each individual task) to collect a multi-task dataset. However, these methods always require extra information for fast adaptation, such as offline context for testing tasks. To address this problem, we first formally characterize a unique challenge in offline meta-RL: transition-reward distribution shift between offline datasets and online adaptation. Our theory shows that out-of-distribution adaptation episodes may lead to unreliable policy evaluation and that online adaptation with in-distribution episodes can guarantee adaptation performance. Based on these theoretical insights, we propose a novel adaptation framework, called In-Distribution online Adaptation with uncertainty Quantification (IDAQ), which generates in-distribution context using a given uncertainty quantification and performs effective task belief inference to address new tasks. We find a return-based uncertainty quantification for IDAQ that performs effectively. Experiments show that IDAQ achieves state-of-the-art performance on the Meta-World ML1 benchmark compared to baselines with and without offline adaptation.

Symmetry-Aware Robot Design with Structured Subgroups

May 31, 2023

Heng Dong, Junyu Zhang, Tonghan Wang, Chongjie Zhang

Abstract: Robot design aims at learning to create robots that can be easily controlled and perform tasks efficiently. Previous works on robot design have demonstrated their ability to generate robots for various tasks. However, these works searched for robots directly in the vast design space and ignored common structures, resulting in abnormal robots and poor performance. To tackle this problem, we propose a Symmetry-Aware Robot Design (SARD) framework that exploits the structure of the design space by incorporating symmetry searching into the robot design process. Specifically, we represent symmetries with the subgroups of the dihedral group and search for the optimal symmetry in structured subgroups. Robots are then designed under the searched symmetry. In this way, SARD can design efficient symmetric robots while still covering the original design space, a property we analyze theoretically. We further evaluate SARD empirically on various tasks, and the results show its superior efficiency and generalizability.

* The Fortieth International Conference on Machine Learning (ICML 2023)
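A small illustration of representing symmetries as subgroups of the dihedral group, assuming a robot design is a square grid of cell types. The snippet builds the rotation and reflection of an n x n grid as permutations of cell indices, generates a subgroup from chosen generators, and projects a design onto that subgroup so it becomes symmetric. The grid encoding, helper names, and orbit-representative projection are assumptions for illustration; SARD's search over subgroups and its design policy are not reproduced.

```python
from itertools import product
from typing import Callable, List, Set, Tuple

Perm = Tuple[int, ...]  # perm[k] = flattened index that cell k is mapped to


def coord_perm(n: int, fn: Callable[[int, int], Tuple[int, int]]) -> Perm:
    """Turn a coordinate map on an n x n grid into a permutation of cell indices."""
    return tuple(r * n + c for r, c in (fn(i, j) for i, j in product(range(n), repeat=2)))


def rotation(n: int) -> Perm:
    return coord_perm(n, lambda i, j: (j, n - 1 - i))  # 90-degree rotation


def reflection(n: int) -> Perm:
    return coord_perm(n, lambda i, j: (i, n - 1 - j))  # left-right mirror


def compose(p: Perm, q: Perm) -> Perm:
    """Apply q first, then p."""
    return tuple(p[q[k]] for k in range(len(q)))


def generate_subgroup(generators: List[Perm]) -> Set[Perm]:
    """Closure of the generators: the subgroup of D4 they generate."""
    identity = tuple(range(len(generators[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for h in generators:
            gh = compose(h, g)
            if gh not in group:
                group.add(gh)
                frontier.append(gh)
    return group


def symmetrize(grid: List[List[int]], group: Set[Perm]) -> List[List[int]]:
    """Each cell copies the value of the smallest-index cell in its orbit,
    so the result is invariant under every element of the group."""
    n = len(grid)
    flat = [v for row in grid for v in row]
    out = [flat[min(g[k] for g in group)] for k in range(n * n)]
    return [out[i * n:(i + 1) * n] for i in range(n)]


def is_invariant(grid: List[List[int]], group: Set[Perm]) -> bool:
    flat = [v for row in grid for v in row]
    return all(flat[g[k]] == flat[k] for g in group for k in range(len(flat)))


design = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]              # toy 3x3 grid of cell types
mirror = generate_subgroup([reflection(3)])             # {identity, mirror}
d4 = generate_subgroup([rotation(3), reflection(3)])    # full dihedral group D4
print(len(d4), symmetrize(design, mirror))
```

Choosing a smaller subgroup (for example, only the mirror) constrains the design less than the full group, which is why searching over subgroups trades off structure against coverage of the design space.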
