Jianyu Chen

Decomposed Mutual Information Optimization for Generalized Context in Meta-Reinforcement Learning

Oct 09, 2022

Yao Mu, Yuzheng Zhuang, Fei Ni, Bin Wang, Jianyu Chen, Jianye Hao, Ping Luo

Abstract: Adapting to the changes in transition dynamics is essential in robotic applications. By learning a conditional policy with a compact context, context-aware meta-reinforcement learning provides a flexible way to adjust behavior according to dynamics changes. However, in real-world applications, the agent may encounter complex dynamics changes. Multiple confounders can influence the transition dynamics, making it challenging to infer accurate context for decision-making. This paper addresses such a challenge by Decomposed Mutual INformation Optimization (DOMINO) for context learning, which explicitly learns a disentangled context to maximize the mutual information between the context and historical trajectories, while minimizing the state transition prediction error. Our theoretical analysis shows that DOMINO can overcome the underestimation of the mutual information caused by multi-confounded challenges via learning disentangled context and reduce the demand for the number of samples collected in various environments. Extensive experiments show that the context learned by DOMINO benefits both model-based and model-free reinforcement learning algorithms for dynamics generalization in terms of sample efficiency and performance in unseen environments.

* NeurIPS 2022

Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model

Oct 08, 2022

Zeyu Gao, Yao Mu, Ruoyan Shen, Chen Chen, Yangang Ren, Jianyu Chen, Shengbo Eben Li, Ping Luo, Yanfeng Lu

Abstract: End-to-end autonomous driving provides a feasible way to automatically maximize overall driving system performance by directly mapping the raw pixels from a front-facing camera to control signals. Recent advanced methods construct a latent world model to map the high dimensional observations into compact latent space. However, the latent states embedded by the world model proposed in previous works may contain a large amount of task-irrelevant information, resulting in low sampling efficiency and poor robustness to input perturbations. Meanwhile, the training data distribution is usually unbalanced, and the learned policy struggles to cope with the corner cases during the driving process. To solve the above challenges, we present a semantic masked recurrent world model (SEM2), which introduces a latent filter to extract key task-relevant features and reconstruct a semantic mask via the filtered features, and is trained with a multi-source data sampler, which aggregates common data and multiple corner case data in a single batch, to balance the data distribution. Extensive experiments on CARLA show that our method outperforms the state-of-the-art approaches in terms of sample efficiency and robustness to input perturbations.

* 11 pages, 7 figures, 1 table, submitted to Deep RL Workshop 2022

Zero-Shot Policy Transfer with Disentangled Task Representation of Meta-Reinforcement Learning

Oct 01, 2022

Zheng Wu, Yichen Xie, Wenzhao Lian, Changhao Wang, Yanjiang Guo, Jianyu Chen, Stefan Schaal, Masayoshi Tomizuka

Abstract: Humans are capable of abstracting various tasks as different combinations of multiple attributes. This perspective of compositionality is vital for rapid human learning and adaptation, since previous experiences from related tasks can be combined to generalize across novel compositional settings. In this work, we aim to achieve zero-shot policy generalization of Reinforcement Learning (RL) agents by leveraging the task compositionality. Our proposed method is a meta-RL algorithm with disentangled task representation, explicitly encoding different aspects of the tasks. Policy generalization is then performed by inferring unseen compositional task representations via the obtained disentanglement without extra exploration. The evaluation is conducted on three simulated tasks and a challenging real-world robotic insertion task. Experimental results demonstrate that our proposed method achieves policy generalization to unseen compositional tasks in a zero-shot manner.

* 7 pages, 9 figures

Performance-Driven Controller Tuning via Derivative-Free Reinforcement Learning

Sep 11, 2022

Yuheng Lei, Jianyu Chen, Shengbo Eben Li, Sifa Zheng

Abstract: Choosing an appropriate parameter set for the designed controller is critical for the final performance but usually requires a tedious and careful tuning process, which implies a strong need for automatic tuning methods. However, among existing methods, derivative-free ones suffer from poor scalability or low efficiency, while gradient-based ones are often unavailable due to possibly non-differentiable controller structure. To resolve the issues, we tackle the controller tuning problem using a novel derivative-free reinforcement learning (RL) framework, which performs timestep-wise perturbation in parameter space during experience collection and integrates derivative-free policy updates into the advanced actor-critic RL architecture to achieve high versatility and efficiency. To demonstrate the framework's efficacy, we conduct numerical experiments on two concrete examples from autonomous driving, namely, adaptive cruise control with PID controller and trajectory tracking with MPC controller. Experimental results show that the proposed method outperforms popular baselines and highlight its strong potential for controller tuning.

* Accepted by the 61st IEEE Conference on Decision and Control (CDC), 2022. Copyright @IEEE

A Contact-Safe Reinforcement Learning Framework for Contact-Rich Robot Manipulation

Jul 27, 2022

Xiang Zhu, Shucheng Kang, Jianyu Chen

Abstract: Reinforcement learning shows great potential to solve complex contact-rich robot manipulation tasks. However, the safety of using RL in the real world is a crucial problem, since unexpected dangerous collisions might happen when the RL policy is imperfect during training or in unseen scenarios. In this paper, we propose a contact-safe reinforcement learning framework for contact-rich robot manipulation, which maintains safety in both the task space and joint space. When the RL policy causes unexpected collisions between the robot arm and the environment, our framework is able to immediately detect the collision and keep the contact force small. Furthermore, the end-effector is constrained to perform contact-rich tasks compliantly, while remaining robust to external disturbances. We train the RL policy in simulation and transfer it to the real robot. Real-world experiments on robot wiping tasks show that our method is able to keep the contact force small in both task space and joint space, even when the policy faces an unseen scenario with unexpected collisions, while rejecting the disturbances on the main task.

* 7 pages, 5 figures, accepted to IROS 2022

CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer

Jun 17, 2022

Yao Mu, Shoufa Chen, Mingyu Ding, Jianyu Chen, Runjian Chen, Ping Luo

Abstract: Transformer has achieved great successes in learning vision and language representation, which is general across various downstream tasks. In visual control, learning transferable state representation that can transfer between different control tasks is important to reduce the training sample size. However, porting Transformer to sample-efficient visual control remains a challenging and unsolved problem. To this end, we propose a novel Control Transformer (CtrlFormer), possessing many appealing benefits that prior arts do not have. Firstly, CtrlFormer jointly learns self-attention mechanisms between visual tokens and policy tokens among different control tasks, where multitask representation can be learned and transferred without catastrophic forgetting. Secondly, we carefully design a contrastive reinforcement learning paradigm to train CtrlFormer, enabling it to achieve high sample efficiency, which is important in control problems. For example, in the DMControl benchmark, unlike recent advanced methods that failed by producing a zero score in the "Cartpole" task after transfer learning with 100k samples, CtrlFormer can achieve a state-of-the-art score with only 100k samples while maintaining the performance of previous tasks. The code and models are released in our project homepage.

* ICML 2022

Flow-based Recurrent Belief State Learning for POMDPs

May 23, 2022

Xiaoyu Chen, Yao Mu, Ping Luo, Shengbo Li, Jianyu Chen

Abstract: Partially Observable Markov Decision Process (POMDP) provides a principled and generic framework to model real world sequential decision making processes, yet remains unsolved, especially for high dimensional continuous space and unknown models. The main challenge lies in how to accurately obtain the belief state, which is the probability distribution over the unobservable environment states given historical information. Accurately calculating this belief state is a precondition for obtaining an optimal policy of POMDPs. Recent advances in deep learning techniques show great potential to learn good belief states. However, existing methods can only learn approximated distribution with limited flexibility. In this paper, we introduce the FlOw-based Recurrent BElief State model (FORBES), which incorporates normalizing flows into the variational inference to learn general continuous belief states for POMDPs. Furthermore, we show that the learned belief states can be plugged into downstream RL algorithms to improve performance. In experiments, we show that our methods successfully capture the complex belief states that enable multi-modal predictions as well as high quality reconstructions, and results on challenging visual-motor control tasks show that our method achieves superior performance and sample efficiency.

Reachability Constrained Reinforcement Learning

May 16, 2022

Dongjie Yu, Haitong Ma, Shengbo Eben Li, Jianyu Chen

Abstract: Constrained Reinforcement Learning (CRL) has gained significant interest recently, since the satisfaction of safety constraints is critical for real world problems. However, existing CRL methods constraining discounted cumulative costs generally lack rigorous definition and guarantee of safety. On the other hand, in the safe control research, safety is defined as persistently satisfying certain state constraints. Such persistent safety is possible only on a subset of the state space, called feasible set, where an optimal largest feasible set exists for a given environment. Recent studies incorporating safe control with CRL using energy-based methods such as control barrier function (CBF) and safety index (SI) leverage prior conservative estimation of feasible sets, which harms performance of the learned policy. To deal with this problem, this paper proposes a reachability CRL (RCRL) method by using reachability analysis to characterize the largest feasible sets. We characterize the feasible set by the established self-consistency condition, then a safety value function can be learned and used as constraints in CRL. We also use the multi-time scale stochastic approximation theory to prove that the proposed algorithm converges to a local optimum, where the largest feasible set can be guaranteed. Empirical results on different benchmarks such as safe-control-gym and Safety-Gym validate the learned feasible set, the performance in optimal criteria, and constraint satisfaction of RCRL, compared to state-of-the-art CRL baselines.

* Accepted by ICML 2022

Scale-Equivalent Distillation for Semi-Supervised Object Detection

Mar 26, 2022

Qiushan Guo, Yao Mu, Jianyu Chen, Tianqi Wang, Yizhou Yu, Ping Luo

Abstract: Recent Semi-Supervised Object Detection (SS-OD) methods are mainly based on self-training, i.e., generating hard pseudo-labels by a teacher model on unlabeled data as supervisory signals. Although they achieved certain success, the limited labeled data in semi-supervised learning scales up the challenges of object detection. We analyze the challenges these methods meet with the empirical experiment results. We find that the massive False Negative samples and inferior localization precision lack consideration. Besides, the large variance of object sizes and class imbalance (i.e., the extreme ratio between background and object) hinder the performance of prior arts. Further, we overcome these challenges by introducing a novel approach, Scale-Equivalent Distillation (SED), which is a simple yet effective end-to-end knowledge distillation framework robust to large object size variance and class imbalance. SED has several appealing benefits compared to the previous works. (1) SED imposes a consistency regularization to handle the large scale variance problem. (2) SED alleviates the noise problem from the False Negative samples and inferior localization precision. (3) A re-weighting strategy can implicitly screen the potential foreground regions of the unlabeled data to reduce the effect of class imbalance. Extensive experiments show that SED consistently outperforms the recent state-of-the-art methods on different datasets with significant margins. For example, it surpasses the supervised counterpart by more than 10 mAP when using 5% and 10% labeled data on MS-COCO.

* Accepted by CVPR 2022

Chance-Constrained Iterative Linear-Quadratic Stochastic Games

Mar 02, 2022

Hai Zhong, Yutaka Shimizu, Jianyu Chen

Abstract: Dynamic games arise as a powerful paradigm for multi-robot planning, for which the satisfaction of safety constraints is crucial. Constrained stochastic games are of particular interest, as real-world robots need to operate and satisfy constraints under uncertainty. Existing methods for solving stochastic games handle constraints using soft penalties with hand-tuned weights. However, finding a suitable penalty weight is non-trivial and requires trial and error. In this paper, we propose the chance-constrained iterative linear-quadratic stochastic games (CCILQGames) algorithm. CCILQGames solves chance-constrained stochastic games using the augmented Lagrangian method, with the merit of automatically finding a suitable penalty weight. We evaluate our algorithm in three autonomous driving scenarios, including merge, intersection, and roundabout. Experimental results and Monte Carlo tests show that CCILQGames can generate safe and interactive strategies in stochastic environments.

* Submitted to IROS 2022, 8 pages, 4 figures
