Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jianyu Chen

Scale-Equivalent Distillation for Semi-Supervised Object Detection

Mar 26, 2022

Qiushan Guo, Yao Mu, Jianyu Chen, Tianqi Wang, Yizhou Yu, Ping Luo

Figure 1 for Scale-Equivalent Distillation for Semi-Supervised Object Detection

Figure 2 for Scale-Equivalent Distillation for Semi-Supervised Object Detection

Figure 3 for Scale-Equivalent Distillation for Semi-Supervised Object Detection

Figure 4 for Scale-Equivalent Distillation for Semi-Supervised Object Detection

Abstract:Recent Semi-Supervised Object Detection (SS-OD) methods are mainly based on self-training, i.e., generating hard pseudo-labels by a teacher model on unlabeled data as supervisory signals. Although they achieved certain success, the limited labeled data in semi-supervised learning scales up the challenges of object detection. We analyze the challenges these methods meet with the empirical experiment results. We find that the massive False Negative samples and inferior localization precision lack consideration. Besides, the large variance of object sizes and class imbalance (i.e., the extreme ratio between background and object) hinder the performance of prior arts. Further, we overcome these challenges by introducing a novel approach, Scale-Equivalent Distillation (SED), which is a simple yet effective end-to-end knowledge distillation framework robust to large object size variance and class imbalance. SED has several appealing benefits compared to the previous works. (1) SED imposes a consistency regularization to handle the large scale variance problem. (2) SED alleviates the noise problem from the False Negative samples and inferior localization precision. (3) A re-weighting strategy can implicitly screen the potential foreground regions of the unlabeled data to reduce the effect of class imbalance. Extensive experiments show that SED consistently outperforms the recent state-of-the-art methods on different datasets with significant margins. For example, it surpasses the supervised counterpart by more than 10 mAP when using 5% and 10% labeled data on MS-COCO.

* Accepted by CVPR 2022

Via

Access Paper or Ask Questions

Chance-Constrained Iterative Linear-Quadratic Stochastic Games

Mar 02, 2022

Hai Zhong, Yutaka Shimizu, Jianyu Chen

Figure 1 for Chance-Constrained Iterative Linear-Quadratic Stochastic Games

Figure 2 for Chance-Constrained Iterative Linear-Quadratic Stochastic Games

Figure 3 for Chance-Constrained Iterative Linear-Quadratic Stochastic Games

Figure 4 for Chance-Constrained Iterative Linear-Quadratic Stochastic Games

Abstract:Dynamic game arises as a powerful paradigm for multi-robot planning, for which the safety constraints satisfaction is crucial. Constrained stochastic games are of particular interest, as real-world robots need to operate and satisfy constraints under uncertainty. Existing methods for solving stochastic games handle constraints using soft penalties with hand-tuned weights. However, finding a suitable penalty weight is non-trivial and requires trial and error. In this paper, we propose the chance-constrained iterative linear-quadratic stochastic games (CCILQGames) algorithm. CCILQGames solves chance-constrained stochastic games using the augmented Lagrangian method, with the merit of automatically finding a suitable penalty weight. We evaluate our algorithm in three autonomous driving scenarios, including merge, intersection, and roundabout. Experimental results and Monte Carlo tests show that CCILQGames could generate safe and interactive strategies in stochastic environments.

* Submitted to IROS 2022, 8 pages, 4 figures

Via

Access Paper or Ask Questions

Zeroth-Order Actor-Critic

Jan 29, 2022

Yuheng Lei, Jianyu Chen, Shengbo Eben Li, Sifa Zheng

Abstract:Zeroth-order optimization methods and policy gradient based first-order methods are two promising alternatives to solve reinforcement learning (RL) problems with complementary advantages. The former work with arbitrary policies, drive state-dependent and temporally-extended exploration, possess robustness-seeking property, but suffer from high sample complexity, while the latter are more sample efficient but restricted to differentiable policies and the learned policies are less robust. We propose Zeroth-Order Actor-Critic algorithm (ZOAC) that unifies these two methods into an on-policy actor-critic architecture to preserve the advantages from both. ZOAC conducts rollouts collection with timestep-wise perturbation in parameter space, first-order policy evaluation (PEV) and zeroth-order policy improvement (PIM) alternately in each iteration. We evaluate our proposed method on a range of challenging continuous control benchmarks using different types of policies, where ZOAC outperforms zeroth-order and first-order baseline algorithms.

Via

Access Paper or Ask Questions

Learn Zero-Constraint-Violation Policy in Model-Free Constrained Reinforcement Learning

Nov 25, 2021

Haitong Ma, Changliu Liu, Shengbo Eben Li, Sifa Zheng, Wenchao Sun, Jianyu Chen

Figure 1 for Learn Zero-Constraint-Violation Policy in Model-Free Constrained Reinforcement Learning

Figure 2 for Learn Zero-Constraint-Violation Policy in Model-Free Constrained Reinforcement Learning

Figure 3 for Learn Zero-Constraint-Violation Policy in Model-Free Constrained Reinforcement Learning

Figure 4 for Learn Zero-Constraint-Violation Policy in Model-Free Constrained Reinforcement Learning

Abstract:In the trial-and-error mechanism of reinforcement learning (RL), a notorious contradiction arises when we expect to learn a safe policy: how to learn a safe policy without enough data and prior model about the dangerous region? Existing methods mostly use the posterior penalty for dangerous actions, which means that the agent is not penalized until experiencing danger. This fact causes that the agent cannot learn a zero-violation policy even after convergence. Otherwise, it would not receive any penalty and lose the knowledge about danger. In this paper, we propose the safe set actor-critic (SSAC) algorithm, which confines the policy update using safety-oriented energy functions, or the safety indexes. The safety index is designed to increase rapidly for potentially dangerous actions, which allows us to locate the safe set on the action space, or the control safe set. Therefore, we can identify the dangerous actions prior to taking them, and further obtain a zero constraint-violation policy after convergence.We claim that we can learn the energy function in a model-free manner similar to learning a value function. By using the energy function transition as the constraint objective, we formulate a constrained RL problem. We prove that our Lagrangian-based solutions make sure that the learned policy will converge to the constrained optimum under some assumptions. The proposed algorithm is evaluated on both the complex simulation environments and a hardware-in-loop (HIL) experiment with a real controller from the autonomous vehicle. Experimental results suggest that the converged policy in all environments achieves zero constraint violation and comparable performance with model-based baselines.

Via

Access Paper or Ask Questions

Joint Synthesis of Safety Certificate and Safe Control Policy using Constrained Reinforcement Learning

Nov 15, 2021

Haitong Ma, Changliu Liu, Shengbo Eben Li, Sifa Zheng, Jianyu Chen

Figure 1 for Joint Synthesis of Safety Certificate and Safe Control Policy using Constrained Reinforcement Learning

Figure 2 for Joint Synthesis of Safety Certificate and Safe Control Policy using Constrained Reinforcement Learning

Figure 3 for Joint Synthesis of Safety Certificate and Safe Control Policy using Constrained Reinforcement Learning

Figure 4 for Joint Synthesis of Safety Certificate and Safe Control Policy using Constrained Reinforcement Learning

Abstract:Safety is the major consideration in controlling complex dynamical systems using reinforcement learning (RL), where the safety certificate can provide provable safety guarantee. A valid safety certificate is an energy function indicating that safe states are with low energy, and there exists a corresponding safe control policy that allows the energy function to always dissipate. The safety certificate and the safe control policy are closely related to each other and both challenging to synthesize. Therefore, existing learning-based studies treat either of them as prior knowledge to learn the other, which limits their applicability with general unknown dynamics. This paper proposes a novel approach that simultaneously synthesizes the energy-function-based safety certificate and learns the safe control policy with CRL. We do not rely on prior knowledge about either an available model-based controller or a perfect safety certificate. In particular, we formulate a loss function to optimize the safety certificate parameters by minimizing the occurrence of energy increases. By adding this optimization procedure as an outer loop to the Lagrangian-based constrained reinforcement learning (CRL), we jointly update the policy and safety certificate parameters and prove that they will converge to their respective local optima, the optimal safe policy and a valid safety certificate. We evaluate our algorithms on multiple safety-critical benchmark environments. The results show that the proposed algorithm learns provably safe policies with no constraint violation. The validity or feasibility of synthesized safety certificate is also verified numerically.

Via

Access Paper or Ask Questions

Model-based Chance-Constrained Reinforcement Learning via Separated Proportional-Integral Lagrangian

Aug 26, 2021

Baiyu Peng, Jingliang Duan, Jianyu Chen, Shengbo Eben Li, Genjin Xie, Congsheng Zhang, Yang Guan, Yao Mu, Enxin Sun

Figure 1 for Model-based Chance-Constrained Reinforcement Learning via Separated Proportional-Integral Lagrangian

Figure 2 for Model-based Chance-Constrained Reinforcement Learning via Separated Proportional-Integral Lagrangian

Figure 3 for Model-based Chance-Constrained Reinforcement Learning via Separated Proportional-Integral Lagrangian

Figure 4 for Model-based Chance-Constrained Reinforcement Learning via Separated Proportional-Integral Lagrangian

Abstract:Safety is essential for reinforcement learning (RL) applied in the real world. Adding chance constraints (or probabilistic constraints) is a suitable way to enhance RL safety under uncertainty. Existing chance-constrained RL methods like the penalty methods and the Lagrangian methods either exhibit periodic oscillations or learn an over-conservative or unsafe policy. In this paper, we address these shortcomings by proposing a separated proportional-integral Lagrangian (SPIL) algorithm. We first review the constrained policy optimization process from a feedback control perspective, which regards the penalty weight as the control input and the safe probability as the control output. Based on this, the penalty method is formulated as a proportional controller, and the Lagrangian method is formulated as an integral controller. We then unify them and present a proportional-integral Lagrangian method to get both their merits, with an integral separation technique to limit the integral value in a reasonable range. To accelerate training, the gradient of safe probability is computed in a model-based manner. We demonstrate our method can reduce the oscillations and conservatism of RL policy in a car-following simulation. To prove its practicality, we also apply our method to a real-world mobile robot navigation task, where our robot successfully avoids a moving obstacle with highly uncertain or even aggressive behaviors.

Via

Access Paper or Ask Questions

Constrained Iterative LQG for Real-Time Chance-Constrained Gaussian Belief Space Planning

Aug 21, 2021

Jianyu Chen, Yutaka Shimizu, Liting Sun, Masayoshi Tomizuka, Wei Zhan

Figure 1 for Constrained Iterative LQG for Real-Time Chance-Constrained Gaussian Belief Space Planning

Figure 2 for Constrained Iterative LQG for Real-Time Chance-Constrained Gaussian Belief Space Planning

Figure 3 for Constrained Iterative LQG for Real-Time Chance-Constrained Gaussian Belief Space Planning

Figure 4 for Constrained Iterative LQG for Real-Time Chance-Constrained Gaussian Belief Space Planning

Abstract:Motion planning under uncertainty is of significant importance for safety-critical systems such as autonomous vehicles. Such systems have to satisfy necessary constraints (e.g., collision avoidance) with potential uncertainties coming from either disturbed system dynamics or noisy sensor measurements. However, existing motion planning methods cannot efficiently find the robust optimal solutions under general nonlinear and non-convex settings. In this paper, we formulate such problem as chance-constrained Gaussian belief space planning and propose the constrained iterative Linear Quadratic Gaussian (CILQG) algorithm as a real-time solution. In this algorithm, we iteratively calculate a Gaussian approximation of the belief and transform the chance-constraints. We evaluate the effectiveness of our method in simulations of autonomous driving planning tasks with static and dynamic obstacles. Results show that CILQG can handle uncertainties more appropriately and has faster computation time than baseline methods.

* IROS 2021

Via

Access Paper or Ask Questions

Feasible Actor-Critic: Constrained Reinforcement Learning for Ensuring Statewise Safety

May 28, 2021

Haitong Ma, Yang Guan, Shegnbo Eben Li, Xiangteng Zhang, Sifa Zheng, Jianyu Chen

Figure 1 for Feasible Actor-Critic: Constrained Reinforcement Learning for Ensuring Statewise Safety

Figure 2 for Feasible Actor-Critic: Constrained Reinforcement Learning for Ensuring Statewise Safety

Figure 3 for Feasible Actor-Critic: Constrained Reinforcement Learning for Ensuring Statewise Safety

Figure 4 for Feasible Actor-Critic: Constrained Reinforcement Learning for Ensuring Statewise Safety

Abstract:The safety constraints commonly used by existing safe reinforcement learning (RL) methods are defined only on expectation of initial states, but allow each certain state to be unsafe, which is unsatisfying for real-world safety-critical tasks. In this paper, we introduce the feasible actor-critic (FAC) algorithm, which is the first model-free constrained RL method that considers statewise safety, e.g, safety for each initial state. We claim that some states are inherently unsafe no matter what policy we choose, while for other states there exist policies ensuring safety, where we say such states and policies are feasible. By constructing a statewise Lagrange function available on RL sampling and adopting an additional neural network to approximate the statewise Lagrange multiplier, we manage to obtain the optimal feasible policy which ensures safety for each feasible state and the safest possible policy for infeasible states. Furthermore, the trained multiplier net can indicate whether a given state is feasible or not through the statewise complementary slackness condition. We provide theoretical guarantees that FAC outperforms previous expectation-based constrained RL methods in terms of both constraint satisfaction and reward optimization. Experimental results on both robot locomotive tasks and safe exploration tasks verify the safety enhancement and feasibility interpretation of the proposed method.

* There are some confusions in Theorem 2 in section 4. We will resubmit it until this problem is fixed

Via

Access Paper or Ask Questions

BDANet: Multiscale Convolutional Neural Network with Cross-directional Attention for Building Damage Assessment from Satellite Images

May 16, 2021

Yu Shen, Sijie Zhu, Taojiannan Yang, Chen Chen, Delu Pan, Jianyu Chen, Liang Xiao, Qian Du

Figure 1 for BDANet: Multiscale Convolutional Neural Network with Cross-directional Attention for Building Damage Assessment from Satellite Images

Figure 2 for BDANet: Multiscale Convolutional Neural Network with Cross-directional Attention for Building Damage Assessment from Satellite Images

Figure 3 for BDANet: Multiscale Convolutional Neural Network with Cross-directional Attention for Building Damage Assessment from Satellite Images

Figure 4 for BDANet: Multiscale Convolutional Neural Network with Cross-directional Attention for Building Damage Assessment from Satellite Images

Abstract:Fast and effective responses are required when a natural disaster (e.g., earthquake, hurricane, etc.) strikes. Building damage assessment from satellite imagery is critical before relief effort is deployed. With a pair of pre- and post-disaster satellite images, building damage assessment aims at predicting the extent of damage to buildings. With the powerful ability of feature representation, deep neural networks have been successfully applied to building damage assessment. Most existing works simply concatenate pre- and post-disaster images as input of a deep neural network without considering their correlations. In this paper, we propose a novel two-stage convolutional neural network for Building Damage Assessment, called BDANet. In the first stage, a U-Net is used to extract the locations of buildings. Then the network weights from the first stage are shared in the second stage for building damage assessment. In the second stage, a two-branch multi-scale U-Net is employed as backbone, where pre- and post-disaster images are fed into the network separately. A cross-directional attention module is proposed to explore the correlations between pre- and post-disaster images. Moreover, CutMix data augmentation is exploited to tackle the challenge of difficult classes. The proposed method achieves state-of-the-art performance on a large-scale dataset -- xBD. The code is available at https://github.com/ShaneShen/BDANet-Building-Damage-Assessment.

* arXiv admin note: text overlap with arXiv:2010.14014

Via

Access Paper or Ask Questions

Model-based Constrained Reinforcement Learning using Generalized Control Barrier Function

Mar 05, 2021

Haitong Ma, Jianyu Chen, Shengbo Eben Li, Ziyu Lin, Yang Guan, Yangang Ren, Sifa Zheng

Figure 1 for Model-based Constrained Reinforcement Learning using Generalized Control Barrier Function

Figure 2 for Model-based Constrained Reinforcement Learning using Generalized Control Barrier Function

Figure 3 for Model-based Constrained Reinforcement Learning using Generalized Control Barrier Function

Figure 4 for Model-based Constrained Reinforcement Learning using Generalized Control Barrier Function

Abstract:Model information can be used to predict future trajectories, so it has huge potential to avoid dangerous region when implementing reinforcement learning (RL) on real-world tasks, like autonomous driving. However, existing studies mostly use model-free constrained RL, which causes inevitable constraint violations. This paper proposes a model-based feasibility enhancement technique of constrained RL, which enhances the feasibility of policy using generalized control barrier function (GCBF) defined on the distance to constraint boundary. By using the model information, the policy can be optimized safely without violating actual safety constraints, and the sample efficiency is increased. The major difficulty of infeasibility in solving the constrained policy gradient is handled by an adaptive coefficient mechanism. We evaluate the proposed method in both simulations and real vehicle experiments in a complex autonomous driving collision avoidance task. The proposed method achieves up to four times fewer constraint violations and converges 3.36 times faster than baseline constrained RL approaches.

Via

Access Paper or Ask Questions