Yujie Yang

Safe Reinforcement Learning with Dual Robustness

Sep 13, 2023
Zeyang Li, Chuxiong Hu, Yunan Wang, Yujie Yang, Shengbo Eben Li

Reinforcement learning (RL) agents are vulnerable to adversarial disturbances, which can deteriorate task performance or compromise safety specifications. Existing methods either address safety requirements under the assumption of no adversary (e.g., safe RL) or only focus on robustness against performance adversaries (e.g., robust RL). Learning one policy that is both safe and robust remains a challenging open problem. The difficulty lies in tackling two intertwined aspects under worst-case disturbances: feasibility and optimality. Optimality is only valid inside the feasible region, while identifying the maximal feasible region relies on learning the optimal policy. To address this issue, we propose a systematic framework that unifies safe RL and robust RL, covering problem formulation, iteration scheme, convergence analysis, and practical algorithm design. This unification is built upon constrained two-player zero-sum Markov games. A dual policy iteration scheme is proposed, which simultaneously optimizes a task policy and a safety policy, and its convergence is proved. Furthermore, we design a deep RL algorithm for practical implementation, called dually robust actor-critic (DRAC). Evaluations on safety-critical benchmarks demonstrate that DRAC achieves high performance and persistent safety under all scenarios (no adversary, safety adversary, performance adversary), significantly outperforming all baselines.
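
To make the dual structure concrete, below is a minimal tabular sketch that simultaneously computes a safety value and a task value on a small constrained two-player zero-sum Markov game. The random game, the feasibility threshold, and the pure-strategy minimax updates are illustrative assumptions for the sketch, not the paper's DRAC algorithm.

```python
# Minimal tabular sketch of the dual idea: a safety value (task player minimizes
# worst-case violations) and a task value (maximize worst-case return restricted to
# safe actions), yielding a safety policy and a task policy. All quantities below
# (random game, threshold eps, pure-strategy minimax) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
nS, nA, nD, gamma, eps = 8, 3, 2, 0.9, 1e-3
P = np.zeros((nS, nA, nD, nS))                      # P[s, a, d, s']
for s in range(nS):
    for a in range(nA):
        for d in range(nD):
            np.add.at(P[s, a, d], rng.choice(nS, size=2), 0.5)   # sparse transitions
r = rng.uniform(size=(nS, nA, nD))                  # task reward
c = (rng.uniform(size=nS) < 0.2).astype(float)      # 1 = state violates the constraint

# Safety value: S(s) = min_a max_d [ c(s) + gamma * E S(s') ].
S = np.zeros(nS)
for _ in range(300):
    Qs = c[:, None, None] + gamma * P @ S           # (nS, nA, nD)
    S = Qs.max(axis=2).min(axis=1)
safe_a = Qs.max(axis=2) <= eps                      # worst-case-safe actions per state
safety_pi = Qs.max(axis=2).argmin(axis=1)           # the safety policy

# Task value: V(s) = max over safe a of min_d [ r(s,a,d) + gamma * E V(s') ].
V = np.zeros(nS)
for _ in range(300):
    worst = (r + gamma * P @ V).min(axis=2)         # worst case over the adversary
    masked = np.where(safe_a, worst, -np.inf)
    V = np.where(safe_a.any(axis=1), masked.max(axis=1), worst.max(axis=1))
task_pi = np.where(safe_a.any(axis=1),
                   np.where(safe_a, worst, -np.inf).argmax(axis=1),
                   safety_pi)                       # fall back to the safety policy
print("feasible states:", np.where(S <= eps)[0])
print("task policy:", task_pi, "\nsafety policy:", safety_pi)
```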


S3IM: Stochastic Structural SIMilarity and Its Unreasonable Effectiveness for Neural Fields

Aug 14, 2023
Zeke Xie, Xindi Yang, Yujie Yang, Qi Sun, Yixiang Jiang, Haoran Wang, Yunfeng Cai, Mingming Sun


Recently, Neural Radiance Field (NeRF) has shown great success in rendering novel-view images of a given scene by learning an implicit representation from only posed RGB images. NeRF and related neural field methods (e.g., neural surface representation) typically optimize a point-wise loss and make point-wise predictions, where one data point corresponds to one pixel. Unfortunately, this line of research has failed to use the collective supervision of distant pixels, even though pixels in an image or scene are known to provide rich structural information. To the best of our knowledge, we are the first to design a nonlocal multiplex training paradigm for NeRF and related neural field methods via a novel Stochastic Structural SIMilarity (S3IM) loss that processes multiple data points as a whole set instead of processing them independently. Our extensive experiments demonstrate the unreasonable effectiveness of S3IM in improving NeRF and neural surface representation nearly for free. The improvements in quality metrics can be particularly significant for relatively difficult tasks: e.g., the test MSE loss unexpectedly drops by more than 90% for TensoRF and DVGO over eight novel view synthesis tasks; NeuS gains a 198% F-score improvement and a 64% Chamfer $L_{1}$ distance reduction over eight surface reconstruction tasks. Moreover, S3IM is consistently robust even with sparse inputs, corrupted images, and dynamic scenes.

* ICCV 2023 main conference. Code: https://github.com/Madaoer/S3IM. 14 pages, 5 figures, 17 tables 
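
As a rough illustration of the stochastic multiplex idea, the sketch below forms random pseudo-patches from batches of rendered and ground-truth ray colors and averages an SSIM-style score over several random groupings; one minus that score serves as the loss. The patch size, number of repeats, and the global (non-windowed) SSIM statistics are simplifying assumptions rather than the released implementation (see the linked code for the real one).

```python
# Minimal sketch of a stochastic structural similarity loss on ray batches of
# shape (N, 3). Patch size, repeat count, and global SSIM statistics are assumptions.
import numpy as np

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed from global patch statistics (no sliding window)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def s3im_loss(pred, target, patch=64, repeats=10, rng=None):
    """pred, target: (N, 3) ray colors in [0, 1]; returns 1 - mean stochastic SSIM."""
    rng = rng or np.random.default_rng(0)
    n = patch * patch
    scores = []
    for _ in range(repeats):
        idx = rng.choice(len(pred), size=n, replace=len(pred) < n)  # stochastic grouping
        p = pred[idx].reshape(patch, patch, 3)                      # pseudo-image patch
        t = target[idx].reshape(patch, patch, 3)
        scores.append(np.mean([ssim_global(p[..., k], t[..., k]) for k in range(3)]))
    return 1.0 - float(np.mean(scores))

# Identical images give a loss near 0; noisy renderings give a larger loss.
gt = np.random.default_rng(1).uniform(size=(8192, 3))
print(s3im_loss(gt, gt), s3im_loss(np.clip(gt + 0.2 * np.random.randn(8192, 3), 0, 1), gt))
```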

Feasible Policy Iteration

Apr 18, 2023
Yujie Yang, Zhilong Zheng, Shengbo Eben Li


Safe reinforcement learning (RL) aims to solve an optimal control problem under safety constraints. Existing $\textit{direct}$ safe RL methods use the original constraint throughout the learning process. They either lack theoretical guarantees for the policy during iteration or suffer from infeasibility problems. To address this issue, we propose an $\textit{indirect}$ safe RL method called feasible policy iteration (FPI), which iteratively uses the feasible region of the last policy to constrain the current policy. The feasible region is represented by a feasibility function called the constraint decay function (CDF). The core of FPI is a region-wise policy update rule called feasible policy improvement, which maximizes the return under the constraint of the CDF inside the feasible region and minimizes the CDF outside the feasible region. This update rule is always feasible and ensures that the feasible region monotonically expands and the state-value function monotonically increases inside the feasible region. Using the feasible Bellman equation, we prove that FPI converges to the maximum feasible region and the optimal state-value function. Experiments on classic control tasks and Safety Gym show that our algorithms achieve lower constraint violations and comparable or higher performance than the baselines.
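
The region-wise update can be illustrated with a small tabular sketch: evaluate both a state-value function and a feasibility function for the current policy, then maximize return among low-feasibility-value actions inside the feasible region and minimize the feasibility value outside it. The random MDP, the discounted-cost feasibility function used as a stand-in for the constraint decay function, and the threshold below are illustrative assumptions, not the paper's exact construction.

```python
# Tabular sketch of a region-wise policy update: V for return, F as a stand-in
# feasibility function. Random MDP, threshold eps, and the discounted-cost F are
# illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma, eps = 8, 3, 0.9, 1e-3
P = np.zeros((nS, nA, nS))                          # P[s, a, s']
for s in range(nS):
    for a in range(nA):
        np.add.at(P[s, a], rng.choice(nS, size=2), 0.5)   # sparse random transitions
r = rng.uniform(size=(nS, nA))
c = (rng.uniform(size=nS) < 0.2).astype(float)      # 1 = constraint-violating state

def evaluate(pi):
    """Exact policy evaluation for the return V and the feasibility function F."""
    Ppi = P[np.arange(nS), pi]
    V = np.linalg.solve(np.eye(nS) - gamma * Ppi, r[np.arange(nS), pi])
    F = np.linalg.solve(np.eye(nS) - gamma * Ppi, c)
    return V, F

pi = np.zeros(nS, dtype=int)
for _ in range(50):
    V, F = evaluate(pi)
    feasible = F <= eps                             # feasible region of the last policy
    Qv = r + gamma * P @ V                          # (nS, nA)
    Qf = c[:, None] + gamma * P @ F
    ok = Qf <= eps                                  # actions that keep the state feasible
    improve = np.where(ok, Qv, -np.inf).argmax(axis=1)   # constrained improvement inside
    expand = Qf.argmin(axis=1)                           # push F down outside the region
    pi = np.where(feasible & ok.any(axis=1), improve, expand)

V, F = evaluate(pi)
print("feasible region:", np.where(F <= eps)[0], "\npolicy:", pi)
```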


McNet: Fuse Multiple Cues for Multichannel Speech Enhancement

Nov 16, 2022
Yujie Yang, Changsheng Quan, Xiaofei Li


In multichannel speech enhancement, both spectral and spatial information are vital for discriminating between speech and noise. How to fully exploit these two types of information and their temporal dynamics remains an interesting research problem. As a solution to this problem, this paper proposes a multi-cue fusion network named McNet, which cascades four modules to respectively exploit the full-band spatial, narrow-band spatial, sub-band spectral, and full-band spectral information. Experiments show that each module in the proposed network has its unique contribution and, as a whole, notably outperforms other state-of-the-art methods.

* Submitted to ICASSP 2023 
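
For intuition about the cascade, here is a rough PyTorch sketch that stacks four LSTM-based modules, alternating between sequences over the frequency axis (full-band) and over the time axis (narrow-band/sub-band), each consuming the multichannel STFT features together with the previous module's output. The module order, hidden sizes, and feature wiring are guesses for illustration; the actual McNet architecture is specified in the paper.

```python
# Rough sketch of a multi-cue cascade in the spirit of McNet, assuming a complex
# multichannel STFT input of shape (batch, channels, freq, time). Architecture
# details here are assumptions for illustration only.
import torch
import torch.nn as nn

class AxisLSTM(nn.Module):
    """BiLSTM applied along one axis ('time' or 'freq'), independently over the other."""
    def __init__(self, in_dim, hidden, axis):
        super().__init__()
        self.axis = axis
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, hidden)

    def forward(self, x):                      # x: (B, F, T, D)
        B, F, T, D = x.shape
        if self.axis == "time":                # sequence over T, batched over (B, F)
            seq = x.reshape(B * F, T, D)
        else:                                  # sequence over F, batched over (B, T)
            seq = x.permute(0, 2, 1, 3).reshape(B * T, F, D)
        out = self.proj(self.lstm(seq)[0])
        if self.axis == "time":
            return out.reshape(B, F, T, -1)
        return out.reshape(B, T, F, -1).permute(0, 2, 1, 3)

class McNetSketch(nn.Module):
    def __init__(self, n_mics=4, hidden=64):
        super().__init__()
        d = 2 * n_mics                         # real + imag of every microphone
        self.full_spatial = AxisLSTM(d, hidden, "freq")
        self.narrow_spatial = AxisLSTM(d + hidden, hidden, "time")
        self.sub_spectral = AxisLSTM(d + hidden, hidden, "time")
        self.full_spectral = AxisLSTM(d + hidden, hidden, "freq")
        self.mask = nn.Linear(hidden, 2)       # complex mask for the reference channel

    def forward(self, stft):                   # stft: complex (B, C, F, T)
        x = torch.cat([stft.real, stft.imag], dim=1).permute(0, 2, 3, 1)  # (B, F, T, 2C)
        h = self.full_spatial(x)
        h = self.narrow_spatial(torch.cat([x, h], dim=-1))
        h = self.sub_spectral(torch.cat([x, h], dim=-1))
        h = self.full_spectral(torch.cat([x, h], dim=-1))
        m = self.mask(h)                       # (B, F, T, 2)
        return torch.complex(m[..., 0], m[..., 1]) * stft[:, 0]           # masked reference

x = torch.randn(2, 4, 129, 50, dtype=torch.cfloat)
print(McNetSketch()(x).shape)                  # torch.Size([2, 129, 50])
```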

Safe Model-Based Reinforcement Learning with an Uncertainty-Aware Reachability Certificate

Oct 14, 2022
Dongjie Yu, Wenjun Zou, Yujie Yang, Haitong Ma, Shengbo Eben Li, Jingliang Duan, Jianyu Chen


Safe reinforcement learning (RL), which solves for constraint-satisfying policies, provides a promising route to broader safety-critical applications of RL in real-world problems such as robotics. Among safe RL approaches, model-based methods further reduce training-time violations thanks to their high sample efficiency. However, the lack of safety robustness against model uncertainties remains an issue in safe model-based RL, especially for training-time safety. In this paper, we propose a distributional reachability certificate (DRC) and its Bellman equation to address model uncertainties and characterize robust, persistently safe states. Furthermore, we build a safe RL framework that resolves the constraints required by the DRC and its corresponding shield policy. We also devise a line search method to maintain safety and reach higher returns simultaneously while leveraging the shield policy. Comprehensive experiments on classical benchmarks such as constrained tracking and navigation indicate that the proposed algorithm achieves comparable returns with far fewer constraint violations during training.

* 12 pages, 6 figures 
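
A minimal sketch of the shielding-with-line-search idea: blend the task action with a shield action and pick the smallest shield weight whose worst-case prediction over a dynamics ensemble keeps a reachability-style certificate nonpositive. The toy certificate, the stand-in ensemble, and the search grid below are placeholders for illustration, not the paper's DRC construction.

```python
# Line search over the blend between a task action and a shield action, assuming
# access to a learned model ensemble and a certificate phi(s) <= 0 meaning
# "persistently safe". Everything below is a toy placeholder.
import numpy as np

def certificate(state):
    """Toy certificate: safe iff the state stays inside the unit ball."""
    return float(np.linalg.norm(state) - 1.0)

def ensemble_predict(state, action, n_models=5, rng=np.random.default_rng(0)):
    """Stand-in for a learned dynamics ensemble: n_models next-state samples."""
    return state + 0.1 * action + 0.02 * rng.standard_normal((n_models, state.size))

def shielded_action(state, task_action, shield_action, grid=np.linspace(0.0, 1.0, 11)):
    """Smallest shield weight whose worst-case ensemble prediction keeps phi <= 0."""
    for w in grid:
        a = (1 - w) * task_action + w * shield_action
        worst = max(certificate(s_next) for s_next in ensemble_predict(state, a))
        if worst <= 0.0:
            return a, w
    return shield_action, 1.0                  # fall back to the pure shield policy

s = np.array([0.6, 0.6])
a, w = shielded_action(s, task_action=np.array([2.0, 2.0]), shield_action=np.array([-2.0, -2.0]))
print("shield weight:", w, "action:", a)
```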

Steadily Learn to Drive with Virtual Memory

Feb 16, 2021
Yuhang Zhang, Yao Mu, Yujie Yang, Yang Guan, Shengbo Eben Li, Qi Sun, Jianyu Chen


Reinforcement learning has shown great potential in developing high-level autonomous driving. However, for high-dimensional tasks, current RL methods suffer from low data efficiency and oscillations during training. This paper proposes an algorithm called Learn to drive with Virtual Memory (LVM) to overcome these problems. LVM compresses the high-dimensional information into compact latent states and learns a latent dynamic model to summarize the agent's experience. Various imagined latent trajectories are generated as virtual memory by the latent dynamic model. The policy is learned by propagating gradients through the learned latent model with the imagined latent trajectories, which leads to high data efficiency. Furthermore, a double critic structure is designed to reduce oscillation during training. The effectiveness of LVM is demonstrated on an image-input autonomous driving task, in which LVM outperforms the existing method in terms of data efficiency, learning stability, and control performance.

* Submitted to the 32nd IEEE Intelligent Vehicles Symposium 
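
The virtual-memory training loop can be sketched roughly as follows: encode image observations into latent states, roll out imagined latent trajectories with a learned dynamics model, and update the actor through the model using the minimum of two critics. The network sizes, the GRU dynamics, and the one-step losses are illustrative assumptions, not the paper's exact architecture.

```python
# Compact sketch of the "virtual memory" idea: imagined latent rollouts and an
# actor update through the model with a double critic. Sizes and losses are assumed.
import torch
import torch.nn as nn

latent, act_dim, horizon = 32, 2, 5

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ELU(), nn.Linear(256, latent))
dynamics = nn.GRUCell(act_dim, latent)                 # z' = f(z, a)
actor = nn.Sequential(nn.Linear(latent, 64), nn.ELU(), nn.Linear(64, act_dim), nn.Tanh())
critics = nn.ModuleList([nn.Sequential(nn.Linear(latent, 64), nn.ELU(), nn.Linear(64, 1))
                         for _ in range(2)])           # double critic to damp oscillation
reward_head = nn.Sequential(nn.Linear(latent, 64), nn.ELU(), nn.Linear(64, 1))

def imagine(obs):
    """Generate an imagined latent trajectory ("virtual memory") from real observations."""
    z = encoder(obs)
    zs = []
    for _ in range(horizon):
        a = actor(z)
        z = dynamics(a, z)                             # propagate through the latent model
        zs.append(z)
    return torch.stack(zs)                             # (horizon, batch, latent)

obs = torch.rand(16, 3, 32, 32)                        # e.g. image observations from driving
zs = imagine(obs)

# Actor update: maximize the conservative (min of two critics) imagined value.
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
values = torch.min(critics[0](zs), critics[1](zs))
actor_loss = -(reward_head(zs) + values).mean()
actor_opt.zero_grad()
actor_loss.backward()
actor_opt.step()
print("actor loss:", float(actor_loss))
```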