Shengbo Eben Li

Learn Zero-Constraint-Violation Policy in Model-Free Constrained Reinforcement Learning

Nov 25, 2021

Haitong Ma, Changliu Liu, Shengbo Eben Li, Sifa Zheng, Wenchao Sun, Jianyu Chen

Abstract: In the trial-and-error mechanism of reinforcement learning (RL), a notorious contradiction arises when we expect to learn a safe policy: how can a safe policy be learned without enough data or a prior model of the dangerous region? Existing methods mostly use a posterior penalty for dangerous actions, which means that the agent is not penalized until it experiences danger. As a result, the agent cannot learn a zero-violation policy even after convergence; otherwise, it would receive no penalty and lose its knowledge about danger. In this paper, we propose the safe set actor-critic (SSAC) algorithm, which confines the policy update using safety-oriented energy functions, or safety indexes. The safety index is designed to increase rapidly for potentially dangerous actions, which allows us to locate the safe set on the action space, or the control safe set. Therefore, we can identify dangerous actions prior to taking them, and further obtain a zero-constraint-violation policy after convergence. We claim that we can learn the energy function in a model-free manner similar to learning a value function. By using the energy function transition as the constraint objective, we formulate a constrained RL problem. We prove that our Lagrangian-based solutions ensure that the learned policy converges to the constrained optimum under some assumptions. The proposed algorithm is evaluated on both complex simulation environments and a hardware-in-loop (HIL) experiment with a real controller from an autonomous vehicle. Experimental results suggest that the converged policy in all environments achieves zero constraint violation and performance comparable to model-based baselines.

Joint Synthesis of Safety Certificate and Safe Control Policy using Constrained Reinforcement Learning

Nov 15, 2021

Haitong Ma, Changliu Liu, Shengbo Eben Li, Sifa Zheng, Jianyu Chen

Abstract: Safety is the major consideration in controlling complex dynamical systems using reinforcement learning (RL), where a safety certificate can provide a provable safety guarantee. A valid safety certificate is an energy function indicating that safe states have low energy, and there exists a corresponding safe control policy that allows the energy function to always dissipate. The safety certificate and the safe control policy are closely related to each other, and both are challenging to synthesize. Therefore, existing learning-based studies treat one of them as prior knowledge to learn the other, which limits their applicability to general unknown dynamics. This paper proposes a novel approach that simultaneously synthesizes the energy-function-based safety certificate and learns the safe control policy with constrained reinforcement learning (CRL). We do not rely on prior knowledge about either an available model-based controller or a perfect safety certificate. In particular, we formulate a loss function to optimize the safety certificate parameters by minimizing the occurrence of energy increases. By adding this optimization procedure as an outer loop to Lagrangian-based CRL, we jointly update the policy and safety certificate parameters and prove that they will converge to their respective local optima, the optimal safe policy and a valid safety certificate. We evaluate our algorithms on multiple safety-critical benchmark environments. The results show that the proposed algorithm learns provably safe policies with no constraint violation. The validity or feasibility of the synthesized safety certificate is also verified numerically.

Self-learned Intelligence for Integrated Decision and Control of Automated Vehicles at Signalized Intersections

Nov 10, 2021

Yangang Ren, Jianhua Jiang, Dongjie Yu, Shengbo Eben Li, Jingliang Duan, Chen Chen, Keqiang Li

Abstract: The intersection is one of the most complex and accident-prone urban scenarios for autonomous driving, wherein making safe and computationally efficient decisions is non-trivial. Current research mainly focuses on simplified traffic conditions while ignoring the existence of mixed traffic flows, i.e., vehicles, cyclists and pedestrians. On urban roads, different participants lead to quite dynamic and complex interactions, making it very difficult to learn an intelligent policy. This paper develops the dynamic permutation state representation in the framework of integrated decision and control (IDC) to handle signalized intersections with mixed traffic flows. Specifically, this representation introduces an encoding function and a summation operator to construct driving states from environmental observations, and is capable of dealing with different types and a varying number of traffic participants. A constrained optimal control problem is built wherein the objective involves tracking performance, and the constraints for different participants and signal lights are designed separately to assure safety. We solve this problem by optimizing the encoding function, value function and policy function offline, wherein a reasonable state representation is given by the encoding function and then serves as the input of the policy and value functions. Off-policy training is designed to reuse observations from the driving environment, and backpropagation through time is utilized to update the policy function and encoding function jointly. Verification results show that the dynamic permutation state representation can enhance the driving performance of IDC, including comfort, decision compliance and safety, by a large margin. The trained driving policy can realize efficient and smooth passing in the complex intersection, guaranteeing driving intelligence and safety simultaneously.

Encoding Distributional Soft Actor-Critic for Autonomous Driving in Multi-lane Scenarios

Sep 12, 2021

Jingliang Duan, Yangang Ren, Fawang Zhang, Yang Guan, Dongjie Yu, Shengbo Eben Li, Bo Cheng, Lin Zhao

Abstract: In this paper, we propose a new reinforcement learning (RL) algorithm, called encoding distributional soft actor-critic (E-DSAC), for decision-making in autonomous driving. Unlike existing RL-based decision-making methods, E-DSAC is suitable for situations where the number of surrounding vehicles is variable and eliminates the requirement for manually pre-designed sorting rules, resulting in higher policy performance and generality. We first develop an encoding distributional policy iteration (DPI) framework by embedding a permutation invariant module, which employs a feature neural network (NN) to encode the indicators of each vehicle, in the distributional RL framework. The proposed DPI framework is proved to exhibit important properties in terms of convergence and global optimality. Next, based on the developed encoding DPI framework, we propose the E-DSAC algorithm by adding the gradient-based update rule of the feature NN to the policy evaluation process of the DSAC algorithm. Then, the multi-lane driving task and the corresponding reward function are designed to verify the effectiveness of the proposed algorithm. Results show that the policy learned by E-DSAC can realize efficient, smooth, and relatively safe autonomous driving in the designed scenario, and that the final policy performance learned by E-DSAC is about three times that of DSAC. Furthermore, its effectiveness has also been verified in real vehicle experiments.

Integrated Decision and Control at Multi-Lane Intersections with Mixed Traffic Flow

Aug 30, 2021

Jianhua Jiang, Yangang Ren, Yang Guan, Shengbo Eben Li, Yuming Yin, Xiaoping Jin

Abstract: Autonomous driving at intersections is one of the most complicated and accident-prone traffic scenarios, especially with mixed traffic participants such as vehicles, bicycles and pedestrians. The driving policy should make safe decisions to handle the dynamic traffic conditions and meet the requirements of on-board computation. However, most current research focuses on simplified intersections, considering only the surrounding vehicles and idealized traffic lights. This paper improves the integrated decision and control framework and develops a learning-based algorithm to deal with complex intersections with mixed traffic flows, which can not only take account of realistic characteristics of traffic lights, but also learn a safe policy under different safety constraints. We first consider different velocity models for green and red lights in the training process and use a finite state machine to handle different modes of light transformation. Then we design different types of distance constraints for vehicles, traffic lights, pedestrians and bicycles, respectively, and formulate the constrained optimal control problems (OCPs) to be optimized. Finally, reinforcement learning (RL) with value and policy networks is adopted to solve the series of OCPs. In order to verify the safety and efficiency of the proposed method, we design a multi-lane intersection with large-scale mixed traffic participants and set practical traffic light phases. The simulation results indicate that the trained decision and control policy balances safety and tracking performance well. Compared with model predictive control (MPC), the computational time is three orders of magnitude lower.

* 8 pages, 10 figures, 11 equations and 14 conferences

Model-based Chance-Constrained Reinforcement Learning via Separated Proportional-Integral Lagrangian

Aug 26, 2021

Baiyu Peng, Jingliang Duan, Jianyu Chen, Shengbo Eben Li, Genjin Xie, Congsheng Zhang, Yang Guan, Yao Mu, Enxin Sun

Abstract: Safety is essential for reinforcement learning (RL) applied in the real world. Adding chance constraints (or probabilistic constraints) is a suitable way to enhance RL safety under uncertainty. Existing chance-constrained RL methods, such as penalty methods and Lagrangian methods, either exhibit periodic oscillations or learn an over-conservative or unsafe policy. In this paper, we address these shortcomings by proposing a separated proportional-integral Lagrangian (SPIL) algorithm. We first review the constrained policy optimization process from a feedback control perspective, which regards the penalty weight as the control input and the safe probability as the control output. Based on this, the penalty method is formulated as a proportional controller, and the Lagrangian method is formulated as an integral controller. We then unify them and present a proportional-integral Lagrangian method that combines the merits of both, with an integral separation technique to limit the integral value to a reasonable range. To accelerate training, the gradient of the safe probability is computed in a model-based manner. We demonstrate that our method can reduce the oscillations and conservatism of the RL policy in a car-following simulation. To prove its practicality, we also apply our method to a real-world mobile robot navigation task, where our robot successfully avoids a moving obstacle with highly uncertain or even aggressive behaviors.
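The feedback-control view in this abstract can be illustrated with a small sketch. It is not the SPIL implementation: the function name, the gains `kp` and `ki`, and the rule of freezing the integral when the error exceeds `sep_threshold` are assumptions, borrowed from the classical integral-separation idea in PID control. The error is taken as the required safe probability minus the estimated safe probability.

```python
def pi_penalty_weights(errors, kp=1.0, ki=0.1, sep_threshold=0.5):
    """Hypothetical PI-style penalty weight for a sequence of constraint errors.

    Each error is (required safe probability - estimated safe probability).
    The proportional term plays the role of a penalty method, the integral
    term the role of a Lagrange multiplier update.
    """
    integral = 0.0
    weights = []
    for error in errors:
        # Assumed integral separation rule: accumulate only while the error
        # is small, so that large transient errors do not wind up the integral.
        if abs(error) <= sep_threshold:
            integral = max(0.0, integral + error)
        # The penalty weight is kept non-negative, like a Lagrange multiplier.
        weights.append(max(0.0, kp * error + ki * integral))
    return weights


if __name__ == "__main__":
    print(pi_penalty_weights([1.0, 0.2, 0.2, -0.1], kp=1.0, ki=0.5))
```

In this sketch the first, large error contributes only through the proportional term, while the later small errors build up the integral term gradually.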

Fixed-Dimensional and Permutation Invariant State Representation of Autonomous Driving

May 24, 2021

Jingliang Duan, Dongjie Yu, Shengbo Eben Li, Wenxuan Wang, Yangang Ren, Ziyu Lin, Bo Cheng

Abstract: In this paper, we propose a new state representation method, called encoding sum and concatenation (ESC), for decision-making in autonomous driving. Unlike existing state representation methods, ESC is applicable to a variable number of surrounding vehicles and eliminates the need for manually pre-designed sorting rules, leading to higher representation ability and generality. The proposed ESC method introduces a representation neural network (NN) to encode each surrounding vehicle into an encoding vector, and then adds these vectors to obtain the representation vector of the set of surrounding vehicles. By concatenating the set representation with other variables, such as indicators of the ego vehicle and road, we realize a fixed-dimensional and permutation invariant state representation. We further prove that the proposed ESC method can realize an injective representation if the output dimension of the representation NN is greater than the number of variables of all surrounding vehicles. This means that by taking the ESC representation as policy inputs, we can find the nearly optimal representation NN and policy NN by simultaneously optimizing them using gradient-based updating. Experiments demonstrate that, compared with the fixed-permutation representation method, the proposed method improves the representation ability for the surrounding vehicles, and the corresponding approximation error is reduced by 62.2%.
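The encode, sum, and concatenate steps described in this abstract can be sketched as follows. This is only an illustration: the one-layer `tanh` encoder, the weight matrix, and the function names are assumptions standing in for the paper's representation NN, whose architecture the abstract does not specify.

```python
import math


def encode(vehicle, weights):
    """Toy stand-in for the representation NN: one tanh layer, no bias."""
    return [math.tanh(sum(w * x for w, x in zip(row, vehicle))) for row in weights]


def esc_state(vehicles, ego, weights):
    """Encode each surrounding vehicle, sum the encodings, append ego features."""
    total = [0.0] * len(weights)
    for vehicle in vehicles:
        encoding = encode(vehicle, weights)
        total = [t + e for t, e in zip(total, encoding)]
    # The output length is len(weights) + len(ego), whatever len(vehicles) is.
    return total + list(ego)


if __name__ == "__main__":
    W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.7]]  # hypothetical encoder weights
    ego = [1.0, 2.0]
    a, b = [1.0, 2.0], [3.0, -1.0]
    print(esc_state([a, b], ego, W))
    print(esc_state([b, a], ego, W))  # same state up to floating-point rounding
```

Because the encodings are combined with a sum, reordering the surrounding vehicles leaves the state unchanged, and adding or removing vehicles does not change its dimension.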

Integrated Decision and Control: Towards Interpretable and Efficient Driving Intelligence

Mar 18, 2021

Yang Guan, Yangang Ren, Shengbo Eben Li, Haitong Ma, Jingliang Duan, Bo Cheng

Abstract: Decision and control are two of the core functionalities of high-level automated vehicles. Current mainstream methods, such as functionality decomposition or end-to-end reinforcement learning (RL), suffer from either high time complexity or poor interpretability and limited safety performance in real-world complex autonomous driving tasks. In this paper, we present an interpretable and efficient decision and control framework for automated vehicles, which decomposes the driving task into multi-path planning and optimal tracking that are structured hierarchically. First, the multi-path planning generates several paths considering only static constraints. Then, the optimal tracking is designed to track the optimal path while considering the dynamic obstacles. To that end, in theory, we formulate a constrained optimal control problem (OCP) for each candidate path, optimize them separately and choose the one with the best tracking performance to follow. More importantly, we propose a model-based RL algorithm, which serves as an approximate constrained OCP solver, to offload the heavy computation through the paradigm of offline training and online application. Specifically, the OCPs for all paths are considered together to construct a multi-task RL problem and then solved offline by our algorithm into value and policy networks, for real-time online path selection and tracking, respectively. We verify our framework in both simulation and the real world. Results show that our method has better online computing efficiency and driving performance, including traffic efficiency and safety, compared with baseline methods. In addition, it yields great interpretability and adaptability among different driving tasks. The real road test also suggests that it is applicable in complicated traffic scenarios even without tuning.

Approximate Optimal Filter for Linear Gaussian Time-invariant Systems

Mar 09, 2021

Kaiming Tang, Shengbo Eben Li, Yuming Yin, Yang Guan, Jingliang Duan, Wenhan Cao, Jie Li

Abstract: State estimation is critical to control systems, especially when the states cannot be directly measured. This paper presents an approximate optimal filter, which enables the use of the policy iteration technique to obtain the steady-state gain in linear Gaussian time-invariant systems. This design transforms the optimal filtering problem with minimum mean square error into an optimal control problem, called the Approximate Optimal Filtering (AOF) problem. The equivalence holds given certain conditions about initial state distributions and policy formats, in which the system state is the estimation error, the control input is the filter gain, and the control objective function is the accumulated estimation error. We present a policy iteration algorithm to solve the AOF problem in steady state. Finally, the approximate filter is evaluated on a classic vehicle state estimation problem. The results show that the policy converges to the steady-state Kalman gain, and its accuracy is within 2%.
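The idea of treating the filter gain as a policy acting on the estimation error can be sketched for a scalar system. This is a generic textbook-style iteration, not the paper's AOF algorithm: it assumes the model x[k+1] = a*x[k] + w, y[k] = c*x[k] + v with noise variances q and r, a one-step predictor with gain `gain`, and an initial gain that keeps the error dynamics stable. All names are illustrative.

```python
def filter_gain_policy_iteration(a, c, q, r, gain0, iterations=20):
    """Alternate policy evaluation and improvement for a scalar predictor gain.

    Error dynamics under a fixed gain: e[k+1] = (a - gain*c)*e[k] + w - gain*v.
    """
    gain = gain0
    for _ in range(iterations):
        closed_loop = a - gain * c
        if abs(closed_loop) >= 1.0:
            raise ValueError("gain must keep the error dynamics stable")
        # Policy evaluation: steady-state error variance for the current gain,
        # i.e. the solution of p = closed_loop**2 * p + q + gain**2 * r.
        p = (q + gain * gain * r) / (1.0 - closed_loop * closed_loop)
        # Policy improvement: gain that minimizes the next-step error variance.
        gain = a * c * p / (c * c * p + r)
    return gain


if __name__ == "__main__":
    # For a = c = q = r = 1 the steady-state Kalman predictor gain is
    # (sqrt(5) - 1) / 2, which the iteration approaches.
    print(filter_gain_policy_iteration(1.0, 1.0, 1.0, 1.0, gain0=0.5))
```

The fixed point of the two alternating steps satisfies the scalar algebraic Riccati equation, which is why the gain approaches the steady-state Kalman gain in this toy setting.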

Decision-Making under On-Ramp merge Scenarios by Distributional Soft Actor-Critic Algorithm

Mar 08, 2021

Yiting Kong, Yang Guan, Jingliang Duan, Shengbo Eben Li, Qi Sun, Bingbing Nie

Abstract: Merging onto the highway from the on-ramp is an essential scenario for automated driving. Decision-making in this scenario needs to balance safety and efficiency to optimize a long-term objective, which is challenging due to the dynamic, stochastic, and adversarial characteristics of the scenario. Rule-based methods often lead to conservative driving on this task, while learning-based methods have difficulty meeting the safety requirements. In this paper, we propose an RL-based end-to-end decision-making method under a framework of offline training and online correction, called the Shielded Distributional Soft Actor-critic (SDSAC). The SDSAC adopts policy evaluation with safety consideration in its offline training and a safety shield parameterized with the barrier function in its online correction. These two measures support each other for better safety without severely damaging efficiency. We verify the SDSAC on an on-ramp merge scenario in simulation. The results show that the SDSAC has the best safety performance compared to baseline algorithms and achieves efficient driving simultaneously.
