Ziyu Lin

Fixed-Dimensional and Permutation Invariant State Representation of Autonomous Driving

May 24, 2021
Jingliang Duan, Dongjie Yu, Shengbo Eben Li, Wenxuan Wang, Yangang Ren, Ziyu Lin, Bo Cheng

In this paper, we propose a new state representation method, called encoding sum and concatenation (ESC), for decision-making in autonomous driving. Unlike existing state representation methods, ESC handles a variable number of surrounding vehicles and eliminates the need for manually designed sorting rules, leading to higher representation ability and generality. ESC introduces a representation neural network (NN) that encodes each surrounding vehicle into an encoding vector and then sums these vectors to obtain a representation of the whole set of surrounding vehicles. By concatenating this set representation with other variables, such as indicators of the ego vehicle and the road, we obtain a fixed-dimensional and permutation-invariant state representation. We further prove that ESC yields an injective representation whenever the output dimension of the representation NN is greater than the total number of variables of all surrounding vehicles. Consequently, by taking the ESC representation as the policy input, a near-optimal representation NN and policy NN can be found by optimizing them jointly with gradient-based updates. Experiments show that, compared with a fixed-permutation representation, the proposed method improves the representation ability for surrounding vehicles and reduces the corresponding approximation error by 62.2%.
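
The core of ESC is a sum-of-encodings set representation followed by concatenation with the ego-vehicle and road features. Below is a minimal, hedged sketch of that idea in PyTorch; the layer sizes, the per-vehicle feature names (x, y, speed, heading), and the policy head are illustrative assumptions rather than the authors' actual architecture.

```python
# Hedged sketch of encoding sum and concatenation (ESC); sizes are assumptions.
import torch
import torch.nn as nn

class ESCRepresentation(nn.Module):
    def __init__(self, veh_feat_dim=4, enc_dim=64, ego_road_dim=8):
        super().__init__()
        # Representation NN: encodes one surrounding vehicle into a vector.
        self.encoder = nn.Sequential(
            nn.Linear(veh_feat_dim, 64), nn.ReLU(),
            nn.Linear(64, enc_dim),
        )
        self.out_dim = enc_dim + ego_road_dim

    def forward(self, surrounding, ego_road):
        # surrounding: (N, veh_feat_dim) with a *variable* number N of vehicles.
        # Summing the per-vehicle encodings makes the result permutation
        # invariant and fixed-dimensional regardless of N.
        set_repr = self.encoder(surrounding).sum(dim=0)
        # Concatenate with ego-vehicle and road indicators.
        return torch.cat([set_repr, ego_road], dim=0)

# Minimal usage: 3 surrounding vehicles, each with (x, y, speed, heading).
esc = ESCRepresentation()
state = esc(torch.randn(3, 4), torch.randn(8))  # same shape for any vehicle count
policy = nn.Sequential(nn.Linear(esc.out_dim, 64), nn.Tanh(), nn.Linear(64, 2))
action = policy(state)  # representation NN and policy NN can be trained jointly
```

Because the per-vehicle encodings are summed, permuting the vehicles or changing their number does not change the shape of the state vector fed to the policy.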


Model-based Constrained Reinforcement Learning using Generalized Control Barrier Function

Mar 05, 2021
Haitong Ma, Jianyu Chen, Shengbo Eben Li, Ziyu Lin, Yang Guan, Yangang Ren, Sifa Zheng

Model information can be used to predict future trajectories, so it has great potential for keeping reinforcement learning (RL) away from dangerous regions in real-world tasks such as autonomous driving. However, existing studies mostly use model-free constrained RL, which leads to inevitable constraint violations. This paper proposes a model-based feasibility enhancement technique for constrained RL, which improves the feasibility of the policy using a generalized control barrier function (GCBF) defined on the distance to the constraint boundary. By using model information, the policy can be optimized safely without violating the actual safety constraints, and sample efficiency is increased. The infeasibility that arises when solving the constrained policy gradient is handled by an adaptive coefficient mechanism. We evaluate the proposed method in both simulation and real-vehicle experiments on a complex autonomous driving collision-avoidance task. The proposed method achieves up to four times fewer constraint violations and converges 3.36 times faster than baseline constrained RL approaches.
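
As a rough illustration of how a model plus a barrier-function condition can flag unsafe behavior before it occurs, here is a hedged sketch using the standard discrete-time CBF condition h(x_{k+1}) >= (1 - lambda) * h(x_k). The toy dynamics, the barrier h (distance to the constraint boundary), and the penalty rule are assumptions for illustration; the paper's generalized CBF may be formulated differently.

```python
# Hedged sketch of a model-based CBF-style safety check; all details are illustrative.
import numpy as np

def dynamics(x, u, dt=0.1):
    # Toy single-integrator model standing in for the known/learned vehicle model.
    return x + dt * u

def barrier(x, limit=5.0):
    # h(x) > 0 inside the safe set; here: signed distance to a position limit.
    return limit - abs(x[0])

def cbf_violation(x, actions, lam=0.2):
    """Roll the model forward and accumulate violations of the discrete-time
    CBF condition h(x_{k+1}) >= (1 - lam) * h(x_k)."""
    total = 0.0
    for u in actions:
        x_next = dynamics(x, u)
        total += max(0.0, (1 - lam) * barrier(x) - barrier(x_next))
        x = x_next
    return total

# A policy update could then penalize this quantity with an adaptive coefficient,
# e.g. loss = -expected_return + coef * cbf_violation(x0, planned_actions),
# increasing coef whenever violations persist (a standard Lagrangian-style rule).
x0 = np.array([4.0, 0.0])
print(cbf_violation(x0, actions=[np.array([1.0, 0.0])] * 5))
```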


Model-based Safe Reinforcement Learning using Generalized Control Barrier Function

Mar 02, 2021
Haitong Ma, Jianyu Chen, Shengbo Eben Li, Ziyu Lin, Sifa Zheng

Model information can be used to predict future trajectories, so it has great potential for keeping reinforcement learning (RL) away from dangerous regions in real-world tasks such as autonomous driving. However, existing studies mostly use model-free constrained RL, which leads to inevitable constraint violations. This paper proposes a model-based feasibility enhancement technique for constrained RL, which improves the feasibility of the policy using a generalized control barrier function (GCBF) defined on the distance to the constraint boundary. By using model information, the policy can be optimized safely without violating the actual safety constraints, and sample efficiency is increased. The infeasibility that arises when solving the constrained policy gradient is handled by an adaptive coefficient mechanism. We evaluate the proposed method in both simulation and real-vehicle experiments on a complex autonomous driving collision-avoidance task. The proposed method achieves up to four times fewer constraint violations and converges 3.36 times faster than baseline constrained RL approaches.


Recurrent Model Predictive Control

Feb 23, 2021
Zhengyu Liu, Jingliang Duan, Wenxuan Wang, Shengbo Eben Li, Yuming Yin, Ziyu Lin, Qi Sun, Bo Cheng

This paper proposes an offline algorithm, called Recurrent Model Predictive Control (RMPC), to solve general nonlinear finite-horizon optimal control problems. Unlike traditional model predictive control (MPC) algorithms, it can make full use of the available computing resources and adaptively select the longest feasible model prediction horizon. The algorithm employs a recurrent function to approximate the optimal policy, mapping the system states and reference values directly to the control inputs; the number of prediction steps equals the number of recurrent cycles of the learned policy function. Starting from an arbitrary initial policy function, the proposed RMPC algorithm converges to the optimal policy by directly minimizing the designed loss function. We further prove the convergence and optimality of the RMPC algorithm through the Bellman optimality principle, and demonstrate its generality and efficiency using two numerical examples.
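
To make the "recurrent cycles as prediction steps" idea concrete, here is a hedged sketch of a recurrent policy in PyTorch; the GRU cell, layer sizes, and output head are illustrative assumptions, not the paper's exact parameterization.

```python
# Hedged sketch of a recurrent policy in the spirit of RMPC; details are assumptions.
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    def __init__(self, state_dim=4, ref_dim=2, hidden_dim=64, act_dim=1):
        super().__init__()
        # The same cell is applied repeatedly; each cycle plays the role of
        # one more prediction step.
        self.cell = nn.GRUCell(state_dim + ref_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, act_dim)

    def forward(self, state, reference, n_cycles):
        # n_cycles ~ number of prediction steps, chosen online from the time budget.
        x = torch.cat([state, reference], dim=-1)
        h = torch.zeros(x.shape[0], self.cell.hidden_size)
        for _ in range(n_cycles):
            h = self.cell(x, h)
        return self.head(h)  # control input after n_cycles recurrent cycles

policy = RecurrentPolicy()
s, r = torch.randn(1, 4), torch.randn(1, 2)
u_short = policy(s, r, n_cycles=3)   # fewer cycles: cheaper, shorter horizon
u_long = policy(s, r, n_cycles=30)   # more cycles: longer effective horizon
```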

* arXiv admin note: substantial text overlap with arXiv:2102.10289 