Fei Miao

Safe and Robust Multi-Agent Reinforcement Learning for Connected Autonomous Vehicles under State Perturbations

Sep 20, 2023
Zhili Zhang, Yanchao Sun, Furong Huang, Fei Miao

Sensing and communication technologies have enhanced learning-based decision making methodologies for multi-agent systems such as connected autonomous vehicles (CAVs). However, most existing safe reinforcement learning based methods assume accurate state information. It remains challenging to satisfy safety requirements under state uncertainties for CAVs, considering the noisy sensor measurements and the vulnerability of communication channels. In this work, we propose a Robust Multi-Agent Proximal Policy Optimization with robust Safety Shield (SR-MAPPO) for CAVs in various driving scenarios. Both a robust MARL algorithm and a control barrier function (CBF)-based safety shield are used in our approach to cope with perturbed or uncertain state inputs. The robust policy is trained with a worst-case Q-function regularization module that pursues a higher lower-bounded reward, while the robust CBF safety shield enforces CAVs' collision-free constraints in complicated driving scenarios even with perturbed vehicle state information. We validate the advantages of SR-MAPPO in robustness and safety and compare it with baselines under different driving and state perturbation scenarios in the CARLA simulator. The SR-MAPPO policy is verified to maintain higher safety rates and efficiency (reward) when threatened by both state perturbations and unconnected vehicles' dangerous behaviors.

* 6 pages, 5 figures 
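
The abstract describes two robustness mechanisms: a worst-case Q-function regularizer and a robust CBF safety shield. Below is a minimal, hypothetical sketch of what such a shield could look like for longitudinal control only; the one-step discrete CBF condition, the perturbation budget eps, and all parameter values are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def cbf_shield(a_rl, gap, rel_vel, eps=0.5, gap_min=5.0, alpha=1.0,
               a_min=-6.0, a_max=3.0, dt=0.1):
    """Project the RL acceleration onto accelerations that keep the barrier
    h = gap - gap_min from decaying faster than alpha * h, using a one-step
    discrete approximation and a worst-case gap shrunk by the budget eps."""
    h = (gap - eps) - gap_min             # worst-case barrier under bounded state perturbation
    a_bound = (rel_vel + alpha * h) / dt  # largest acceleration allowed by the discrete CBF condition
    a_safe = min(a_rl, a_bound)           # intervene only when the RL action is too aggressive
    return float(np.clip(a_safe, a_min, a_max))

# Example: a large commanded acceleration is overridden with braking when the gap is small
print(cbf_shield(a_rl=3.0, gap=6.0, rel_vel=-1.0))   # -> -5.0
```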

Robust Electric Vehicle Balancing of Autonomous Mobility-On-Demand System: A Multi-Agent Reinforcement Learning Approach

Jul 30, 2023
Sihong He, Shuo Han, Fei Miao

Electric autonomous vehicles (EAVs) are attracting attention for future autonomous mobility-on-demand (AMoD) systems due to their economic and societal benefits. However, EAVs' unique charging patterns (long charging time, high charging frequency, unpredictable charging behaviors, etc.) make it challenging to accurately predict the EAV supply in E-AMoD systems. Furthermore, the prediction uncertainty of mobility demand makes it an urgent and challenging task to design an integrated vehicle balancing solution under supply and demand uncertainties. Despite the success of reinforcement learning-based E-AMoD balancing algorithms, state uncertainties in the EV supply or mobility demand remain unexplored. In this work, we design a multi-agent reinforcement learning (MARL)-based framework for EAV balancing in E-AMoD systems, with adversarial agents that model both the EAV supply and mobility demand uncertainties that may undermine the vehicle balancing solutions. We then propose a robust E-AMoD Balancing MARL (REBAMA) algorithm to train a robust EAV balancing policy that balances both the supply-demand ratio and the charging utilization rate across the whole city. Experiments show that our proposed robust method performs better than a non-robust MARL method that does not consider state uncertainties; it improves the reward, charging utilization fairness, and supply-demand fairness by 19.28%, 28.18%, and 3.97%, respectively. Compared with a robust optimization-based method, the proposed MARL algorithm improves the reward, charging utilization fairness, and supply-demand fairness by 8.21%, 8.29%, and 9.42%, respectively.

* accepted to International Conference on Intelligent Robots and Systems (IROS2023) 
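
As a rough illustration of the adversarial training idea sketched in the abstract, the snippet below alternates between an adversary that perturbs the observed supply/demand state within a budget and a balancing policy evaluated on the perturbed observation. The toy policy, adversary, and reward are placeholders, not the REBAMA implementation.

```python
import numpy as np

def perturb_state(state, adv_direction, eps=0.1):
    """Adversary shifts the observed supply/demand state within an L_inf budget."""
    return state + eps * np.clip(adv_direction, -1.0, 1.0)

def train_step(policy, adversary, env_state, reward_fn):
    obs = perturb_state(env_state, adversary(env_state))  # worst-case observation
    action = policy(obs)                                   # rebalancing decision per region
    # The policy ascends on this reward and the adversary descends on it;
    # the actual parameter updates are omitted in this sketch.
    return reward_fn(env_state, action)

rng = np.random.default_rng(0)
state = rng.uniform(0, 1, size=8)                       # toy supply-demand ratios for 8 regions
policy = lambda s: 1.0 - s                              # move vehicles toward under-supplied regions
adversary = lambda s: np.sign(s - 0.5)                  # push observations away from balance
reward = lambda s, a: -np.abs(s + 0.2 * a - 0.5).sum()  # toy balancing objective
print(train_step(policy, adversary, state, reward))
```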

Robust Multi-Agent Reinforcement Learning with State Uncertainty

Jul 30, 2023
Sihong He, Songyang Han, Sanbao Su, Shuo Han, Shaofeng Zou, Fei Miao

In real-world multi-agent reinforcement learning (MARL) applications, agents may not have perfect state information (e.g., due to inaccurate measurement or malicious attacks), which challenges the robustness of agents' policies. Though robustness is becoming increasingly important in MARL deployment, little prior work has studied state uncertainties in MARL, either in problem formulation or in algorithm design. Motivated by this robustness issue and the lack of corresponding studies, we study the problem of MARL with state uncertainty in this work. We provide the first attempt at a theoretical and empirical analysis of this challenging problem. We first model the problem as a Markov Game with state perturbation adversaries (MG-SPA) by introducing a set of state perturbation adversaries into a Markov Game. We then introduce robust equilibrium (RE) as the solution concept of an MG-SPA. We conduct a fundamental analysis of the MG-SPA, such as giving conditions under which such a robust equilibrium exists. Then we propose a robust multi-agent Q-learning (RMAQ) algorithm to find such an equilibrium, with convergence guarantees. To handle high-dimensional state-action spaces, we design a robust multi-agent actor-critic (RMAAC) algorithm based on an analytical expression of the policy gradient derived in the paper. Our experiments show that the proposed RMAQ algorithm converges to the optimal value function, and that our RMAAC algorithm outperforms several MARL and robust MARL methods in multiple multi-agent environments when state uncertainty is present. The source code is publicly available at https://github.com/sihongho/robust_marl_with_state_uncertainty.

* 50 pages, published in Transactions on Machine Learning Research (TMLR), 06/2023 
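
A tabular caricature of the worst-case value update underlying the robust Q-learning idea: the next-state value is evaluated at the adversary's worst perturbation of the observed state. The perturbation set and toy environment below are illustrative assumptions; see the released repository for the actual RMAQ/RMAAC code.

```python
import numpy as np

def robust_q_update(Q, s, a, r, s_next, perturb_set, policy, gamma=0.95, lr=0.1):
    """Temporal-difference update where the next-state value is taken at the
    adversary's worst-case perturbation of the observed next state."""
    worst_next_value = min(Q[p, policy(p)] for p in perturb_set(s_next))
    td_target = r + gamma * worst_next_value
    Q[s, a] += lr * (td_target - Q[s, a])
    return Q

# Toy usage: 5 states, 2 actions, adversary may shift the observed state by one
Q = np.zeros((5, 2))
perturb = lambda s: [max(s - 1, 0), s, min(s + 1, 4)]
policy = lambda s: int(np.argmax(Q[s]))
Q = robust_q_update(Q, s=2, a=1, r=1.0, s_next=3, perturb_set=perturb, policy=policy)
```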

Multi-Agent Reinforcement Learning Guided by Signal Temporal Logic Specifications

Jun 11, 2023
Jiangwei Wang, Shuo Yang, Ziyan An, Songyang Han, Zhili Zhang, Rahul Mangharam, Meiyi Ma, Fei Miao

There has been growing interest in deep reinforcement learning (DRL) algorithm design, and reward design is one key component of DRL. Among the various techniques, formal methods integrated with DRL have garnered considerable attention due to their expressiveness and ability to define the requirements for the states and actions of the agent. However, the literature on using Signal Temporal Logic (STL) to guide multi-agent reinforcement learning (MARL) reward design remains limited. In this paper, we propose a novel STL-guided multi-agent reinforcement learning algorithm. The STL specifications are designed to include both task specifications, according to the objective of each agent, and safety specifications, and the robustness values of the STL specifications are leveraged to generate rewards. We validate the advantages of our method through empirical studies. The experimental results demonstrate significant performance improvements compared to MARL without STL guidance, along with a remarkable increase in the overall safety rate of the multi-agent systems.
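
For readers unfamiliar with STL robustness, the sketch below shows how the standard quantitative semantics (min/max over a trace) can be turned into a scalar reward combining a safety spec and a task spec. The particular specs, trace, and weights are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def robustness_always_gt(signal, threshold):
    """rho( G (x > c) ) = min_t (x_t - c); positive iff the spec holds on the trace."""
    return float(np.min(np.asarray(signal) - threshold))

def robustness_eventually_gt(signal, threshold):
    """rho( F (x > c) ) = max_t (x_t - c)."""
    return float(np.max(np.asarray(signal) - threshold))

def stl_reward(distances, progress, d_min=2.0, goal=10.0, w_safe=1.0, w_task=0.5):
    """Combine a safety spec (always keep distance > d_min) and a task spec
    (eventually reach progress > goal) into a scalar reward."""
    return (w_safe * robustness_always_gt(distances, d_min)
            + w_task * robustness_eventually_gt(progress, goal))

print(stl_reward(distances=[4.1, 3.2, 2.6], progress=[1.0, 4.0, 7.5]))
```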

Surrogate Lagrangian Relaxation: A Path To Retrain-free Deep Neural Network Pruning

Apr 08, 2023
Shanglin Zhou, Mikhail A. Bragin, Lynn Pepin, Deniz Gurevin, Fei Miao, Caiwen Ding

Network pruning is a widely used technique to reduce computation cost and model size for deep neural networks. However, the typical three-stage pipeline significantly increases the overall training time. In this paper, we develop a systematic weight-pruning optimization approach based on Surrogate Lagrangian Relaxation (SLR), which is tailored to overcome difficulties caused by the discrete nature of the weight-pruning problem. We prove that our method ensures fast convergence of the model compression problem, and the convergence of SLR is accelerated by using quadratic penalties. Model parameters obtained by SLR during the training phase are much closer to their optimal values than those obtained by other state-of-the-art methods. We evaluate our method on image classification tasks using CIFAR-10 and ImageNet with state-of-the-art models including MLP-Mixer, Swin Transformer, VGG-16, ResNet-18, ResNet-50, ResNet-110, and MobileNetV2. We also evaluate object detection and segmentation tasks on the COCO and KITTI benchmarks and the TuSimple lane detection dataset using a variety of models. Experimental results demonstrate that our SLR-based weight-pruning optimization approach achieves a higher compression rate than state-of-the-art methods under the same accuracy requirement, and can also achieve higher accuracy under the same compression rate requirement. On classification tasks, our SLR approach converges to the desired accuracy $3\times$ faster on both datasets. On object detection and segmentation tasks, SLR also converges $2\times$ faster to the desired accuracy. Further, our SLR achieves high model accuracy even at the hard-pruning stage without retraining, which reduces the traditional three-stage pruning into a two-stage process. Given a limited budget of retraining epochs, our approach quickly recovers the model's accuracy.

* arXiv admin note: text overlap with arXiv:2012.10079 
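
A heavily simplified sketch of the general Lagrangian-relaxation pattern for weight pruning: a dense weight copy, a projection onto the sparsity constraint, and a multiplier update with a quadratic penalty. The actual SLR multiplier and step-size rules in the paper are more involved; everything below is an illustrative assumption.

```python
import numpy as np

def project_topk(W, sparsity):
    """Hard-threshold: keep the largest-magnitude (1 - sparsity) fraction of weights."""
    k = int(round((1.0 - sparsity) * W.size))
    thresh = np.sort(np.abs(W).ravel())[::-1][max(k - 1, 0)]
    return np.where(np.abs(W) >= thresh, W, 0.0)

def slr_like_step(W, Z, U, grad_loss, lr=1e-2, rho=1e-3, sparsity=0.9):
    # Primal step on the dense weights: loss gradient plus a quadratic penalty pulling toward Z - U
    W = W - lr * (grad_loss(W) + rho * (W - Z + U))
    Z = project_topk(W + U, sparsity)   # auxiliary variable projected onto the pruning constraint
    U = U + W - Z                       # multiplier (dual) update
    return W, Z, U

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64)); Z = np.zeros_like(W); U = np.zeros_like(W)
grad = lambda w: 2.0 * (w - 1.0)        # gradient of a toy quadratic loss
for _ in range(10):
    W, Z, U = slr_like_step(W, Z, U, grad)
```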

Collaborative Multi-Object Tracking with Conformal Uncertainty Propagation

Mar 25, 2023
Sanbao Su, Songyang Han, Yiming Li, Zhili Zhang, Chen Feng, Caiwen Ding, Fei Miao

Object detection and multiple object tracking (MOT) are essential components of self-driving systems. Accurate detection and uncertainty quantification are both critical for onboard modules, such as perception, prediction, and planning, to improve the safety and robustness of autonomous vehicles. Collaborative object detection (COD) has been proposed to improve detection accuracy and reduce uncertainty by leveraging the viewpoints of multiple agents. However, little attention has been paid to how to leverage the uncertainty quantification from COD to enhance MOT performance. In this paper, as a first attempt, we design an uncertainty propagation framework, called MOT-CUP, to address this challenge. Our framework first quantifies the uncertainty of COD through direct modeling and conformal prediction, and propagates this uncertainty information during the motion prediction and association steps. MOT-CUP is designed to work with different collaborative object detectors and baseline MOT algorithms. We evaluate MOT-CUP on V2X-Sim, a comprehensive collaborative perception dataset, and demonstrate a 2% improvement in accuracy and a 2.67X reduction in uncertainty compared to the baselines, e.g., SORT and ByteTrack. MOT-CUP demonstrates the importance of uncertainty quantification in both COD and MOT, and provides the first attempt to improve accuracy and reduce uncertainty in MOT based on COD through uncertainty propagation.
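
The split conformal prediction step can be sketched as follows: calibrate a coverage radius from detection residuals, then use it to inflate the measurement noise fed into the tracker. The residual definition, coverage level, and inflation rule are illustrative assumptions rather than MOT-CUP's exact procedure.

```python
import numpy as np

def conformal_quantile(calib_errors, alpha=0.1):
    """Finite-sample-corrected (1 - alpha) quantile of calibration residuals."""
    n = len(calib_errors)
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return float(np.quantile(np.asarray(calib_errors), q_level))

def inflate_measurement_noise(R, q, base_scale=1.0):
    """Scale the tracker's measurement covariance by the calibrated uncertainty radius."""
    return R * (base_scale + q) ** 2

# Toy calibration set: residuals between collaborative detections and ground truth
calib = np.abs(np.random.default_rng(0).normal(0.0, 0.5, size=200))
q = conformal_quantile(calib, alpha=0.1)            # ~90% coverage radius
R = inflate_measurement_noise(np.eye(2) * 0.25, q)  # covariance passed to the Kalman update
```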

Privacy-preserving and Uncertainty-aware Federated Trajectory Prediction for Connected Autonomous Vehicles

Mar 08, 2023
Muzi Peng, Jiangwei Wang, Dongjin Song, Fei Miao, Lili Su

Deep learning is the method of choice for trajectory prediction for autonomous vehicles. Unfortunately, its data-hungry nature implicitly requires the availability of sufficiently rich and high-quality centralized datasets, which easily leads to privacy leakage. Besides, uncertainty-awareness becomes increasingly important for safety-critical cyber-physical systems whose prediction modules heavily rely on machine learning tools. In this paper, we relax the data collection requirement and enhance uncertainty-awareness by using Federated Learning on Connected Autonomous Vehicles with an uncertainty-aware global objective. We name our algorithm FLTP. We further introduce ALFLTP, which boosts FLTP by using active learning techniques to adaptively select participating clients. We consider both negative log-likelihood (NLL) and aleatoric uncertainty (AU) as client selection metrics. Experiments on the Argoverse dataset show that FLTP significantly outperforms the model trained on local data. In addition, ALFLTP-AU converges faster in training regression loss and performs better in terms of NLL, minADE, and MR than FLTP in most rounds, and has more stable round-wise performance than ALFLTP-NLL.
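
A minimal sketch of the round structure described above, assuming uncertainty-aware active client selection followed by FedAvg-style aggregation; the metric values, client updates, and selection rule are toy placeholders, not the FLTP/ALFLTP implementation.

```python
import numpy as np

def select_clients(metrics, k):
    """Pick the k clients with the highest uncertainty metric (e.g., NLL or AU)."""
    return np.argsort(metrics)[::-1][:k].tolist()

def fedavg(client_params, client_sizes):
    """Dataset-size-weighted average of client parameters (flat vectors here)."""
    sizes = np.asarray(client_sizes, dtype=float)
    weights = sizes / sizes.sum()
    return sum(w * np.asarray(p) for w, p in zip(weights, client_params))

# One toy round: 5 clients, select 3 by uncertainty, aggregate their local updates
rng = np.random.default_rng(0)
metrics = rng.uniform(size=5)                    # per-client NLL / AU from the previous round
chosen = select_clients(metrics, k=3)
updates = [rng.normal(size=4) for _ in chosen]   # stand-ins for locally trained parameters
sizes = [100, 80, 120]
global_params = fedavg(updates, sizes)
```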

Shared Information-Based Safe And Efficient Behavior Planning For Connected Autonomous Vehicles

Feb 15, 2023
Songyang Han, Shanglin Zhou, Lynn Pepin, Jiangwei Wang, Caiwen Ding, Fei Miao

The recent advancements in wireless technology enable connected autonomous vehicles (CAVs) to gather data via vehicle-to-vehicle (V2V) communication, such as processed LIDAR and camera data from other vehicles. In this work, we design an integrated information sharing and safe multi-agent reinforcement learning (MARL) framework for CAVs to take advantage of the extra information when making decisions, improving traffic efficiency and safety. We first use weight-pruned convolutional neural networks (CNNs) to process the raw image and point cloud LIDAR data locally at each autonomous vehicle, and share the CNN-output data with neighboring CAVs. We then design a safe actor-critic algorithm that utilizes both a vehicle's local observation and the information received via V2V communication to explore an efficient behavior planning policy with safety guarantees. Using the CARLA simulator for experiments, we show that our approach improves the CAV system's efficiency in terms of average velocity and comfort under different CAV ratios and different traffic densities. We also show that our approach avoids the execution of unsafe actions and always maintains a safe distance from other vehicles. We construct an obstacle-at-corner scenario to show that the shared vision can help CAVs observe obstacles earlier and take action to avoid traffic jams.

* This paper received the Best Paper Award at the DCAA workshop of AAAI 2023 
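
A toy sketch of two ingredients mentioned in the abstract: fusing the ego observation with shared neighbor features, and masking unsafe discrete actions before execution. The feature shapes, action set, and safe-distance rule are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

ACTIONS = ["keep_lane", "accelerate", "brake", "change_left", "change_right"]

def fuse_observation(local_obs, neighbor_features):
    """Concatenate the ego observation with mean-pooled neighbor features
    (assumed here to share the ego observation's dimension)."""
    pooled = (np.mean(neighbor_features, axis=0) if len(neighbor_features)
              else np.zeros_like(local_obs))
    return np.concatenate([local_obs, pooled])

def safe_action(policy_scores, front_gap, min_gap=8.0):
    """Mask actions that would shrink an already-unsafe gap, then take the best remaining one."""
    mask = np.ones(len(ACTIONS), dtype=bool)
    if front_gap < min_gap:
        mask[ACTIONS.index("accelerate")] = False
        mask[ACTIONS.index("keep_lane")] = False   # force braking or a lane change
    return ACTIONS[int(np.argmax(np.where(mask, policy_scores, -np.inf)))]

obs = fuse_observation(np.ones(16), [np.zeros(16), np.ones(16)])
print(safe_action(np.array([0.5, 0.9, 0.1, 0.2, 0.3]), front_gap=5.0))
```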

What is the Solution for State-Adversarial Multi-Agent Reinforcement Learning?

Dec 07, 2022
Songyang Han, Sanbao Su, Sihong He, Shuo Han, Haizhao Yang, Fei Miao

Various types of Multi-Agent Reinforcement Learning (MARL) methods have been developed, assuming that agents' policies are based on true states. Recent works have improved the robustness of MARL under uncertainties from the reward, transition probability, or other partners' policies. However, in real-world multi-agent systems, state estimations may be perturbed by sensor measurement noise or even adversaries. Agents' policies trained with only true state information will deviate from optimal solutions when facing adversarial state perturbations during execution. MARL under adversarial state perturbations has received limited study. Hence, in this work, we propose a State-Adversarial Markov Game (SAMG) and make the first attempt to study the fundamental properties of MARL under state uncertainties. We prove that the optimal agent policy and the robust Nash equilibrium do not always exist for an SAMG. Instead, we define a new solution concept, the robust agent policy, for the proposed SAMG under adversarial state perturbations, where agents aim to maximize the worst-case expected state value. We then design a gradient descent ascent-based robust MARL algorithm to learn robust policies for the MARL agents. Our experiments show that adversarial state perturbations decrease agents' rewards for several baselines from the existing literature, while our algorithm outperforms these baselines under state perturbations and significantly improves the robustness of MARL policies under state uncertainties.
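
The gradient descent ascent idea can be sketched as two alternating optimizers on a shared objective: the agent ascends on a value while the adversary, which controls a bounded state perturbation, descends on it. The tensors and the toy objective below are illustrative stand-ins for the worst-case expected state value, not the paper's algorithm.

```python
import torch

theta = torch.zeros(4, requires_grad=True)   # agent policy parameters
phi = torch.zeros(4, requires_grad=True)     # adversary (state perturbation) parameters
opt_agent = torch.optim.SGD([theta], lr=0.05)
opt_adv = torch.optim.SGD([phi], lr=0.05)

def value(theta, phi, state):
    perturbed = state + 0.1 * torch.tanh(phi)          # bounded state perturbation
    return -((perturbed * theta).sum() - 1.0) ** 2     # toy stand-in for the expected state value

state = torch.ones(4)
for _ in range(100):
    opt_agent.zero_grad()
    (-value(theta, phi, state)).backward()   # agent: ascend on the value
    opt_agent.step()
    opt_adv.zero_grad()
    value(theta, phi, state).backward()      # adversary: descend on the value
    opt_adv.step()
```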
