Autonomous vehicles need to handle various traffic conditions and make safe and efficient decisions and maneuvers. However, on the one hand, a single optimization/sampling-based motion planner cannot efficiently generate safe trajectories in real time, particularly when there are many interactive vehicles near by. On the other hand, end-to-end learning methods cannot assure the safety of the outcomes. To address this challenge, we propose a hierarchical behavior planning framework with a set of low-level safe controllers and a high-level reinforcement learning algorithm (H-CtRL) as a coordinator for the low-level controllers. Safety is guaranteed by the low-level optimization/sampling-based controllers, while the high-level reinforcement learning algorithm makes H-CtRL an adaptive and efficient behavior planner. To train and test our proposed algorithm, we built a simulator that can reproduce traffic scenes using real-world datasets. The proposed H-CtRL is proved to be effective in various realistic simulation scenarios, with satisfying performance in terms of both safety and efficiency.
Autonomous vehicles (AVs) need to interact with other traffic participants who can be either cooperative or aggressive, attentive or inattentive. Such different characteristics can lead to quite different interactive behaviors. Hence, to achieve safe and efficient autonomous driving, AVs need to be aware of such uncertainties when they plan their own behaviors. In this paper, we formulate such a behavior planning problem as a partially observable Markov Decision Process (POMDP) where the cooperativeness of other traffic participants is treated as an unobservable state. Under different cooperativeness levels, we learn the human behavior models from real traffic data via the principle of maximum likelihood. Based on that, the POMDP problem is solved by Monte-Carlo Tree Search. We verify the proposed algorithm in both simulations and real traffic data on a lane change scenario, and the results show that the proposed algorithm can successfully finish the lane changes without collisions.
Safety is an important topic in autonomous driving since any collision may cause serious damage to people and the environment. Hamilton-Jacobi (HJ) Reachability is a formal method that verifies safety in multi-agent interaction and provides a safety controller for collision avoidance. However, due to the worst-case assumption on the car's future actions, reachability might result in too much conservatism such that the normal operation of the vehicle is largely hindered. In this paper, we leverage the power of trajectory prediction, and propose a prediction-based reachability framework for the safety controller. Instead of always assuming for the worst-case, we first cluster the car's behaviors into multiple driving modes, e.g. left turn or right turn. Under each mode, a reachability-based safety controller is designed based on a less conservative action set. For online purpose, we first utilize the trajectory prediction and our proposed mode classifier to predict the possible modes, and then deploy the corresponding safety controller. Through simulations in a T-intersection and an 8-way roundabout, we demonstrate that our prediction-based reachability method largely avoids collision between two interacting cars and reduces the conservatism that the safety controller brings to the car's original operations.
As more and more autonomous vehicles (AVs) are being deployed on public roads, designing socially compatible behaviors for them is of critical importance. Based on observations, AVs need to predict the future behaviors of other traffic participants, and be aware of the uncertainties associated with such prediction so that safe, efficient, and human-like motions can be generated. In this paper, we propose an integrated prediction and planning framework that allows the AVs to online infer the characteristics of other road users and generate behaviors optimizing not only their own rewards, but also their courtesy to others, as well as their confidence on the consequences in the presence of uncertainties. Based on the definitions of courtesy and confidence, we explore the influences of such factors on the behaviors of AVs in interactive driving scenarios. Moreover, we evaluate the proposed algorithm on naturalistic human driving data by comparing the generated behavior with the ground truth. Results show that the online inference can significantly improve the human-likeness of the generated behaviors. Furthermore, we find that human drivers show great courtesy to others, even for those without right-of-way.
Autonomous vehicles (AVs) need to share the road with multiple, heterogeneous road users in a variety of driving scenarios. It is overwhelming and unnecessary to carefully interact with all observed agents, and AVs need to determine whether and when to interact with each surrounding agent. In order to facilitate the design and testing of prediction and planning modules of AVs, in-depth understanding of interactive behavior is expected with proper representation, and events in behavior data need to be extracted and categorized automatically. Answers to what are the essential patterns of interactions are also crucial for these motivations in addition to answering whether and when. Thus, learning to extract interactive driving events and patterns from human data for tackling the whether-when-what tasks is of critical importance for AVs. There is, however, no clear definition and taxonomy of interactive behavior, and most of the existing works are based on either manual labelling or hand-crafted rules and features. In this paper, we propose the Interactive Driving event and pattern Extraction Network (IDE-Net), which is a deep learning framework to automatically extract interaction events and patterns directly from vehicle trajectories. In IDE-Net, we leverage the power of multi-task learning and proposed three auxiliary tasks to assist the pattern extraction in an unsupervised fashion. We also design a unique spatial-temporal block to encode the trajectory data. Experimental results on the INTERACTION dataset verified the effectiveness of such designs in terms of better generalizability and effective pattern extraction. We find three interpretable patterns of interactions, bringing insights for driver behavior representation, modeling and comprehension. Both objective and subjective evaluation metrics are adopted in our analysis of the learned patterns.
Classical game-theoretic approaches for multi-agent systems in both the forward policy learning/design problem and the inverse reward learning problem often make strong rationality assumptions: agents are perfectly rational expected utility maximizers. Specifically, the agents are risk-neutral to all uncertainties, maximize their expected rewards, and have unlimited computation resources to explore such policies. Such assumptions, however, substantially mismatch with many observed humans' behaviors such as satisficing with sub-optimal policies, risk-seeking and loss-aversion decisions. In this paper, we investigate the problem of bounded risk-sensitive Markov Game (BRSMG) and its inverse reward learning problem. Instead of assuming unlimited computation resources, we consider the influence of bounded intelligence by exploiting iterative reasoning models in BRSMG. Instead of assuming agents maximize their expected utilities (a risk-neutral measure), we consider the impact of risk-sensitive measures such as the cumulative prospect theory. Convergence analysis of BRSMG for both the forward policy learning and the inverse reward learning are established. The proposed forward policy learning and inverse reward learning algorithms in BRSMG are validated through a navigation scenario. Simulation results show that the behaviors of agents in BRSMG demonstrate both risk-averse and risk-seeking phenomena, which are consistent with observations from humans. Moreover, in the inverse reward learning task, the proposed bounded risk-sensitive inverse learning algorithm outperforms the baseline risk-neutral inverse learning algorithm.
In human-robot interaction (HRI) systems, such as autonomous vehicles, understanding and representing human behavior are important. Human behavior is naturally rich and diverse. Cost/reward learning, as an efficient way to learn and represent human behavior, has been successfully applied in many domains. Most of traditional inverse reinforcement learning (IRL) algorithms, however, cannot adequately capture the diversity of human behavior since they assume that all behavior in a given dataset is generated by a single cost function.In this paper, we propose a probabilistic IRL framework that directly learns a distribution of cost functions in continuous domain. Evaluations on both synthetic data and real human driving data are conducted. Both the quantitative and subjective results show that our proposed framework can better express diverse human driving behaviors, as well as extracting different driving styles that match what human participants interpret in our user study.
In human-robot interaction (HRI) systems, such as autonomous vehicles, understanding and representing human behavior are important. Human behavior is naturally rich and diverse. Cost/reward learning, as an efficient way to learn and represent human behavior, has been successfully applied in many domains. Most of traditional inverse reinforcement learning (IRL) algorithms, however, cannot adequately capture the diversity of human behavior since they assume that all behavior in a given dataset is generated by a single cost function.In this paper, we propose a probabilistic IRL framework that directly learns a distribution of cost functions in continuous domain. Evaluations on both synthetic data and real human driving data are conducted. Both the quantitative and subjective results show that our proposed framework can better express diverse human driving behaviors, as well as extracting different driving styles that match what human participants interpret in our user study.