Masayoshi Tomizuka

Prediction-Based Reachability for Collision Avoidance in Autonomous Driving

Nov 24, 2020
Anjian Li, Liting Sun, Wei Zhan, Masayoshi Tomizuka, Mo Chen

Safety is an important topic in autonomous driving since any collision may cause serious damage to people and the environment. Hamilton-Jacobi (HJ) Reachability is a formal method that verifies safety in multi-agent interaction and provides a safety controller for collision avoidance. However, due to the worst-case assumption on the car's future actions, reachability might result in too much conservatism such that the normal operation of the vehicle is largely hindered. In this paper, we leverage the power of trajectory prediction and propose a prediction-based reachability framework for the safety controller. Instead of always assuming the worst case, we first cluster the car's behaviors into multiple driving modes, e.g., left turn or right turn. Under each mode, a reachability-based safety controller is designed based on a less conservative action set. For online use, we first utilize the trajectory prediction and our proposed mode classifier to predict the possible modes, and then deploy the corresponding safety controller. Through simulations in a T-intersection and an 8-way roundabout, we demonstrate that our prediction-based reachability method largely avoids collision between two interacting cars and reduces the conservatism that the safety controller brings to the car's original operations.
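The online step described in this abstract (predict the plausible driving modes, then deploy the matching safety controller) might look roughly like the sketch below. This is an illustration only, not the authors' implementation: the function name `select_control`, the probability threshold, the convention that a value at or below the margin means "unsafe", and the fallback to all modes are assumptions introduced here.

```python
from typing import Callable, Dict, Sequence

State = Sequence[float]

def select_control(
    state: State,
    mode_probs: Dict[str, float],                      # hypothetical classifier output
    value_fns: Dict[str, Callable[[State], float]],    # one precomputed value function per mode
    safe_ctrls: Dict[str, Callable[[State], str]],     # one safety controller per mode
    nominal_ctrl: Callable[[State], str],
    prob_threshold: float = 0.2,                       # assumed cutoff for "plausible" modes
    safety_margin: float = 0.0,                        # assumed: value <= margin means unsafe
) -> str:
    # Keep only the modes the predictor considers plausible.
    plausible = [m for m, p in mode_probs.items() if p >= prob_threshold]
    if not plausible:
        # Assumed fallback: with no confident prediction, consider every mode.
        plausible = list(value_fns)
    # The most dangerous plausible mode is the one with the smallest value.
    worst = min(plausible, key=lambda m: value_fns[m](state))
    if value_fns[worst](state) <= safety_margin:
        return safe_ctrls[worst](state)   # safety controller takes over
    return nominal_ctrl(state)            # otherwise keep the original operation
```

Restricting the check to plausible modes is what reduces conservatism in this sketch: a mode with low predicted probability cannot trigger the safety controller.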

COCOI: Contact-aware Online Context Inference for Generalizable Non-planar Pushing

Nov 23, 2020
Zhuo Xu, Wenhao Yu, Alexander Herzog, Wenlong Lu, Chuyuan Fu, Masayoshi Tomizuka, Yunfei Bai, C. Karen Liu, Daniel Ho

General contact-rich manipulation problems are long-standing challenges in robotics due to the difficulty of understanding complicated contact physics. Deep reinforcement learning (RL) has shown great potential in solving robot manipulation tasks. However, existing RL policies have limited adaptability to environments with diverse dynamics properties, which is pivotal in solving many contact-rich manipulation tasks. In this work, we propose Contact-aware Online COntext Inference (COCOI), a deep RL method that encodes a context embedding of dynamics properties online using contact-rich interactions. We study this method based on a novel and challenging non-planar pushing task, where the robot uses a monocular camera image and a wrist force-torque sensor reading to push an object to a goal location while keeping it upright. We run extensive experiments to demonstrate the capability of COCOI in a wide range of settings and dynamics properties in simulation, and also in a sim-to-real transfer scenario on a real robot (Video: https://youtu.be/nrmJYksh1Kc).

Learning Dense Rewards for Contact-Rich Manipulation Tasks

Nov 17, 2020
Zheng Wu, Wenzhao Lian, Vaibhav Unhelkar, Masayoshi Tomizuka, Stefan Schaal

Rewards play a crucial role in reinforcement learning. To arrive at the desired policy, the design of a suitable reward function often requires significant domain expertise as well as trial-and-error. Here, we aim to minimize the effort involved in designing reward functions for contact-rich manipulation tasks. In particular, we provide an approach capable of extracting dense reward functions algorithmically from robots' high-dimensional observations, such as images and tactile feedback. In contrast to state-of-the-art high-dimensional reward learning methodologies, our approach does not leverage adversarial training, and is thus less prone to the associated training instabilities. Instead, our approach learns rewards by estimating task progress in a self-supervised manner. We demonstrate the effectiveness and efficiency of our approach on two contact-rich manipulation tasks, namely, peg-in-hole and USB insertion. The experimental results indicate that the policies trained with the learned reward function achieve better performance and faster convergence compared to the baselines.
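One possible reading of "estimating task progress in a self-supervised manner" is sketched below: each step of a successful demonstration is labeled with its normalized time index, a regressor is fit to those labels, and the change in predicted progress serves as a dense reward. The abstract does not confirm this construction; the linear model, the time-index labels, and all function names are assumptions chosen to keep the example small.

```python
import numpy as np

def progress_labels(num_steps):
    # Assumed self-supervised label: normalized time index within a demonstration.
    return np.linspace(0.0, 1.0, num_steps)

def fit_progress_model(demos):
    # demos: list of arrays, each of shape (T_i, feature_dim).
    # A linear least-squares model with a bias term stands in for a deep network.
    X = np.vstack([np.hstack([d, np.ones((len(d), 1))]) for d in demos])
    y = np.concatenate([progress_labels(len(d)) for d in demos])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def predicted_progress(w, obs):
    return float(np.append(obs, 1.0) @ w)

def dense_reward(w, obs, next_obs):
    # Reward is the increase in estimated task progress over one step.
    return predicted_progress(w, next_obs) - predicted_progress(w, obs)
```

No adversarial discriminator appears anywhere in this sketch; the only training signal is the time ordering of the demonstrations themselves.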

* 8 pages, 5 figures

Socially-Compatible Behavior Design of Autonomous Vehicles with Verification on Real Human Data

Nov 10, 2020
Letian Wang, Liting Sun, Masayoshi Tomizuka, Wei Zhan

As more and more autonomous vehicles (AVs) are being deployed on public roads, designing socially compatible behaviors for them is of critical importance. Based on observations, AVs need to predict the future behaviors of other traffic participants, and be aware of the uncertainties associated with such prediction so that safe, efficient, and human-like motions can be generated. In this paper, we propose an integrated prediction and planning framework that allows the AVs to infer online the characteristics of other road users and generate behaviors optimizing not only their own rewards, but also their courtesy to others, as well as their confidence in the consequences in the presence of uncertainties. Based on the definitions of courtesy and confidence, we explore the influences of such factors on the behaviors of AVs in interactive driving scenarios. Moreover, we evaluate the proposed algorithm on naturalistic human driving data by comparing the generated behavior with the ground truth. Results show that the online inference can significantly improve the human-likeness of the generated behaviors. Furthermore, we find that human drivers show great courtesy to others, even for those without right-of-way.

* Fix a bug in Figure 7 and remove some typos; 9 pages, 10 figures

IDE-Net: Interactive Driving Event and Pattern Extraction from Human Data

Nov 04, 2020
Xiaosong Jia, Liting Sun, Masayoshi Tomizuka, Wei Zhan

Autonomous vehicles (AVs) need to share the road with multiple, heterogeneous road users in a variety of driving scenarios. It is overwhelming and unnecessary to carefully interact with all observed agents, and AVs need to determine whether and when to interact with each surrounding agent. In order to facilitate the design and testing of prediction and planning modules of AVs, in-depth understanding of interactive behavior is expected with proper representation, and events in behavior data need to be extracted and categorized automatically. Beyond whether and when, it is also crucial to answer what the essential patterns of interaction are. Thus, learning to extract interactive driving events and patterns from human data for tackling the whether-when-what tasks is of critical importance for AVs. There is, however, no clear definition and taxonomy of interactive behavior, and most of the existing works are based on either manual labelling or hand-crafted rules and features. In this paper, we propose the Interactive Driving event and pattern Extraction Network (IDE-Net), which is a deep learning framework to automatically extract interaction events and patterns directly from vehicle trajectories. In IDE-Net, we leverage the power of multi-task learning and propose three auxiliary tasks to assist the pattern extraction in an unsupervised fashion. We also design a unique spatial-temporal block to encode the trajectory data. Experimental results on the INTERACTION dataset verify the effectiveness of such designs in terms of better generalizability and effective pattern extraction. We find three interpretable patterns of interactions, bringing insights for driver behavior representation, modeling and comprehension. Both objective and subjective evaluation metrics are adopted in our analysis of the learned patterns.

Alternating Direction Method of Multipliers for Constrained Iterative LQR in Autonomous Driving

Nov 01, 2020
Jun Ma, Zilong Cheng, Xiaoxue Zhang, Masayoshi Tomizuka, Tong Heng Lee

In the context of autonomous driving, the iterative linear quadratic regulator (iLQR) is known to be an efficient approach to deal with the nonlinear vehicle models in motion planning problems. In particular, the constrained iLQR algorithm has shown notable computational efficiency in motion planning tasks under general constraints of different types. However, the constrained iLQR methodology requires a feasible trajectory at the first iteration as a prerequisite. Also, the methodology leaves open the possibility of incorporating fast, efficient, and effective optimization methods (i.e., fast-solvers) to further speed up the optimization process such that the requirements of real-time implementation can be fulfilled. In this paper, a well-defined and commonly-encountered motion planning problem is formulated under nonlinear vehicle dynamics and various constraints, and an alternating direction method of multipliers (ADMM) is developed to determine the optimal control actions. With this development, the approach is able to circumvent the feasibility requirement of the trajectory at the first iteration. An illustrative example of motion planning in autonomous vehicles is then investigated with different driving scenarios taken into consideration. The simulation results demonstrate the significance of this work in terms of obstacle avoidance. Furthermore, high computation efficiency is attained; as a result, real-time computation and implementation can be realized through this framework, which provides additional safety to on-road driving tasks.
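To illustrate why ADMM does not need a feasible starting point, the sketch below applies textbook ADMM to a small box-constrained quadratic program. It is a generic toy problem, not the paper's constrained iLQR formulation; the function name `admm_box_qp`, the penalty `rho`, and the iteration count are assumptions made for this example.

```python
import numpy as np

def admm_box_qp(Q, q, lo, hi, rho=1.0, iters=200):
    """Minimize 0.5*x'Qx + q'x subject to lo <= x <= hi using ADMM.

    The problem is split as f(x) + g(z) with x = z, where f is the
    quadratic cost and g is the indicator of the box constraint.
    """
    n = len(q)
    # Deliberately infeasible start: no feasible initial guess is required.
    z = np.full(n, hi + 10.0)
    u = np.zeros(n)                      # scaled dual variable
    A = Q + rho * np.eye(n)
    for _ in range(iters):
        # x-update: unconstrained quadratic minimization.
        x = np.linalg.solve(A, -q + rho * (z - u))
        # z-update: projection onto the box enforces the constraint.
        z = np.clip(x + u, lo, hi)
        # Dual update accumulates the remaining disagreement between x and z.
        u = u + x - z
    return z

if __name__ == "__main__":
    Q = np.diag([2.0, 1.0])
    q = np.array([-8.0, 1.0])
    print(admm_box_qp(Q, q, lo=-1.0, hi=1.0))
```

The constraint is handled entirely by the projection in the z-update, so the quadratic subproblem never has to start from, or stay inside, the feasible set.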

* 9 pages, 8 figures

Bounded Risk-Sensitive Markov Game and Its Inverse Reward Learning Problem

Sep 05, 2020
Ran Tian, Liting Sun, Masayoshi Tomizuka

Classical game-theoretic approaches for multi-agent systems in both the forward policy learning/design problem and the inverse reward learning problem often make strong rationality assumptions: agents are perfectly rational expected utility maximizers. Specifically, the agents are risk-neutral to all uncertainties, maximize their expected rewards, and have unlimited computation resources to explore such policies. Such assumptions, however, substantially mismatch many observed human behaviors, such as satisficing with sub-optimal policies and making risk-seeking and loss-averse decisions. In this paper, we investigate the problem of bounded risk-sensitive Markov Game (BRSMG) and its inverse reward learning problem. Instead of assuming unlimited computation resources, we consider the influence of bounded intelligence by exploiting iterative reasoning models in BRSMG. Instead of assuming agents maximize their expected utilities (a risk-neutral measure), we consider the impact of risk-sensitive measures such as the cumulative prospect theory. Convergence analyses of BRSMG for both the forward policy learning and the inverse reward learning are established. The proposed forward policy learning and inverse reward learning algorithms in BRSMG are validated through a navigation scenario. Simulation results show that the behaviors of agents in BRSMG demonstrate both risk-averse and risk-seeking phenomena, which are consistent with observations from humans. Moreover, in the inverse reward learning task, the proposed bounded risk-sensitive inverse learning algorithm outperforms the baseline risk-neutral inverse learning algorithm.
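For readers unfamiliar with the risk-sensitive measure named here, the sketch below shows the two building blocks of cumulative prospect theory in the functional form and median parameter estimates published by Tversky and Kahneman (1992). Whether the paper uses these exact forms or parameters is not stated in the abstract, so treat them as assumptions; the full theory also applies the weighting to cumulative (rank-ordered) probabilities, which is omitted here.

```python
def cpt_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Subjective value of a gain or loss x relative to a reference point of 0."""
    if x >= 0:
        return x ** alpha                 # concave for gains
    return -lam * ((-x) ** beta)          # steeper for losses (loss aversion)

def cpt_weight(p, gamma=0.61):
    """Inverse-S probability weighting; gamma=0.61 is the estimate for gains."""
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)

if __name__ == "__main__":
    print(cpt_value(10.0), cpt_value(-10.0))
    print(cpt_weight(0.01), cpt_weight(0.99))
```

Small probabilities are overweighted and large ones underweighted, while losses loom larger than equal gains; together these effects can produce both risk-seeking and risk-averse choices from the same agent.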

Expressing Diverse Human Driving Behavior with Probabilistic Rewards and Online Inference

Aug 21, 2020
Liting Sun, Zheng Wu, Hengbo Ma, Masayoshi Tomizuka

In human-robot interaction (HRI) systems, such as autonomous vehicles, understanding and representing human behavior are important. Human behavior is naturally rich and diverse. Cost/reward learning, as an efficient way to learn and represent human behavior, has been successfully applied in many domains. Most traditional inverse reinforcement learning (IRL) algorithms, however, cannot adequately capture the diversity of human behavior since they assume that all behavior in a given dataset is generated by a single cost function. In this paper, we propose a probabilistic IRL framework that directly learns a distribution of cost functions in a continuous domain. Evaluations on both synthetic data and real human driving data are conducted. Both the quantitative and subjective results show that our proposed framework can better express diverse human driving behaviors, as well as extract different driving styles that match what human participants interpret in our user study.

* 7 pages, 9 figures, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
