Jun Luo

Learning robust driving policies without online exploration

Mar 15, 2021
Daniel Graves, Nhat M. Nguyen, Kimia Hassanzadeh, Jun Jin, Jun Luo

We propose a multi-time-scale predictive representation learning method to efficiently learn, offline, robust driving policies that generalize well to novel road geometries and to damaged and distracting lane conditions not covered in the offline training data. We show that our proposed representation learning method can be applied easily in an offline (batch) reinforcement learning setting, where it generalizes well and efficiently under novel conditions compared to standard batch RL methods. Our proposed method uses training data collected entirely offline in the real world, which removes the need for the intensive online exploration that impedes the application of deep reinforcement learning to real-world robot training. We evaluate and analyze our claims through experiments in both simulated and real-world scenarios.

* Accepted at ICRA 2021. Due to ICRA's format limitations, this full version includes an appendix with our detailed evaluation results. arXiv admin note: substantial text overlap with arXiv:2006.15110
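The abstract does not spell out the representation. One common way to build multi-time-scale predictions is to learn general value functions whose targets are discounted sums of a signal (a "cumulant") at several discount factors. A minimal sketch of computing such targets from one logged, offline episode follows; the function name, the cumulant (for example a lane-centering signal) and the discount values are illustrative assumptions, not details from the paper.

```python
from typing import List, Sequence


def multi_timescale_targets(
    cumulants: Sequence[float], gammas: Sequence[float]
) -> List[List[float]]:
    """Discounted-sum prediction targets for one logged episode (hypothetical helper).

    targets[t][k] = sum_{j >= 0} gammas[k] ** j * cumulants[t + j],
    truncated at the end of the logged episode.
    """
    targets = [[0.0] * len(gammas) for _ in cumulants]
    for k, gamma in enumerate(gammas):
        running = 0.0
        # Backward recursion G_t = c_t + gamma * G_{t+1}, with G = 0 past the end.
        for t in range(len(cumulants) - 1, -1, -1):
            running = cumulants[t] + gamma * running
            targets[t][k] = running
    return targets
```

For example, `multi_timescale_targets([1.0, 1.0, 1.0], [0.0, 0.5])` gives `[[1.0, 1.75], [1.0, 1.5], [1.0, 1.0]]`: a zero discount looks only at the immediate signal, while a larger one summarizes more of the future. Because the targets come from logged data alone, no online exploration is needed to compute them.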

Open-set Intersection Intention Prediction for Autonomous Driving

Mar 09, 2021
Fei Li, Xiangxu Li, Jun Luo, Shiwei Fan, Hongbo Zhang

Intention prediction is a crucial task for Autonomous Driving (AD). Because intersections vary in size and layout, it is challenging to predict the intentions of human drivers at different intersections, especially unseen and irregular ones. In this paper, we formulate the prediction of intention at intersections as an open-set prediction problem that requires context-specific matching of the target vehicle state and the diverse intersection configurations, which are in principle unbounded. We capture map-centric features that correspond to intersection structures under a spatial-temporal graph representation, and use two mutually auxiliary attention modules (MAAMs), covering lane-level and exit-level intentions respectively, to predict the target that best matches intersection elements in map-centric feature space. Under our model, attention scores estimate the probability distribution of the open-set intentions that are contextually defined by the structure of the current intersection. The proposed model is trained and evaluated on a simulated dataset. Furthermore, the model, trained on the simulated dataset without any fine-tuning, is directly validated on an in-house real-world dataset collected at 98 real-world intersections and exhibits satisfactory performance, demonstrating the practical viability of our approach.

* Accepted by ICRA, 2021 
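The key idea above is that the output space is defined by the current intersection rather than by a fixed set of classes. A minimal sketch, assuming plain scaled dot-product attention between a target-vehicle embedding and one embedding per candidate lane or exit (the abstract does not specify the MAAM design, and `intention_distribution` is a hypothetical name), shows how a softmax over attention scores yields a distribution over however many candidates the intersection has:

```python
import math
from typing import List, Sequence


def intention_distribution(
    query: Sequence[float], candidates: Sequence[Sequence[float]]
) -> List[float]:
    """Softmax over scaled dot-product scores between the target and each candidate."""
    if not candidates:
        raise ValueError("intersection must expose at least one candidate")
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, cand)) / math.sqrt(d) for cand in candidates]
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```

Since the output length follows the number of candidate embeddings, the same function serves a three-exit and a five-exit intersection without changing any output layer, which is what makes the open-set formulation possible.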

Self-Supervised Simultaneous Multi-Step Prediction of Road Dynamics and Cost Map

Mar 01, 2021
Elmira Amirloo, Mohsen Rohani, Ershad Banijamali, Jun Luo, Pascal Poupart

While supervised learning is widely used for perception modules in conventional autonomous driving solutions, the huge amount of data labeling it requires hinders scalability. In contrast, end-to-end architectures do not require labeled data and are potentially more scalable, but they sacrifice interpretability. We introduce a novel architecture that is trained in a fully self-supervised fashion for simultaneous multi-step prediction of a space-time cost map and road dynamics. Our solution replaces the manually designed cost function for motion planning with a learned high-dimensional cost map that is naturally interpretable and allows diverse contextual information to be integrated without manual data labeling. Experiments on real-world driving data show that our solution leads to fewer collisions and road violations over long planning horizons than baselines, demonstrating the feasibility of fully self-supervised prediction without sacrificing either scalability or interpretability.
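To make the role of a space-time cost map concrete: once a network predicts one cost grid per future time step, a planner can score candidate trajectories by the cost of the cells they occupy and keep the cheapest. The sketch below assumes a sampling-based planner with pre-generated candidate trajectories on the same grid; the abstract does not describe the planner, and all names here are hypothetical.

```python
from typing import Sequence, Tuple

CostMap = Sequence[Sequence[float]]  # one 2D grid of costs for a single future step
Trajectory = Sequence[Tuple[int, int]]  # (row, col) cell occupied at each step


def trajectory_cost(cost_maps: Sequence[CostMap], traj: Trajectory) -> float:
    """Sum the predicted cost of the cell occupied at each future step."""
    if len(traj) > len(cost_maps):
        raise ValueError("trajectory is longer than the predicted horizon")
    return sum(cost_maps[t][r][c] for t, (r, c) in enumerate(traj))


def select_trajectory(
    cost_maps: Sequence[CostMap], candidates: Sequence[Trajectory]
) -> int:
    """Return the index of the lowest-cost candidate trajectory."""
    costs = [trajectory_cost(cost_maps, traj) for traj in candidates]
    return min(range(len(costs)), key=costs.__getitem__)
```

Because each cost cell corresponds to a location and time, the chosen trajectory can be explained by pointing at the cells it avoided, which is the interpretability argument made above.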

Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems

Feb 16, 2021
Yaodong Yang, Jun Luo, Ying Wen, Oliver Slumbers, Daniel Graves, Haitham Bou Ammar, Jun Wang, Matthew E. Taylor

Multiagent reinforcement learning (MARL) has achieved remarkable success in solving various types of video games. A cornerstone of this success is the auto-curriculum framework, which shapes the learning process by continually creating new challenging tasks for agents to adapt to, thereby facilitating the acquisition of new skills. To extend MARL methods to real-world domains beyond video games, we envision in this blue-sky paper that maintaining a diversity-aware auto-curriculum is critical for successful MARL applications. Specifically, we argue that behavioural diversity is a pivotal, yet under-explored, component of real-world multiagent learning systems, and that significant work remains in understanding how to design a diversity-aware auto-curriculum. We list four open challenges for auto-curriculum techniques, which we believe deserve more attention from the community. Towards validating our vision, we recommend modelling realistic interactive behaviours in autonomous driving as an important test bed, and recommend the SMARTS/ULTRA benchmark.

* AAMAS 2021 

CoachNet: An Adversarial Sampling Approach for Reinforcement Learning

Jan 07, 2021
Elmira Amirloo Abolfathi, Jun Luo, Peyman Yadmellat, Kasra Rezaee

Despite the recent successes of reinforcement learning in games and robotics, it has yet to become broadly practical. Sample inefficiency and unreliable performance in rare but challenging scenarios are two of the major obstacles. Drawing inspiration from the effectiveness of deliberate practice for achieving expert-level human performance, we propose a new adversarial sampling approach guided by a failure predictor named "CoachNet". CoachNet is trained online along with the agent to predict the probability of failure. This probability is then used in a stochastic sampling process to guide the agent to more challenging episodes. This way, instead of wasting time on scenarios that the agent has already mastered, training is focused on the agent's "weak spots". We present the design of CoachNet, explain its underlying principles, and empirically demonstrate its effectiveness in improving sample efficiency and test-time robustness in common continuous control tasks.

* NeurIPS 2019 Workshop on Safety and Robustness in Decision Making
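CoachNet turns a predicted failure probability into a sampling distribution over episodes. A minimal sketch, assuming each candidate starting scenario already has a predicted failure probability and that sampling is proportional to it, mixed with a uniform floor `epsilon` so mastered scenarios are still revisited occasionally (the mixing term and the name `sample_scenario` are assumptions, not taken from the paper):

```python
import random
from typing import Sequence


def sample_scenario(
    failure_probs: Sequence[float], rng: random.Random, epsilon: float = 0.1
) -> int:
    """Pick a scenario index, favouring ones with high predicted failure probability."""
    n = len(failure_probs)
    if n == 0:
        raise ValueError("need at least one candidate scenario")
    total = sum(failure_probs)
    weights = []
    for p in failure_probs:
        # Normalised failure probability; fall back to uniform if all are zero.
        focused = p / total if total > 0 else 1.0 / n
        # Hypothetical uniform floor keeps every scenario reachable when epsilon > 0.
        weights.append((1.0 - epsilon) * focused + epsilon / n)
    return rng.choices(range(n), weights=weights, k=1)[0]
```

With `epsilon=0.0` the sampler never returns a scenario whose predicted failure probability is zero, so training time goes entirely to the agent's predicted weak spots; a small positive `epsilon` trades a little of that focus for protection against forgetting.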

LISPR: An Options Framework for Policy Reuse with Reinforcement Learning

Dec 29, 2020
Daniel Graves, Jun Jin, Jun Luo

We propose a framework for transferring any existing policy from a potentially unknown source MDP to a target MDP. This framework (1) enables reuse in the target domain of any form of source policy, including classical controllers, heuristic policies, or deep neural network-based policies, (2) attains optimality under suitable theoretical conditions, and (3) guarantees improvement over the source policy in the target MDP. These are achieved by packaging the source policy as a black-box option in the target MDP and providing a theoretically grounded way to learn the option's initiation set through general value functions. Our approach facilitates the learning of new policies in two variants: (1) maximizing the target MDP reward with the help of the black-box option, and (2) returning the agent to states in the learned initiation set of the black-box option, where it is already optimal. We show that these two variants are equivalent in performance under some conditions. Through a series of experiments in simulated environments, we demonstrate that our framework performs well in sparse-reward problems given (sub-)optimal source policies and improves upon prior art in transfer methods, such as continual learning and progressive networks, which lack our framework's desirable theoretical properties.

Prediction by Anticipation: An Action-Conditional Prediction Method based on Interaction Learning

Dec 25, 2020
Ershad Banijamali, Mohsen Rohani, Elmira Amirloo, Jun Luo, Pascal Poupart

In autonomous driving (AD), accurately predicting changes in the environment can effectively improve safety and comfort. Due to complex interactions among traffic participants, however, it is very hard to achieve accurate prediction over a long horizon. To address this challenge, we propose prediction by anticipation, which views interaction in terms of a latent probabilistic generative process wherein some vehicles move partly in response to the anticipated motion of other vehicles. Under this view, consecutive data frames can be factorized into sequential samples from an action-conditional distribution that effectively generalizes to a wider range of actions and driving situations. Our proposed prediction model, variational Bayesian in nature, is trained to maximize the evidence lower bound (ELBO) of the log-likelihood of this conditional distribution. Evaluations on the prominent AD datasets NGSIM I-80 and Argoverse show significant improvements over the current state of the art in both accuracy and generalization.
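For reference, the ELBO of a conditional latent-variable model is the expected log-likelihood of the next frame under the approximate posterior minus the KL divergence from that posterior to the prior. A single-sample sketch, assuming a diagonal-Gaussian posterior, a standard-normal prior and a fixed-variance Gaussian likelihood (none of which the abstract specifies), where the hypothetical `predicted_mean` stands for the decoder's output given past frames, a candidate action and a sampled latent:

```python
import math
from typing import Sequence


def gaussian_log_likelihood(
    y: Sequence[float], mean: Sequence[float], var: float
) -> float:
    """log N(y; mean, var * I), summed over dimensions."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * var) + (yi - mi) ** 2 / var)
        for yi, mi in zip(y, mean)
    )


def kl_diag_gaussian_to_standard_normal(
    mu: Sequence[float], log_var: Sequence[float]
) -> float:
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def elbo(
    y: Sequence[float],
    predicted_mean: Sequence[float],
    mu: Sequence[float],
    log_var: Sequence[float],
    obs_var: float = 1.0,
) -> float:
    """Single-sample ELBO estimate: reconstruction log-likelihood minus KL term."""
    return gaussian_log_likelihood(y, predicted_mean, obs_var) - kl_diag_gaussian_to_standard_normal(mu, log_var)
```

Training would maximize this quantity (equivalently, minimize its negative) averaged over frames; the KL term vanishes when the posterior equals the prior, and the reconstruction term is largest when the prediction matches the observed frame.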

PePScenes: A Novel Dataset and Baseline for Pedestrian Action Prediction in 3D

Dec 14, 2020
Amir Rasouli, Tiffany Yau, Peter Lakner, Saber Malekmohammadi, Mohsen Rohani, Jun Luo

Predicting the behavior of road users, particularly pedestrians, is vital for safe motion planning in autonomous driving systems. Traditionally, pedestrian behavior prediction has been realized in terms of forecasting future trajectories. However, recent evidence suggests that predicting higher-level actions, such as crossing the road, can help improve trajectory forecasting and, in turn, planning. A number of existing datasets cater to the development of pedestrian action prediction algorithms; however, they lack certain characteristics, such as bird's-eye-view semantic map information and 3D locations of objects in the scene, which are crucial in the autonomous driving context. To this end, we propose a new pedestrian action prediction dataset created by adding per-frame 2D/3D bounding box and behavioral annotations to the popular autonomous driving dataset nuScenes. In addition, we propose a hybrid neural network architecture that incorporates various data modalities for predicting pedestrian crossing action. Evaluating our model on the newly proposed dataset reveals the contribution of different data modalities to the prediction task. The dataset is available at https://github.com/huawei-noah/PePScenes.

* 1 figure, 2 tables. ML4AD at NeurIPS 2020