We demonstrate the first large-scale application of model-based generative adversarial imitation learning (MGAIL) to the task of dense urban self-driving. We augment standard MGAIL using a hierarchical model to enable generalization to arbitrary goal routes, and measure performance using a closed-loop evaluation framework with simulated interactive agents. We train policies from expert trajectories collected from real vehicles driving over 100,000 miles in San Francisco, and demonstrate a steerable policy that can navigate robustly even in a zero-shot setting, generalizing to synthetic scenarios with novel goals that never occurred in real-world driving. We also demonstrate the importance of mixing closed-loop MGAIL losses with open-loop behavior cloning losses, and show our best policy approaches the performance of the expert. We evaluate our imitative model in both average and challenging scenarios, and show how it can serve as a useful prior to plan successful trajectories.
Collecting realistic driving trajectories is crucial for training machine learning models that imitate human driving behavior. Most of today's autonomous driving datasets contain only a few trajectories per location and are recorded with test vehicles that are cautiously driven by trained drivers. In particular in interactive scenarios such as highway merges, the test driver's behavior significantly influences other vehicles. This influence prevents recording the whole traffic space of human driving behavior. In this work, we present a novel methodology to extract trajectories of traffic objects using infrastructure sensors. Infrastructure sensors allow us to record a lot of data for one location and take the test drivers out of the loop. We develop both a hardware setup consisting of a camera and a traffic surveillance radar and a trajectory extraction algorithm. Our vision pipeline accurately detects objects, fuses camera and radar detections and tracks them over time. We improve a state-of-the-art object tracker by combining the tracking in image coordinates with a Kalman filter in road coordinates. We show that our sensor fusion approach successfully combines the advantages of camera and radar detections and outperforms either single sensor. Finally, we also evaluate the accuracy of our trajectory extraction pipeline. For that, we equip our test vehicle with a differential GPS sensor and use it to collect ground truth trajectories. With this data we compute the measurement errors. While we use the mean error to de-bias the trajectories, the error standard deviation is in the magnitude of the ground truth data inaccuracy. Hence, the extracted trajectories are not only naturalistic but also highly accurate and prove the potential of using infrastructure sensors to extract real-world trajectories.