Boris Ivanovic

Reinforcement Learning with Human Feedback for Realistic Traffic Simulation

Sep 01, 2023
Yulong Cao, Boris Ivanovic, Chaowei Xiao, Marco Pavone

In light of the challenges and costs of real-world testing, autonomous vehicle developers often rely on testing in simulation to create reliable systems. A key element of effective simulation is the incorporation of realistic traffic models that align with human knowledge, an aspect that has proven challenging due to the need to balance realism and diversity. This work aims to address this by developing a framework that employs reinforcement learning with human feedback (RLHF) to enhance the realism of existing traffic models. It also identifies two main challenges: capturing the nuances of human preferences on realism, and unifying diverse traffic simulation models. To tackle these issues, we propose using human feedback for alignment and employ RLHF for its sample efficiency. We also introduce the first dataset for realism alignment in traffic modeling to support such research. Our framework, named TrafficRLHF, demonstrates its proficiency in generating realistic traffic scenarios that are well aligned with human preferences, as corroborated by comprehensive evaluations on the nuScenes dataset.

* 9 pages, 4 figures 
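As background for the RLHF stage, below is a minimal sketch of its standard first step: training a reward model on pairwise human preferences with a Bradley-Terry loss. The architecture, tensor shapes, and names are illustrative assumptions, not TrafficRLHF's actual realism reward model.

```python
# Hedged sketch: pairwise-preference (Bradley-Terry) reward-model update,
# the usual first stage of RLHF. All shapes/architectures are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrajectoryRewardModel(nn.Module):
    """Scores a traffic scenario tensor of shape (agents, timesteps, state)."""
    def __init__(self, state_dim: int = 4, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, scenario: torch.Tensor) -> torch.Tensor:
        # Mean-pool per-state scores over agents/time -> one realism score.
        return self.encoder(scenario).mean(dim=(-3, -2, -1))

reward_model = TrajectoryRewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

# Dummy batch in which annotators preferred scenario_a over scenario_b.
scenario_a = torch.randn(8, 10, 20, 4)  # (batch, agents, timesteps, state)
scenario_b = torch.randn(8, 10, 20, 4)

# Bradley-Terry loss: maximize the log-probability that the preferred
# scenario receives the higher reward.
loss = -F.logsigmoid(reward_model(scenario_a) - reward_model(scenario_b)).mean()
loss.backward()
optimizer.step()
```

The fitted reward model would then score candidate rollouts when fine-tuning the traffic model, which is what makes the approach sample-efficient relative to collecting fresh human labels for every rollout.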

trajdata: A Unified Interface to Multiple Human Trajectory Datasets

Jul 26, 2023
Boris Ivanovic, Guanyu Song, Igor Gilitschenski, Marco Pavone

The field of trajectory forecasting has grown significantly in recent years, partially owing to the release of numerous large-scale, real-world human trajectory datasets for autonomous vehicles (AVs) and pedestrian motion tracking. While such datasets have been a boon for the community, they each use custom and unique data formats and APIs, making it cumbersome for researchers to train and evaluate methods across multiple datasets. To remedy this, we present trajdata: a unified interface to multiple human trajectory datasets. At its core, trajdata provides a simple, uniform, and efficient representation and API for trajectory and map data. As a demonstration of its capabilities, in this work we conduct a comprehensive empirical evaluation of existing trajectory datasets, providing users with a rich understanding of the data underpinning much of current pedestrian and AV motion forecasting research, and using these insights to propose suggestions for future datasets. trajdata is permissively licensed (Apache 2.0) and can be accessed online at https://github.com/NVlabs/trajdata

* 15 pages, 15 figures, 3 tables 
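A minimal usage sketch, following the examples in the project README; the exact class and method names (UnifiedDataset, get_collate_fn, the AgentBatch fields) should be checked against the repository, and the data path is a placeholder.

```python
# Sketch of typical trajdata usage; verify signatures against the repo.
from torch.utils.data import DataLoader

from trajdata import AgentBatch, UnifiedDataset

# One class fronts every supported dataset; data_dirs must point to your
# local copies of the raw data (the path shown here is a placeholder).
dataset = UnifiedDataset(
    desired_data=["nusc_mini"],
    data_dirs={"nusc_mini": "~/datasets/nuScenes"},
)

dataloader = DataLoader(
    dataset,
    batch_size=64,
    shuffle=True,
    collate_fn=dataset.get_collate_fn(),
)

for batch in dataloader:  # batch is an AgentBatch of uniform tensors
    pass  # e.g., past/future trajectories live in fields like batch.agent_hist
```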

Language Conditioned Traffic Generation

Jul 16, 2023
Shuhan Tan, Boris Ivanovic, Xinshuo Weng, Marco Pavone, Philipp Kraehenbuehl

Simulation forms the backbone of modern self-driving development. Simulators help develop, test, and improve driving systems without putting humans, vehicles, or their environment at risk. However, simulators face a major challenge: They rely on realistic, scalable, yet interesting content. While recent advances in rendering and scene reconstruction make great strides in creating static scene assets, modeling their layout, dynamics, and behaviors remains challenging. In this work, we turn to language as a source of supervision for dynamic traffic scene generation. Our model, LCTGen, combines a large language model with a transformer-based decoder architecture that selects likely map locations from a dataset of maps, and produces an initial traffic distribution, as well as the dynamics of each vehicle. LCTGen outperforms prior work in both unconditional and conditional traffic scene generation in terms of realism and fidelity. Code and video will be available at https://ariostgx.github.io/lctgen.

* Technical Report. Website available at https://ariostgx.github.io/lctgen 
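To make the two-stage pipeline concrete, here is a schematic sketch: a language model maps the description to a structured scenario specification, which a decoder then grounds on a retrieved map. Every name below (ScenarioSpec, llm_to_spec, and the returned values) is a hypothetical stand-in, not LCTGen's interface.

```python
# Schematic of a language-conditioned scenario pipeline; all components
# are hypothetical stand-ins for LCTGen's LLM + decoder stages.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AgentSpec:
    position: Tuple[float, float]  # initial (x, y) on the selected map
    heading: float                 # radians
    speed: float                   # m/s

@dataclass
class ScenarioSpec:
    map_query: str                 # features the map retriever should match
    agents: List[AgentSpec]

def llm_to_spec(description: str) -> ScenarioSpec:
    """Stand-in for the LLM stage: natural language -> structured spec.
    A real system would prompt an LLM to emit this structure."""
    return ScenarioSpec(
        map_query="four-way intersection",
        agents=[AgentSpec((0.0, 0.0), 0.0, 5.0),
                AgentSpec((10.0, 3.5), 3.14, 4.0)],
    )

def generate_traffic(description: str) -> ScenarioSpec:
    spec = llm_to_spec(description)
    # A decoder stage would then 1) retrieve a likely map for spec.map_query,
    # 2) place spec.agents on it, and 3) roll out each agent's dynamics.
    return spec

print(generate_traffic("two cars approaching an intersection"))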

Language-Guided Traffic Simulation via Scene-Level Diffusion

Jun 10, 2023
Ziyuan Zhong, Davis Rempe, Yuxiao Chen, Boris Ivanovic, Yulong Cao, Danfei Xu, Marco Pavone, Baishakhi Ray

Realistic and controllable traffic simulation is a core capability that is necessary to accelerate autonomous vehicle (AV) development. However, current approaches for controlling learning-based traffic models require significant domain expertise and are difficult for practitioners to use. To remedy this, we present CTG++, a scene-level conditional diffusion model that can be guided by language instructions. Developing this requires tackling two challenges: the need for a realistic and controllable traffic model backbone, and an effective method to interface with a traffic model using language. To address these challenges, we first propose a scene-level diffusion model equipped with a spatio-temporal transformer backbone, which generates realistic and controllable traffic. We then harness a large language model (LLM) to convert a user's query into a loss function, guiding the diffusion model towards query-compliant generation. Through comprehensive evaluation, we demonstrate the effectiveness of our proposed method in generating realistic, query-compliant traffic simulations.
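The guidance mechanism can be sketched generically: at each denoising step, the gradient of a loss evaluated on the predicted clean sample nudges the update toward query compliance. The toy denoiser, update rule, and loss below are illustrative assumptions rather than the paper's model; a real sampler would follow a proper diffusion schedule.

```python
# Schematic loss-guided sampling loop; denoiser/schedule/loss are toys.
import torch

def guided_sample(denoiser, loss_fn, shape, steps=50, guidance_scale=1.0):
    x = torch.randn(shape)                      # start from pure noise
    for t in reversed(range(steps)):
        x = x.detach().requires_grad_(True)
        x0_pred = denoiser(x, t)                # predicted clean trajectories
        # Gradient of the (LLM-generated) loss w.r.t. the sample steers
        # generation toward satisfying the user's query.
        grad = torch.autograd.grad(loss_fn(x0_pred).sum(), x)[0]
        with torch.no_grad():
            x = x0_pred - guidance_scale * grad + 0.1 * torch.randn_like(x)
    return x.detach()

# Toy query: keep every agent's x-coordinate near 5.0.
denoiser = lambda x, t: 0.9 * x
loss_fn = lambda traj: ((traj[..., 0] - 5.0) ** 2).mean(dim=-1)
trajs = guided_sample(denoiser, loss_fn, shape=(4, 20, 2))  # 4 agents, 20 steps
```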

Partial-View Object View Synthesis via Filtered Inversion

Apr 03, 2023
Fan-Yun Sun, Jonathan Tremblay, Valts Blukis, Kevin Lin, Danfei Xu, Boris Ivanovic, Peter Karkus, Stan Birchfield, Dieter Fox, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Marco Pavone, Nick Haber

We propose Filtering Inversion (FINV), a learning framework and optimization process that predicts a renderable 3D object representation from one or a few partial views. FINV addresses the challenge of synthesizing novel views of objects from partial observations, spanning cases where the object is not entirely in view, is partially occluded, or is only observed from similar views. To achieve this, FINV learns shape priors by training a 3D generative model. At inference, given one or more views of a novel real-world object, FINV first finds a set of latent codes for the object by inverting the generative model from multiple initial seeds. Maintaining the set of latent codes, FINV filters and resamples them after receiving each new observation, akin to particle filtering. The generator is then finetuned for each latent code on the available views in order to adapt to novel objects. We show that FINV successfully synthesizes novel views of real-world objects (e.g., chairs, tables, and cars), even if the generative prior is trained only on synthetic objects. The ability to address the sim-to-real problem allows FINV to be used for object categories without real-world datasets. FINV achieves state-of-the-art performance on multiple real-world datasets, recovers object shape and texture from partial and sparse views, is robust to occlusion, and is able to incrementally improve its representation with more observations.

* project website: http://cs.stanford.edu/~sunfanyun/finv 
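A schematic of the filter-and-resample loop over latent codes, assuming a generic differentiable generator; the generator, loss, and particle counts below are toy placeholders rather than FINV's actual components.

```python
# Schematic particle-style inversion; generator/loss/sizes are toys.
import torch

def reconstruction_loss(rendered, observed):
    # Per-particle squared error against the (partial) observation.
    return ((rendered - observed) ** 2).flatten(1).mean(dim=1)

def filtered_inversion(generator, observations, n_particles=8, n_keep=4, steps=100):
    latents = torch.randn(n_particles, 64, requires_grad=True)  # seed particles
    opt = torch.optim.Adam([latents], lr=1e-2)
    for view in observations:              # partial views arrive one at a time
        for _ in range(steps):             # invert the generator on this view
            loss = reconstruction_loss(generator(latents), view)
            opt.zero_grad()
            loss.sum().backward()
            opt.step()
        # Filter: keep the particles that explain the view best, then
        # resample (here, by duplication) to restore the particle count.
        with torch.no_grad():
            keep = loss.argsort()[:n_keep]
            latents.data = latents.data[keep].repeat(n_particles // n_keep, 1)
    return latents.detach()

gen = torch.nn.Linear(64, 3 * 32 * 32)     # toy stand-in generator
views = [torch.randn(3 * 32 * 32) for _ in range(3)]
codes = filtered_inversion(gen, views)     # surviving latent hypotheses
```

(FINV additionally finetunes the generator per latent code; that step is omitted here for brevity.)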

Tree-structured Policy Planning with Learned Behavior Models

Jan 27, 2023
Yuxiao Chen, Peter Karkus, Boris Ivanovic, Xinshuo Weng, Marco Pavone

Autonomous vehicles (AVs) need to reason about the multimodal behavior of neighboring agents while planning their own motion. Many existing trajectory planners seek a single trajectory that performs well under all plausible futures simultaneously, ignoring bi-directional interactions and thus leading to overly conservative plans. Policy planning, whereby the ego agent plans a policy that reacts to the environment's multimodal behavior, is a promising direction as it can account for the action-reaction interactions between the AV and the environment. However, most existing policy planners do not scale to the complexity of real autonomous vehicle applications: they are either not compatible with modern deep learning prediction models, not interpretable, or not able to generate high-quality trajectories. To fill this gap, we propose Tree Policy Planning (TPP), a policy planner that is compatible with state-of-the-art deep learning prediction models, generates multistage motion plans, and accounts for the influence of the ego agent on the environment's behavior. The key idea of TPP is to reduce the continuous optimization problem to a tractable discrete MDP through the construction of two tree structures: an ego trajectory tree for ego trajectory options, and a scenario tree for multimodal ego-conditioned environment predictions. We demonstrate the efficacy of TPP in closed-loop simulations based on the real-world nuScenes dataset, and the results show that TPP scales to realistic AV scenarios and significantly outperforms non-policy baselines.
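The reduction to a discrete MDP can be illustrated with backward induction over alternating layers: the ego minimizes over trajectory options while scenario branches are averaged by probability. The node classes and numbers below are illustrative, not TPP's data structures.

```python
# Toy expectimin backward induction over an ego-option / scenario tree.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class EgoNode:            # ego chooses among trajectory-segment options
    options: List["ScenarioNode"]
    cost: float = 0.0

@dataclass
class ScenarioNode:       # nature branches into predicted environment modes
    branches: List[Tuple[float, EgoNode]]  # (probability, successor)
    cost: float = 0.0

def value_ego(n: EgoNode) -> float:
    if not n.options:     # leaf: terminal stage cost
        return n.cost
    return n.cost + min(value_scn(o) for o in n.options)          # ego picks best

def value_scn(n: ScenarioNode) -> float:
    return n.cost + sum(p * value_ego(c) for p, c in n.branches)  # expectation

# Two ego options; the second is cheaper in expectation over its scenarios.
tree = EgoNode(options=[
    ScenarioNode(branches=[(0.5, EgoNode([], cost=4.0)), (0.5, EgoNode([], cost=4.0))]),
    ScenarioNode(branches=[(0.8, EgoNode([], cost=1.0)), (0.2, EgoNode([], cost=9.0))]),
])
print(value_ego(tree))    # 2.6 -> ego commits to the second option
```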

DiffStack: A Differentiable and Modular Control Stack for Autonomous Vehicles

Dec 13, 2022
Peter Karkus, Boris Ivanovic, Shie Mannor, Marco Pavone

Autonomous vehicle (AV) stacks are typically built in a modular fashion, with explicit components performing detection, tracking, prediction, planning, control, etc. While modularity improves reusability, interpretability, and generalizability, it also suffers from compounding errors, information bottlenecks, and integration challenges. To overcome these challenges, a prominent approach is to convert the AV stack into an end-to-end neural network and train it with data. While such approaches have achieved impressive results, they typically lack interpretability and reusability, and they eschew principled analytical components, such as planning and control, in favor of deep neural networks. To enable the joint optimization of AV stacks while retaining modularity, we present DiffStack, a differentiable and modular stack for prediction, planning, and control. Crucially, our model-based planning and control algorithms leverage recent advancements in differentiable optimization to produce gradients, enabling optimization of upstream components, such as prediction, via backpropagation through planning and control. Our results on the nuScenes dataset indicate that end-to-end training with DiffStack yields substantial improvements in open-loop and closed-loop planning metrics by, e.g., learning to make fewer prediction errors that would affect planning. Beyond these immediate benefits, DiffStack opens up new opportunities for fully data-driven yet modular and interpretable AV architectures. Project website: https://sites.google.com/view/diffstack

* CoRL 2022 camera ready 
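The core mechanism, backpropagating a planning loss through a differentiable planner into upstream prediction, can be sketched with a toy closed-form planner; the networks, costs, and data below are illustrative assumptions, not DiffStack's modules.

```python
# Toy sketch of gradient flow: planning loss -> planner -> predictor.
import torch
import torch.nn as nn

predictor = nn.Linear(4, 2)   # toy predictor: agent state -> predicted position

def plan(pred_pos: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
    """Closed-form minimizer of ||u - goal||^2 + ||u - safe||^2, where the
    'safe' point is displaced away from the predicted agent position.
    Being closed form, it is differentiable w.r.t. pred_pos."""
    safe_point = goal + 0.5 * (goal - pred_pos)
    return 0.5 * (goal + safe_point)   # midpoint minimizes the two quadratics

agent_state = torch.randn(16, 4)
goal = torch.tensor([10.0, 0.0]).expand(16, 2)
expert_plan = torch.zeros(16, 2)       # imitation target from logged driving

pred = predictor(agent_state)
u = plan(pred, goal)
loss = ((u - expert_plan) ** 2).mean()
loss.backward()                        # gradients reach predictor through plan()
```

Because the planner is differentiable, prediction errors that matter for planning are penalized more than those that do not, which is the intuition behind the planning-metric gains the abstract reports.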

Planning with Occluded Traffic Agents using Bi-Level Variational Occlusion Models

Oct 26, 2022
Filippos Christianos, Peter Karkus, Boris Ivanovic, Stefano V. Albrecht, Marco Pavone

Reasoning about occluded traffic agents is a significant open challenge in planning for autonomous vehicles. Recent deep learning models have shown impressive results for predicting occluded agents based on the behaviour of nearby visible agents; however, as we show in experiments, these models are difficult to integrate into downstream planning. To this end, we propose Bi-level Variational Occlusion Models (BiVO), a two-step generative model that first predicts likely locations of occluded agents, and then generates likely trajectories for the occluded agents. In contrast to existing methods, BiVO outputs a trajectory distribution which can then be sampled from and integrated into standard downstream planning. We evaluate the method in closed-loop replay simulation using the real-world nuScenes dataset. Our results suggest that BiVO can successfully learn to predict occluded agent trajectories, and that these predictions lead to better subsequent motion plans in critical scenarios.

* 7 pages, 6 figures 
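A schematic of the two-step structure, assuming toy Gaussian heads: first sample likely occluded-agent locations, then sample trajectories conditioned on those locations so a standard planner can consume the distribution. Shapes and layers are illustrative, not BiVO's networks.

```python
# Toy two-step occluded-agent sampler: locations first, then trajectories.
import torch
import torch.nn as nn

class OcclusionModel(nn.Module):
    def __init__(self, scene_dim: int = 32):
        super().__init__()
        self.loc_head = nn.Linear(scene_dim, 2 * 2)             # (x, y) mean, log-std
        self.traj_head = nn.Linear(scene_dim + 2, 2 * 20 * 2)   # 20-step mean, log-std

    def sample(self, scene: torch.Tensor, n: int = 10) -> torch.Tensor:
        """Draw n trajectory hypotheses for a possible occluded agent."""
        mu, log_std = self.loc_head(scene).chunk(2, dim=-1)
        locs = mu + log_std.exp() * torch.randn(n, 2)           # step 1: locations
        h = torch.cat([scene.expand(n, -1), locs], dim=-1)
        mu_t, log_std_t = self.traj_head(h).chunk(2, dim=-1)
        trajs = mu_t + log_std_t.exp() * torch.randn(n, 40)     # step 2: trajectories
        return trajs.view(n, 20, 2)                             # feed these to a planner

model = OcclusionModel()
samples = model.sample(torch.randn(32))  # (10, 20, 2) candidate trajectories
```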

Robust and Controllable Object-Centric Learning through Energy-based Models

Oct 11, 2022
Ruixiang Zhang, Tong Che, Boris Ivanovic, Renhao Wang, Marco Pavone, Yoshua Bengio, Liam Paull

Humans are remarkably good at understanding and reasoning about complex visual scenes. The capability to decompose low-level observations into discrete objects allows us to build a grounded abstract representation and identify the compositional structure of the world. Accordingly, it is a crucial step for machine learning models to be capable of inferring objects and their properties from visual scenes without explicit supervision. However, existing works on object-centric representation learning either rely on tailor-made neural network modules or strong probabilistic assumptions in the underlying generative and inference processes. In this work, we present a conceptually simple and general approach to learning object-centric representations through an energy-based model. By forming a permutation-invariant energy function using vanilla attention blocks readily available in Transformers, we can infer object-centric latent variables via gradient-based MCMC methods where permutation equivariance is automatically guaranteed. We show that our approach can be easily integrated into existing architectures and can effectively extract high-quality object-centric representations, leading to better segmentation accuracy and competitive downstream task performance. Further, empirical evaluations show that the learned representations are robust against distribution shift. Finally, we demonstrate its effectiveness in systematic compositional generalization, by re-composing learned energy functions for novel scene generation and manipulation.
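The inference procedure described above, gradient-based MCMC under a permutation-invariant attention energy, can be sketched as Langevin dynamics over a set of object latents; dimensions, step sizes, and the energy network are toy assumptions.

```python
# Toy Langevin inference of object slots under an attention-based energy.
import torch
import torch.nn as nn

class AttentionEnergy(nn.Module):
    """Energy over (image features, latent slots); summing per-slot scores
    makes the energy invariant to permutations of the slots."""
    def __init__(self, dim: int = 32, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, feats, slots):
        out, _ = self.attn(query=slots, key=feats, value=feats)
        return self.score(out).sum(dim=(1, 2))   # scalar energy per batch item

energy = AttentionEnergy()
feats = torch.randn(1, 64, 32)   # (batch, image tokens, dim)
slots = torch.randn(1, 5, 32)    # (batch, object slots, dim)

step, noise = 0.01, 0.005
for _ in range(50):              # Langevin MCMC: descend the energy + noise
    slots = slots.detach().requires_grad_(True)
    grad = torch.autograd.grad(energy(feats, slots).sum(), slots)[0]
    slots = slots - step * grad + noise * torch.randn_like(slots)
slots = slots.detach()           # inferred object-centric latents
```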

Expanding the Deployment Envelope of Behavior Prediction via Adaptive Meta-Learning

Sep 23, 2022
Boris Ivanovic, James Harrison, Marco Pavone

Learning-based behavior prediction methods are increasingly being deployed in real-world autonomous systems, e.g., in fleets of self-driving vehicles, which are beginning to commercially operate in major cities across the world. Despite their advancements, however, the vast majority of prediction systems are specialized to a set of well-explored geographic regions or operational design domains, complicating deployment to additional cities, countries, or continents. To address this, we present a novel method for efficiently adapting behavior prediction models to new environments. Our approach leverages recent advances in meta-learning, specifically Bayesian regression, to augment existing behavior prediction models with an adaptive layer that enables efficient domain transfer via offline fine-tuning, online adaptation, or both. Experiments across multiple real-world datasets demonstrate that our method can efficiently adapt to a variety of unseen environments.

* 12 pages, 13 figures, 2 tables 
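The adaptive-layer idea can be illustrated with a Bayesian linear-regression head on frozen backbone features, updated in closed form as new in-domain data arrives; the shapes, prior, and backbone below are illustrative assumptions, not the paper's model.

```python
# Toy Bayesian last-layer adaptation on top of a frozen feature extractor.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(8, 32), nn.ReLU())  # stand-in, "pretrained"
for p in backbone.parameters():
    p.requires_grad_(False)

d, noise_var = 32, 0.1
precision = torch.eye(d)             # prior precision of last-layer weights
mean_times_prec = torch.zeros(d, 2)  # precision-weighted posterior mean (2-D output)

def adapt(x: torch.Tensor, y: torch.Tensor) -> None:
    """Closed-form online update from one batch of new-domain data."""
    global precision, mean_times_prec
    phi = backbone(x)                                   # (batch, d) features
    precision = precision + phi.T @ phi / noise_var
    mean_times_prec = mean_times_prec + phi.T @ y / noise_var

def predict(x: torch.Tensor) -> torch.Tensor:
    weights = torch.linalg.solve(precision, mean_times_prec)  # posterior mean
    return backbone(x) @ weights

adapt(torch.randn(16, 8), torch.randn(16, 2))  # e.g., data from a new city
print(predict(torch.randn(4, 8)).shape)        # torch.Size([4, 2])
```

The same update supports both regimes the abstract mentions: run adapt() once over an offline buffer (fine-tuning) or call it on each incoming batch (online adaptation).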