Zikang Zhou

RALAD: Bridging the Real-to-Sim Domain Gap in Autonomous Driving with Retrieval-Augmented Learning

Jan 21, 2025

Jiacheng Zuo, Haibo Hu, Zikang Zhou, Yufei Cui, Ziquan Liu, Jianping Wang, Nan Guan, Jin Wang, Chun Jason Xue

Abstract: In the pursuit of robust autonomous driving systems, models trained on real-world datasets often struggle to adapt to new environments, particularly when confronted with corner cases such as extreme weather conditions. Collecting these corner cases in the real world is non-trivial, which necessitates the use of simulators for validation. However, the high computational cost and the domain gap in data distribution have hindered the seamless transition between real and simulated driving scenarios. To tackle this challenge, we propose Retrieval-Augmented Learning for Autonomous Driving (RALAD), a novel framework designed to bridge the real-to-sim gap at a low cost. RALAD features three primary designs, including (1) domain adaptation via an enhanced Optimal Transport (OT) method that accounts for both individual and grouped image distances, (2) a simple and unified framework that can be applied to various models, and (3) efficient fine-tuning techniques that freeze the computationally expensive layers while maintaining robustness. Experimental results demonstrate that RALAD compensates for the performance degradation in simulated environments while maintaining accuracy in real-world scenarios across three different models. Taking Cross View as an example, the mIOU and mAP metrics in real-world scenarios remain stable before and after RALAD fine-tuning, while in simulated environments, the mIOU and mAP metrics are improved by 10.30% and 12.29%, respectively. Moreover, the re-training cost of our approach is reduced by approximately 88.1%. Our code is available at https://github.com/JiachengZuo/RALAD.git.

ModeSeq: Taming Sparse Multimodal Motion Prediction with Sequential Mode Modeling

Nov 17, 2024

Zikang Zhou, Hengjian Zhou, Haibo Hu, Zihao Wen, Jianping Wang, Yung-Hui Li, Yu-Kai Huang

Abstract: Anticipating the multimodality of future events lays the foundation for safe autonomous driving. However, multimodal motion prediction for traffic agents has been clouded by the lack of multimodal ground truth. Existing works predominantly adopt the winner-take-all training strategy to tackle this challenge, yet still suffer from limited trajectory diversity and misaligned mode confidence. While some approaches address these limitations by generating excessive trajectory candidates, they necessitate a post-processing stage to identify the most representative modes, a process lacking universal principles and compromising trajectory accuracy. We are thus motivated to introduce ModeSeq, a new multimodal prediction paradigm that models modes as sequences. Unlike the common practice of decoding multiple plausible trajectories in one shot, ModeSeq requires motion decoders to infer the next mode step by step, thereby more explicitly capturing the correlation between modes and significantly enhancing the ability to reason about multimodality. Leveraging the inductive bias of sequential mode prediction, we also propose the Early-Match-Take-All (EMTA) training strategy to diversify the trajectories further. Without relying on dense mode prediction or rule-based trajectory selection, ModeSeq considerably improves the diversity of multimodal output while attaining satisfactory trajectory accuracy, resulting in balanced performance on motion prediction benchmarks. Moreover, ModeSeq naturally emerges with the capability of mode extrapolation, which supports forecasting more behavior modes when the future is highly uncertain.
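
For readers unfamiliar with the winner-take-all baseline mentioned above, here is a minimal sketch of how the "winning" mode is typically selected: only the predicted trajectory closest to the single ground-truth future receives the regression loss. This illustrates the baseline strategy, not ModeSeq's EMTA variant, whose details are not given in the abstract. The trajectories and the helper names `average_displacement` and `winner_take_all` are made up for illustration.

```python
import math

def average_displacement(pred, gt):
    """Mean Euclidean distance between matching waypoints of two trajectories."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)

def winner_take_all(modes, gt):
    """Return the index of the closest mode and its error (the only loss term kept)."""
    errors = [average_displacement(m, gt) for m in modes]
    best = min(range(len(modes)), key=errors.__getitem__)
    return best, errors[best]

# One observed future and three hypothetical predicted modes.
ground_truth = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
modes = [
    [(1.0, 0.0), (2.0, 0.0), (3.0, 0.3)],  # keep straight
    [(1.0, 0.5), (2.0, 1.5), (3.0, 3.0)],  # turn left
    [(0.5, 0.0), (0.8, 0.0), (1.0, 0.0)],  # slow to a stop
]

best_mode, loss = winner_take_all(modes, ground_truth)
print(best_mode, round(loss, 3))
```

Because the other modes receive no regression signal for this sample, they are free to drift or collapse, which is one way to read the diversity and confidence issues the abstract attributes to this strategy.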

BehaviorGPT: Smart Agent Simulation for Autonomous Driving with Next-Patch Prediction

May 27, 2024

Zikang Zhou, Haibo Hu, Xinhong Chen, Jianping Wang, Nan Guan, Kui Wu, Yung-Hui Li, Yu-Kai Huang, Chun Jason Xue

Abstract: Simulating realistic interactions among traffic agents is crucial for efficiently validating the safety of autonomous driving systems. Existing leading simulators primarily use an encoder-decoder structure to encode the historical trajectories for future simulation. However, such a paradigm complicates the model architecture, and the manual separation of history and future trajectories leads to low data utilization. To address these challenges, we propose Behavior Generative Pre-trained Transformers (BehaviorGPT), a decoder-only, autoregressive architecture designed to simulate the sequential motion of multiple agents. Crucially, our approach discards the traditional separation between "history" and "future," treating each time step as the "current" one, resulting in a simpler, more parameter- and data-efficient design that scales seamlessly with data and computation. Additionally, we introduce the Next-Patch Prediction Paradigm (NP3), which enables models to reason at the patch level of trajectories and capture long-range spatial-temporal interactions. BehaviorGPT ranks first across several metrics on the Waymo Sim Agents Benchmark, demonstrating its exceptional performance in multi-agent and agent-map interactions. We outperformed state-of-the-art models with a realism score of 0.741 and improved the minADE metric to 1.540, with an approximately 91.6% reduction in model parameters.
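
As a rough illustration of what patch-level, decoder-only training data could look like, the sketch below groups a trajectory into fixed-length patches and builds (context, next patch) pairs in which every patch after the first serves as a prediction target. The patch length, the toy trajectory, and the helper names `to_patches` and `next_patch_pairs` are assumptions for illustration; the abstract does not specify how BehaviorGPT tokenizes trajectories.

```python
def to_patches(trajectory, patch_len):
    """Split a trajectory into consecutive patches, dropping any incomplete tail."""
    usable = len(trajectory) - len(trajectory) % patch_len
    return [trajectory[i:i + patch_len] for i in range(0, usable, patch_len)]

def next_patch_pairs(patches):
    """Pair every prefix of patches with the patch that follows it."""
    return [(patches[:t], patches[t]) for t in range(1, len(patches))]

# Hypothetical agent moving along the x-axis for 12 time steps.
trajectory = [(float(t), 0.0) for t in range(12)]

patches = to_patches(trajectory, patch_len=4)
pairs = next_patch_pairs(patches)

for context, target in pairs:
    print(len(context), "context patch(es) ->", target[0], "...", target[-1])
```

Note that no step is labeled "history" or "future" up front: each patch is a target once and context afterwards, which matches the data-utilization argument made in the abstract.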

QCNeXt: A Next-Generation Framework For Joint Multi-Agent Trajectory Prediction

Jun 18, 2023

Zikang Zhou, Zihao Wen, Jianping Wang, Yung-Hui Li, Yu-Kai Huang

Abstract: Estimating the joint distribution of on-road agents' future trajectories is essential for autonomous driving. In this technical report, we propose a next-generation framework for joint multi-agent trajectory prediction called QCNeXt. First, we adopt the query-centric encoding paradigm for the task of joint multi-agent trajectory prediction. Powered by this encoding scheme, our scene encoder is equipped with permutation equivariance on the set elements, roto-translation invariance in the space dimension, and translation invariance in the time dimension. These invariance properties not only enable accurate multi-agent forecasting fundamentally but also empower the encoder with the capability of streaming processing. Second, we propose a multi-agent DETR-like decoder, which facilitates joint multi-agent trajectory prediction by modeling agents' interactions at future time steps. For the first time, we show that a joint prediction model can outperform marginal prediction models even on the marginal metrics, which opens up new research opportunities in trajectory prediction. Our approach ranks 1st on the Argoverse 2 multi-agent motion forecasting benchmark, winning the championship of the Argoverse Challenge at the CVPR 2023 Workshop on Autonomous Driving.
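
The roto-translation invariance mentioned above can be checked numerically on a toy case: if a neighbor's position is expressed in the local frame of a reference agent, rotating and shifting the whole scene leaves those local coordinates unchanged. This is a minimal sketch of that property only, not the QCNeXt encoder; the poses, the transform, and the helper names `to_local` and `rigid_transform` are hypothetical.

```python
import math

def to_local(point, origin, heading):
    """Express `point` in the frame located at `origin` and rotated by `heading`."""
    dx, dy = point[0] - origin[0], point[1] - origin[1]
    c, s = math.cos(heading), math.sin(heading)
    return (c * dx + s * dy, -s * dx + c * dy)

def rigid_transform(point, angle, shift):
    """Rotate `point` about the world origin by `angle`, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * point[0] - s * point[1] + shift[0],
            s * point[0] + c * point[1] + shift[1])

# Hypothetical reference agent pose and a neighboring agent position.
agent_pos, agent_heading = (5.0, 2.0), math.radians(30)
neighbor_pos = (8.0, 6.0)

before = to_local(neighbor_pos, agent_pos, agent_heading)

# Move the entire scene: rotate by 75 degrees and shift by (-40, 13).
angle, shift = math.radians(75), (-40.0, 13.0)
after = to_local(
    rigid_transform(neighbor_pos, angle, shift),
    rigid_transform(agent_pos, angle, shift),
    agent_heading + angle,
)

print(before, after)
```

An encoder that consumes only such relative quantities produces the same features wherever the scene sits in the global frame, which is also what makes reusing previously computed features in a streaming setting plausible.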

* Technical report for the 1st place solution of the Argoverse 2 Multi-Agent Motion Forecasting Competition at the CVPR 2023 Workshop on Autonomous Driving

Improving the Generalizability of Trajectory Prediction Models with Frenet-Based Domain Normalization

Jun 14, 2023

Luyao Ye, Zikang Zhou, Jianping Wang

Abstract: Predicting the future trajectories of nearby objects plays a pivotal role in robotics and automation applications such as autonomous driving. While learning-based trajectory prediction methods have achieved remarkable performance on public benchmarks, the generalization ability of these approaches remains questionable. The poor generalizability on unseen domains, a well-recognized defect of data-driven approaches, can potentially harm the real-world performance of trajectory prediction models. We are thus motivated to improve the generalization ability of models instead of merely pursuing high accuracy on average. Due to the lack of benchmarks for quantifying the generalization ability of trajectory predictors, we first construct a new benchmark called argoverse-shift, where the data distributions of domains are significantly different. Using this benchmark for evaluation, we identify that the domain shift problem seriously hinders the generalization of trajectory predictors since state-of-the-art approaches suffer from severe performance degradation when facing those out-of-distribution scenes. To enhance the robustness of models against the domain shift problem, we propose a plug-and-play strategy for domain normalization in trajectory prediction. Our strategy utilizes the Frenet coordinate frame for modeling and can effectively narrow the domain gap between different scenes caused by the variety of road geometry and topology. Experiments show that our strategy noticeably boosts the prediction performance of the state-of-the-art in domains that were previously unseen to the models, thereby improving the generalization ability of data-driven trajectory prediction methods.
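
To make the Frenet frame concrete, the sketch below converts a Cartesian point into (s, d) coordinates relative to a polyline centerline, where s is the arc length along the lane and d is the signed lateral offset (positive to the left of the travel direction). It shows only the coordinate conversion, under the assumption of a piecewise-linear reference path; the paper's actual normalization strategy is not described in the abstract, and the function name `cartesian_to_frenet` and the sample centerline are illustrative.

```python
import math

def cartesian_to_frenet(point, centerline):
    """Project `point` onto a polyline and return (arc length s, signed offset d)."""
    best = None          # (distance, s, d) of the closest projection so far
    s_start = 0.0        # arc length at the start of the current segment
    for (x0, y0), (x1, y1) in zip(centerline, centerline[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue     # skip duplicated centerline points
        # Clamp the projection parameter so it stays on the segment.
        t = ((point[0] - x0) * dx + (point[1] - y0) * dy) / seg_len ** 2
        t = min(1.0, max(0.0, t))
        px, py = x0 + t * dx, y0 + t * dy
        dist = math.hypot(point[0] - px, point[1] - py)
        # The cross product sign tells which side of the segment the point is on.
        cross = dx * (point[1] - y0) - dy * (point[0] - x0)
        if best is None or dist < best[0]:
            best = (dist, s_start + t * seg_len, math.copysign(dist, cross))
        s_start += seg_len
    return best[1], best[2]

# Hypothetical L-shaped lane: 10 m east, then 10 m north.
centerline = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]

first = cartesian_to_frenet((4.0, 2.0), centerline)
second = cartesian_to_frenet((12.0, 5.0), centerline)
print(first, second)
```

In these coordinates, a straight road and a curved road with the same lane-following behavior look alike, which gives some intuition for why a Frenet representation can reduce differences caused by road geometry.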

* This paper was accepted by the 2023 IEEE International Conference on Robotics and Automation (ICRA)
