Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhiye Wang

DriveCode: Domain Specific Numerical Encoding for LLM-Based Autonomous Driving

Mar 01, 2026

Zhiye Wang, Yanbo Jiang, Rui Zhou, Bo Zhang, Fang Zhang, Zhenhua Xu, Yaqin Zhang, Jianqiang Wang

Abstract:Large language models (LLMs) have shown great promise for autonomous driving. However, discretizing numbers into tokens limits precise numerical reasoning, fails to reflect the positional significance of digits in the training objective, and makes it difficult to achieve both decoding efficiency and numerical precision. These limitations affect both the processing of sensor measurements and the generation of precise control commands, creating a fundamental barrier for deploying LLM-based autonomous driving systems. In this paper, we introduce DriveCode, a novel numerical encoding method that represents numbers as dedicated embeddings rather than discrete text tokens. DriveCode employs a number projector to map numbers into the language model's hidden space, enabling seamless integration with visual and textual features in a unified multimodal sequence. Evaluated on OmniDrive, DriveGPT4, and DriveGPT4-V2 datasets, DriveCode demonstrates superior performance in trajectory prediction and control signal generation, confirming its effectiveness for LLM-based autonomous driving systems.

* The project page is available at https://shiftwilliam.github.io/DriveCode

Via

Access Paper or Ask Questions

Two-level Data Augmentation for Calibrated Multi-view Detection

Oct 19, 2022

Martin Engilberge, Haixin Shi, Zhiye Wang, Pascal Fua

Figure 1 for Two-level Data Augmentation for Calibrated Multi-view Detection

Figure 2 for Two-level Data Augmentation for Calibrated Multi-view Detection

Figure 3 for Two-level Data Augmentation for Calibrated Multi-view Detection

Figure 4 for Two-level Data Augmentation for Calibrated Multi-view Detection

Abstract:Data augmentation has proven its usefulness to improve model generalization and performance. While it is commonly applied in computer vision application when it comes to multi-view systems, it is rarely used. Indeed geometric data augmentation can break the alignment among views. This is problematic since multi-view data tend to be scarce and it is expensive to annotate. In this work we propose to solve this issue by introducing a new multi-view data augmentation pipeline that preserves alignment among views. Additionally to traditional augmentation of the input image we also propose a second level of augmentation applied directly at the scene level. When combined with our simple multi-view detection model, our two-level augmentation pipeline outperforms all existing baselines by a significant margin on the two main multi-view multi-person detection datasets WILDTRACK and MultiviewX.

* Accepted at WACV 2023

Via

Access Paper or Ask Questions