Lu Deng

HMVLM: Multistage Reasoning-Enhanced Vision-Language Model for Long-Tailed Driving Scenarios

Jun 06, 2025

Daming Wang, Yuhao Song, Zijian He, Kangliang Chen, Xing Pan, Lu Deng, Weihao Gu

Abstract: We present the HaoMo Vision-Language Model (HMVLM), an end-to-end driving framework that implements the slow branch of a cognitively inspired fast-slow architecture. A fast controller outputs low-level steering, throttle, and brake commands, while a slow planner, a large vision-language model, generates high-level intents such as "yield to pedestrian" or "merge after the truck" without compromising latency. HMVLM introduces three upgrades: (1) selective five-view prompting with an embedded 4 s history of ego kinematics, (2) multi-stage chain-of-thought (CoT) prompting that enforces a Scene Understanding -> Driving Decision -> Trajectory Inference reasoning flow, and (3) spline-based trajectory post-processing that removes late-stage jitter and sharp turns. Trained on the Waymo Open Dataset, HMVLM with these upgrades achieves a Rater Feedback Score (RFS) of 7.7367, securing 2nd place in the 2025 Waymo Vision-based End-to-End (E2E) Driving Challenge and surpassing the public baseline by 2.77%.
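The abstract does not say which spline family or parameters HMVLM uses for post-processing, so the following is only a minimal sketch of the general idea, assuming a uniform cubic B-spline over 2D waypoints. The function name `bspline_smooth` and the parameter `samples_per_segment` are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def bspline_smooth(waypoints: List[Point], samples_per_segment: int = 4) -> List[Point]:
    """Smooth a 2D waypoint sequence with a uniform cubic B-spline.

    Illustrative only: the raw waypoints act as control points, so the
    curve approximates them instead of passing through each one, which
    damps point-to-point jitter and rounds off sharp corners.
    """
    if len(waypoints) < 2:
        return list(waypoints)
    # Pad each end with two extra copies so the curve starts and ends
    # exactly on the first and last waypoint.
    ctrl = [waypoints[0]] * 2 + list(waypoints) + [waypoints[-1]] * 2
    smoothed: List[Point] = []
    for i in range(len(ctrl) - 3):
        p0, p1, p2, p3 = ctrl[i:i + 4]
        for s in range(samples_per_segment):
            t = s / samples_per_segment
            # Uniform cubic B-spline basis functions (they sum to 1).
            b0 = (1 - t) ** 3 / 6
            b1 = (3 * t ** 3 - 6 * t ** 2 + 4) / 6
            b2 = (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6
            b3 = t ** 3 / 6
            x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
            y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
            smoothed.append((x, y))
    smoothed.append(waypoints[-1])  # value of the last segment at t = 1
    return smoothed


# A zig-zag trajectory: lateral jitter of +/- 0.3 around a straight path.
raw = [(0.0, 0.0), (1.0, 0.3), (2.0, -0.3), (3.0, 0.3), (4.0, -0.3), (5.0, 0.0)]
smooth = bspline_smooth(raw)
```

Because every output point is a convex combination of four neighbouring waypoints, the lateral excursions of the smoothed path are strictly smaller than those of the zig-zag input, while the end points are preserved.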

* WOD Vision-based End-to-End Driving Challenge

CogAD: Cognitive-Hierarchy Guided End-to-End Autonomous Driving

May 27, 2025

Zhennan Wang, Jianing Teng, Canqun Xiang, Kangliang Chen, Xing Pan, Lu Deng, Weihao Gu

Abstract: While end-to-end autonomous driving has advanced significantly, prevailing methods remain fundamentally misaligned with human cognitive principles in both perception and planning. In this paper, we propose CogAD, a novel end-to-end autonomous driving model that emulates the hierarchical cognition mechanisms of human drivers. CogAD implements dual hierarchical mechanisms: global-to-local context processing for human-like perception and intent-conditioned multi-mode trajectory generation for cognitively inspired planning. The proposed method demonstrates three principal advantages: comprehensive environmental understanding through hierarchical perception, robust planning exploration enabled by multi-level planning, and diverse yet reasonable multi-modal trajectory generation facilitated by dual-level uncertainty modeling. Extensive experiments on nuScenes and Bench2Drive demonstrate that CogAD achieves state-of-the-art performance in end-to-end planning, with particular superiority in long-tail scenarios and robust generalization to complex real-world driving conditions.

6DOF Pose Estimation of a 3D Rigid Object based on Edge-enhanced Point Pair Features

Sep 17, 2022

Chenyi Liu, Fei Chen, Lu Deng, Renjiao Yi, Lintao Zheng, Chenyang Zhu, Jia Wang, Kai Xu

Abstract: The point pair feature (PPF) is widely used for 6D pose estimation. In this paper, we propose an efficient 6D pose estimation method based on the PPF framework. We introduce a targeted down-sampling strategy that concentrates on edge regions for efficient feature extraction from complex geometry. A pose hypothesis validation approach resolves symmetric ambiguity by computing the degree of edge matching. We evaluate on two challenging datasets and one dataset collected in the real world, demonstrating the superiority of our method for pose estimation of geometrically complex, occluded, and symmetrical objects. We further validate our method by applying it to simulated punctures.
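The abstract does not define the point pair feature itself. As a minimal sketch, the following assumes the standard four-component formulation from the PPF literature: the distance between two oriented points, the angle each normal makes with the connecting vector, and the angle between the two normals. The function name `point_pair_feature` is hypothetical, and the paper's edge-focused down-sampling and hypothesis validation steps are not shown.

```python
import math
from typing import Sequence, Tuple

Vec3 = Sequence[float]


def _sub(a: Vec3, b: Vec3) -> Tuple[float, float, float]:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _norm(a: Vec3) -> float:
    return math.sqrt(_dot(a, a))


def _angle(a: Vec3, b: Vec3) -> float:
    """Angle in [0, pi] between two vectors; 0.0 if either is zero-length."""
    na, nb = _norm(a), _norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, _dot(a, b) / (na * nb)))
    return math.acos(cos_theta)


def point_pair_feature(p1: Vec3, n1: Vec3, p2: Vec3, n2: Vec3) -> Tuple[float, float, float, float]:
    """Return (|d|, angle(n1, d), angle(n2, d), angle(n1, n2)) with d = p2 - p1."""
    d = _sub(p2, p1)
    return (_norm(d), _angle(n1, d), _angle(n2, d), _angle(n1, n2))


# Two points one unit apart on a flat patch, both normals pointing along +z.
feature = point_pair_feature((0, 0, 0), (0, 0, 1), (1, 0, 0), (0, 0, 1))
```

Every component is a length or an angle, so the feature does not change when the same rigid rotation and translation are applied to both oriented points. That invariance is what allows features computed on a model to be matched against features computed on a scene.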

* 16 pages, 20 figures

TriVoC: Efficient Voting-based Consensus Maximization for Robust Point Cloud Registration with Extreme Outlier Ratios

Nov 01, 2021

Lei Sun, Lu Deng

Abstract: Correspondence-based point cloud registration is a cornerstone of robotics perception and computer vision. It seeks to estimate the best rigid transformation aligning two point clouds from putative correspondences. However, due to the limited robustness of 3D keypoint matching approaches, outliers, possibly in large numbers, are likely to exist among the correspondences, which makes robust registration methods imperative. Unfortunately, existing robust methods have their own limitations (e.g. high computational cost or limited robustness) when facing high or extreme outlier ratios, which may make them unsuitable for practical use. In this paper, we present a novel, fast, deterministic and guaranteed robust solver, named TriVoC (Triple-layered Voting with Consensus maximization), for the robust registration problem. We decompose the selection of minimal 3-point sets into three consecutive layers, and in each layer we design an efficient voting and correspondence sorting framework based on the pairwise equal-length constraint. In this manner, the 3-point sets can be selected independently from the reduced correspondence sets according to the sorted sequence, which significantly lowers the computational cost while providing a strong guarantee of achieving the largest consensus set (as the final inlier set) as long as a probabilistic termination condition is fulfilled. Varied experiments show that our solver TriVoC is robust against up to 99% outliers, highly accurate, time-efficient even with extreme outlier ratios, and practical for real-world applications, showing performance superior to other state-of-the-art competitors.
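The pairwise equal-length constraint mentioned above follows from the fact that a rigid transformation preserves distances: for two inlier correspondences, the distance between the two source points should match the distance between the two target points. The sketch below illustrates a single round of voting and sorting built on that constraint. It is not the triple-layered TriVoC procedure, and the names `length_consistency_votes`, `sort_by_votes`, and the tolerance `tol` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One correspondence: (point in source cloud, matched point in target cloud).
Corr = Tuple[Sequence[float], Sequence[float]]


def length_consistency_votes(corrs: List[Corr], tol: float = 0.05) -> List[int]:
    """Count, for each correspondence, how many others are length-consistent with it."""
    votes = [0] * len(corrs)
    for i in range(len(corrs)):
        for j in range(i + 1, len(corrs)):
            d_src = math.dist(corrs[i][0], corrs[j][0])
            d_tgt = math.dist(corrs[i][1], corrs[j][1])
            # Rigid motion preserves lengths, so inlier pairs should agree.
            if abs(d_src - d_tgt) <= tol:
                votes[i] += 1
                votes[j] += 1
    return votes


def sort_by_votes(corrs: List[Corr], tol: float = 0.05) -> List[int]:
    """Return correspondence indices ordered from most to fewest votes."""
    votes = length_consistency_votes(corrs, tol)
    return sorted(range(len(corrs)), key=lambda i: -votes[i])


# Four inliers related by a pure translation of (1, 2, 3), plus two outliers.
corrs: List[Corr] = [
    ((0, 0, 0), (1, 2, 3)),
    ((1, 0, 0), (2, 2, 3)),
    ((0, 1, 0), (1, 3, 3)),
    ((0, 0, 1), (1, 2, 4)),
    ((2, 2, 2), (10, -5, 7)),   # outlier
    ((3, 1, 0), (-8, 4, 20)),   # outlier
]
votes = length_consistency_votes(corrs)
order = sort_by_votes(corrs)
```

In this toy example each inlier collects a vote from every other inlier, while the outliers collect none, so sorting by vote count moves likely inliers to the front before any minimal 3-point sets are drawn.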
