Abstract: Imitation learning (IL) is widely used for motion planning in autonomous driving due to its data efficiency and access to real-world driving data. For safe and robust real-world driving, IL-based planning must capture the complex driving contexts inherent in real-world data and make context-adaptive decisions, rather than relying solely on expert trajectory imitation. In this paper, we propose CarPLAN, a novel IL-based motion planning framework that explicitly enhances driving context understanding and enables adaptive planning across diverse traffic scenarios. Our contributions are twofold. First, we introduce Displacement-Aware Predictive Encoding (DPE) to improve the model's spatial awareness by predicting future displacement vectors between the Autonomous Vehicle (AV) and surrounding scene elements. This allows the planner to account for relational spacing when generating trajectories. In addition to the standard imitation loss, we incorporate an augmented loss term that captures displacement prediction errors, ensuring that planning decisions consider relative distances from other agents. Second, to improve the model's ability to handle diverse driving contexts, we propose the Context-Adaptive Multi-Expert Decoder (CMD), which leverages the Mixture of Experts (MoE) framework. CMD dynamically selects the most suitable expert decoders based on scene structure at each Transformer layer, enabling adaptive and context-aware planning in dynamic environments. We evaluate CarPLAN on the nuPlan benchmark and demonstrate state-of-the-art performance across all closed-loop simulation metrics. In particular, CarPLAN exhibits robust performance on challenging scenarios such as Test14-Hard, validating its effectiveness in complex driving conditions. Additional experiments on the Waymax benchmark further demonstrate its generalization capability across different benchmark settings.
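
The two ideas above can be illustrated with a minimal sketch. It is not the CarPLAN implementation: the L1 error, the loss weight, the way displacement targets are built from the expert trajectory, and top-k softmax gating are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math

def l1(a, b):
    """Mean absolute error between two equal-length lists of (x, y) points."""
    return sum(abs(ax - bx) + abs(ay - by)
               for (ax, ay), (bx, by) in zip(a, b)) / len(a)

def displacements(ego_traj, agent_traj):
    """Per-timestep displacement vectors from the ego vehicle to one agent."""
    return [(ax - ex, ay - ey)
            for (ex, ey), (ax, ay) in zip(ego_traj, agent_traj)]

def total_loss(pred_traj, expert_traj, pred_disp, agent_traj, weight=0.5):
    """Imitation loss plus a weighted displacement-prediction loss.

    Assumption: targets are displacements between the expert trajectory and
    the agent's future positions; `weight` is a hypothetical hyperparameter.
    """
    imitation = l1(pred_traj, expert_traj)
    target_disp = displacements(expert_traj, agent_traj)
    return imitation + weight * l1(pred_disp, target_disp)

def route_top_k(gate_logits, k=2):
    """Softmax over gate logits, keep the k largest, renormalise their weights.

    Assumption: a generic MoE gate; the abstract does not specify the routing rule.
    """
    m = max(gate_logits)
    exps = [math.exp(g - m) for g in gate_logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}
```

In this sketch the displacement term penalises a planner whose predicted spacing to an agent is wrong even when its own trajectory error is small, and `route_top_k` returns the mixing weights with which a layer would combine the outputs of its selected expert decoders.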




Abstract: An autonomous car must recognize its driving environment quickly to drive safely. Because Light Detection and Ranging (LiDAR) sensors are widely used in autonomous cars, fast semantic segmentation of LiDAR point clouds, that is, point-wise classification of the point cloud within the sensor frame rate, has attracted attention for recognizing the driving environment. Although voxel-based and fusion-based models are the recent state of the art in point cloud semantic segmentation, their real-time performance suffers from the high computational load caused by high voxel resolution. In this paper, we propose a fast voxel-based semantic segmentation model using Point Convolution and 3D Sparse Convolution (PCSCNet). The proposed model is designed to perform well at both high and low voxel resolutions by using point convolution-based feature extraction. Moreover, the proposed model accelerates feature propagation by applying 3D sparse convolution after feature extraction. The experimental results demonstrate that the proposed model outperforms state-of-the-art real-time models in semantic segmentation on SemanticKITTI and nuScenes, and achieves real-time performance in LiDAR point cloud inference.
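
To make the resolution trade-off concrete, the sketch below groups points into voxels and shows that a coarser voxel size yields fewer occupied voxels, and therefore less work for any later sparse operation. It is only an illustration under stated assumptions: the function names are hypothetical, and plain centroid pooling stands in for the point convolution-based feature extraction, which the abstract does not detail.

```python
import math
from collections import defaultdict

def voxelize(points, voxel_size):
    """Group 3D points by the integer index of the voxel they fall into."""
    voxels = defaultdict(list)
    for x, y, z in points:
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        voxels[key].append((x, y, z))
    return dict(voxels)

def voxel_centroids(voxels):
    """One feature per occupied voxel: the mean of its points.

    Assumption: centroid pooling is a simple stand-in, not the paper's
    point convolution.
    """
    return {key: tuple(sum(axis) / len(pts) for axis in zip(*pts))
            for key, pts in voxels.items()}

points = [(0.1, 0.1, 0.1), (0.4, 0.4, 0.4), (0.9, 0.9, 0.9), (1.6, 0.2, 0.2)]
fine = voxelize(points, voxel_size=0.5)    # high resolution: more occupied voxels
coarse = voxelize(points, voxel_size=2.0)  # low resolution: fewer occupied voxels
```

Only occupied voxels are stored, which is the property that sparse convolution exploits. At low resolution many points share a voxel, so how their features are aggregated before voxel-level processing matters more.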