3D lane detection is essential in autonomous driving as it extracts structural and traffic information from the road in three-dimensional space, aiding self-driving cars in logical, safe, and comfortable path planning and motion control. Given the cost of sensors and the advantages of visual data in color information, 3D lane detection based on monocular vision is an important research direction in the realm of autonomous driving, increasingly gaining attention in both industry and academia. Regrettably, recent advancements in visual perception seem inadequate for the development of fully reliable 3D lane detection algorithms, which also hampers the progress of vision-based fully autonomous vehicles. We believe that there is still considerable room for improvement in 3D lane detection algorithms for autonomous vehicles using visual sensors, and significant enhancements are needed. This review looks back and analyzes the current state of achievements in the field of 3D lane detection research. It covers all current monocular-based 3D lane detection processes, discusses the performance of these cutting-edge algorithms, analyzes the time complexity of various algorithms, and highlights the main achievements and limitations of ongoing research efforts. The survey also includes a comprehensive discussion of available 3D lane detection datasets and the challenges that researchers face but have not yet resolved. Finally, our work outlines future research directions and invites researchers and practitioners to join this exciting field.
A Colored point cloud, as a simple and efficient 3D representation, has many advantages in various fields, including robotic navigation and scene reconstruction. This representation is now commonly used in 3D reconstruction tasks relying on cameras and LiDARs. However, fusing data from these two types of sensors is poorly performed in many existing frameworks, leading to unsatisfactory mapping results, mainly due to inaccurate camera poses. This paper presents OmniColor, a novel and efficient algorithm to colorize point clouds using an independent 360-degree camera. Given a LiDAR-based point cloud and a sequence of panorama images with initial coarse camera poses, our objective is to jointly optimize the poses of all frames for mapping images onto geometric reconstructions. Our pipeline works in an off-the-shelf manner that does not require any feature extraction or matching process. Instead, we find optimal poses by directly maximizing the photometric consistency of LiDAR maps. In experiments, we show that our method can overcome the severe visual distortion of omnidirectional images and greatly benefit from the wide field of view (FOV) of 360-degree cameras to reconstruct various scenarios with accuracy and stability. The code will be released at https://github.com/liubonan123/OmniColor/.
Curb detection is an important function in intelligent driving and can be used to determine drivable areas of the road. However, curbs are difficult to detect due to the complex road environment. This paper introduces CurbNet, a novel framework for curb detection, leveraging point cloud segmentation. Addressing the dearth of comprehensive curb datasets and the absence of 3D annotations, we have developed the 3D-Curb dataset, encompassing 7,100 frames, which represents the largest and most categorically diverse collection of curb point clouds currently available. Recognizing that curbs are primarily characterized by height variations, our approach harnesses spatially-rich 3D point clouds for training. To tackle the challenges presented by the uneven distribution of curb features on the xy-plane and their reliance on z-axis high-frequency features, we introduce the multi-scale and channel attention (MSCA) module, a bespoke solution designed to optimize detection performance. Moreover, we propose an adaptive weighted loss function group, specifically formulated to counteract the imbalance in the distribution of curb point clouds relative to other categories. Our extensive experimentation on 2 major datasets has yielded results that surpass existing benchmarks set by leading curb detection and point cloud segmentation models. By integrating multi-clustering and curve fitting techniques in our post-processing stage, we have substantially reduced noise in curb detection, thereby enhancing precision to 0.8744. Notably, CurbNet has achieved an exceptional average metrics of over 0.95 at a tolerance of just 0.15m, thereby establishing a new benchmark. Furthermore, corroborative real-world experiments and dataset analyzes mutually validate each other, solidifying CurbNet's superior detection proficiency and its robust generalizability.
The ability to scene understanding in adverse visual conditions, e.g., nighttime, has sparked active research for RGB-Thermal (RGB-T) semantic segmentation. However, it is essentially hampered by two critical problems: 1) the day-night gap of RGB images is larger than that of thermal images, and 2) the class-wise performance of RGB images at night is not consistently higher or lower than that of thermal images. we propose the first test-time adaptation (TTA) framework, dubbed Night-TTA, to address the problems for nighttime RGBT semantic segmentation without access to the source (daytime) data during adaptation. Our method enjoys three key technical parts. Firstly, as one modality (e.g., RGB) suffers from a larger domain gap than that of the other (e.g., thermal), Imaging Heterogeneity Refinement (IHR) employs an interaction branch on the basis of RGB and thermal branches to prevent cross-modal discrepancy and performance degradation. Then, Class Aware Refinement (CAR) is introduced to obtain reliable ensemble logits based on pixel-level distribution aggregation of the three branches. In addition, we also design a specific learning scheme for our TTA framework, which enables the ensemble logits and three student logits to collaboratively learn to improve the quality of predictions during the testing phase of our Night TTA. Extensive experiments show that our method achieves state-of-the-art (SoTA) performance with a 13.07% boost in mIoU.