Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mao Shan

Mixed Signals: A Diverse Point Cloud Dataset for Heterogeneous LiDAR V2X Collaboration

Feb 19, 2025

Katie Z Luo, Minh-Quan Dao, Zhenzhen Liu, Mark Campbell, Wei-Lun Chao, Kilian Q. Weinberger, Ezio Malis, Vincent Fremont, Bharath Hariharan, Mao Shan(+2 more)

Abstract:Vehicle-to-everything (V2X) collaborative perception has emerged as a promising solution to address the limitations of single-vehicle perception systems. However, existing V2X datasets are limited in scope, diversity, and quality. To address these gaps, we present Mixed Signals, a comprehensive V2X dataset featuring 45.1k point clouds and 240.6k bounding boxes collected from three connected autonomous vehicles (CAVs) equipped with two different types of LiDAR sensors, plus a roadside unit with dual LiDARs. Our dataset provides precisely aligned point clouds and bounding box annotations across 10 classes, ensuring reliable data for perception training. We provide detailed statistical analysis on the quality of our dataset and extensively benchmark existing V2X methods on it. Mixed Signals V2X Dataset is one of the highest quality, large-scale datasets publicly available for V2X perception research. Details on the website https://mixedsignalsdataset.cs.cornell.edu/.

Via

Access Paper or Ask Questions

Endangered Alert: A Field-Validated Self-Training Scheme for Detecting and Protecting Threatened Wildlife on Roads and Roadsides

Dec 16, 2024

Kunming Li, Mao Shan, Stephany Berrio Perez, Katie Luo, Stewart Worrall

Abstract:Traffic accidents are a global safety concern, resulting in numerous fatalities each year. A considerable number of these deaths are caused by animal-vehicle collisions (AVCs), which not only endanger human lives but also present serious risks to animal populations. This paper presents an innovative self-training methodology aimed at detecting rare animals, such as the cassowary in Australia, whose survival is threatened by road accidents. The proposed method addresses critical real-world challenges, including acquiring and labelling sensor data for rare animal species in resource-limited environments. It achieves this by leveraging cloud and edge computing, and automatic data labelling to improve the detection performance of the field-deployed model iteratively. Our approach introduces Label-Augmentation Non-Maximum Suppression (LA-NMS), which incorporates a vision-language model (VLM) to enable automated data labelling. During a five-month deployment, we confirmed the method's robustness and effectiveness, resulting in improved object detection accuracy and increased prediction confidence. The source code is available: https://github.com/acfr/CassDetect

* 8 pages, 8 figures

Via

Access Paper or Ask Questions

Performance Assessment of Lidar Odometry Frameworks: A Case Study at the Australian Botanic Garden Mount Annan

Nov 25, 2024

Mohamed Mourad Ouazghire, Julie Stephany Berrio, Mao Shan, Stewart Worrall

Figure 1 for Performance Assessment of Lidar Odometry Frameworks: A Case Study at the Australian Botanic Garden Mount Annan

Figure 2 for Performance Assessment of Lidar Odometry Frameworks: A Case Study at the Australian Botanic Garden Mount Annan

Figure 3 for Performance Assessment of Lidar Odometry Frameworks: A Case Study at the Australian Botanic Garden Mount Annan

Figure 4 for Performance Assessment of Lidar Odometry Frameworks: A Case Study at the Australian Botanic Garden Mount Annan

Abstract:Autonomous vehicles are being tested in diverse environments worldwide. However, a notable gap exists in evaluating datasets representing natural, unstructured environments such as forests or gardens. To address this, we present a study on localisation at the Australian Botanic Garden Mount Annan. This area encompasses open grassy areas, paved pathways, and densely vegetated sections with trees and other objects. The dataset was recorded using a 128-beam LiDAR sensor and GPS and IMU readings to track the ego-vehicle. This paper evaluates the performance of two state-of-the-art LiDARinertial odometry frameworks, COIN-LIO and LIO-SAM, on this dataset. We analyse trajectory estimates in both horizontal and vertical dimensions and assess relative translation and yaw errors over varying distances. Our findings reveal that while both frameworks perform adequately in the vertical plane, COINLIO demonstrates superior accuracy in the horizontal plane, particularly over extended trajectories. In contrast, LIO-SAM shows increased drift and yaw errors over longer distances.

* Redacted with respect to the Australian Conference on Robotics and Automoation 2024's writing criteria. Style file: acra.sty, template: acra.tex, bibtex: named.bst 10 pages, 18 figures

Via

Access Paper or Ask Questions

Label-Efficient 3D Object Detection For Road-Side Units

Apr 09, 2024

Minh-Quan Dao, Holger Caesar, Julie Stephany Berrio, Mao Shan, Stewart Worrall, Vincent Frémont, Ezio Malis

Figure 1 for Label-Efficient 3D Object Detection For Road-Side Units

Figure 2 for Label-Efficient 3D Object Detection For Road-Side Units

Figure 3 for Label-Efficient 3D Object Detection For Road-Side Units

Figure 4 for Label-Efficient 3D Object Detection For Road-Side Units

Abstract:Occlusion presents a significant challenge for safety-critical applications such as autonomous driving. Collaborative perception has recently attracted a large research interest thanks to the ability to enhance the perception of autonomous vehicles via deep information fusion with intelligent roadside units (RSU), thus minimizing the impact of occlusion. While significant advancement has been made, the data-hungry nature of these methods creates a major hurdle for their real-world deployment, particularly due to the need for annotated RSU data. Manually annotating the vast amount of RSU data required for training is prohibitively expensive, given the sheer number of intersections and the effort involved in annotating point clouds. We address this challenge by devising a label-efficient object detection method for RSU based on unsupervised object discovery. Our paper introduces two new modules: one for object discovery based on a spatial-temporal aggregation of point clouds, and another for refinement. Furthermore, we demonstrate that fine-tuning on a small portion of annotated data allows our object discovery models to narrow the performance gap with, or even surpass, fully supervised models. Extensive experiments are carried out in simulated and real-world datasets to evaluate our method.

* IV 2024

Via

Access Paper or Ask Questions

OccFusion: A Straightforward and Effective Multi-Sensor Fusion Framework for 3D Occupancy Prediction

Mar 13, 2024

Zhenxing Ming, Julie Stephany Berrio, Mao Shan, Stewart Worrall

Figure 1 for OccFusion: A Straightforward and Effective Multi-Sensor Fusion Framework for 3D Occupancy Prediction

Figure 2 for OccFusion: A Straightforward and Effective Multi-Sensor Fusion Framework for 3D Occupancy Prediction

Figure 3 for OccFusion: A Straightforward and Effective Multi-Sensor Fusion Framework for 3D Occupancy Prediction

Figure 4 for OccFusion: A Straightforward and Effective Multi-Sensor Fusion Framework for 3D Occupancy Prediction

Abstract:This paper introduces OccFusion, a straightforward and efficient sensor fusion framework for predicting 3D occupancy. A comprehensive understanding of 3D scenes is crucial in autonomous driving, and recent models for 3D semantic occupancy prediction have successfully addressed the challenge of describing real-world objects with varied shapes and classes. However, existing methods for 3D occupancy prediction heavily rely on surround-view camera images, making them susceptible to changes in lighting and weather conditions. By integrating features from additional sensors, such as lidar and surround view radars, our framework enhances the accuracy and robustness of occupancy prediction, resulting in top-tier performance on the nuScenes benchmark. Furthermore, extensive experiments conducted on the nuScenes dataset, including challenging night and rainy scenarios, confirm the superior performance of our sensor fusion strategy across various perception ranges. The code for this framework will be made available at https://github.com/DanielMing123/OCCFusion.

Via

Access Paper or Ask Questions

InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction

Jan 23, 2024

Zhenxing Ming, Julie Stephany Berrio, Mao Shan, Stewart Worrall

Figure 1 for InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction

Figure 2 for InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction

Figure 3 for InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction

Figure 4 for InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction

Abstract:This paper introduces InverseMatrixVT3D, an efficient method for transforming multi-view image features into 3D feature volumes for 3D semantic occupancy prediction. Existing methods for constructing 3D volumes often rely on depth estimation, device-specific operators, or transformer queries, which hinders the widespread adoption of 3D occupancy models. In contrast, our approach leverages two projection matrices to store the static mapping relationships and matrix multiplications to efficiently generate global Bird's Eye View (BEV) features and local 3D feature volumes. Specifically, we achieve this by performing matrix multiplications between multi-view image feature maps and two sparse projection matrices. We introduce a sparse matrix handling technique for the projection matrices to optimise GPU memory usage. Moreover, a global-local attention fusion module is proposed to integrate the global BEV features with the local 3D feature volumes to obtain the final 3D volume. We also employ a multi-scale supervision mechanism to further enhance performance. Comprehensive experiments on the nuScenes dataset demonstrate the simplicity and effectiveness of our method. The code will be made available at:https://github.com/DanielMing123/InverseMatrixVT3D

Via

Access Paper or Ask Questions

Classification of Safety Driver Attention During Autonomous Vehicle Operation

Oct 17, 2023

Santiago Gerling Konrad, Julie Stephany Berrio, Mao Shan, Favio Masson, Stewart Worrall

Figure 1 for Classification of Safety Driver Attention During Autonomous Vehicle Operation

Figure 2 for Classification of Safety Driver Attention During Autonomous Vehicle Operation

Figure 3 for Classification of Safety Driver Attention During Autonomous Vehicle Operation

Figure 4 for Classification of Safety Driver Attention During Autonomous Vehicle Operation

Abstract:Despite the continual advances in Advanced Driver Assistance Systems (ADAS) and the development of high-level autonomous vehicles (AV), there is a general consensus that for the short to medium term, there is a requirement for a human supervisor to handle the edge cases that inevitably arise. Given this requirement, it is essential that the state of the vehicle operator is monitored to ensure they are contributing to the vehicle's safe operation. This paper introduces a dual-source approach integrating data from an infrared camera facing the vehicle operator and vehicle perception systems to produce a metric for driver alertness in order to promote and ensure safe operator behaviour. The infrared camera detects the driver's head, enabling the calculation of head orientation, which is relevant as the head typically moves according to the individual's focus of attention. By incorporating environmental data from the perception system, it becomes possible to determine whether the vehicle operator observes objects in the surroundings. Experiments were conducted using data collected in Sydney, Australia, simulating AV operations in an urban environment. Our results demonstrate that the proposed system effectively determines a metric for the attention levels of the vehicle operator, enabling interventions such as warnings or reducing autonomous functionality as appropriate. This comprehensive solution shows promise in contributing to ADAS and AVs' overall safety and efficiency in a real-world setting.

Via

Access Paper or Ask Questions

MS3D++: Ensemble of Experts for Multi-Source Unsupervised Domain Adaption in 3D Object Detection

Aug 11, 2023

Darren Tsai, Julie Stephany Berrio, Mao Shan, Eduardo Nebot, Stewart Worrall

Figure 1 for MS3D++: Ensemble of Experts for Multi-Source Unsupervised Domain Adaption in 3D Object Detection

Figure 2 for MS3D++: Ensemble of Experts for Multi-Source Unsupervised Domain Adaption in 3D Object Detection

Figure 3 for MS3D++: Ensemble of Experts for Multi-Source Unsupervised Domain Adaption in 3D Object Detection

Figure 4 for MS3D++: Ensemble of Experts for Multi-Source Unsupervised Domain Adaption in 3D Object Detection

Abstract:Deploying 3D detectors in unfamiliar domains has been demonstrated to result in a drastic drop of up to 70-90% in detection rate due to variations in lidar, geographical region, or weather conditions from their original training dataset. This domain gap leads to missing detections for densely observed objects, misaligned confidence scores, and increased high-confidence false positives, rendering the detector highly unreliable. To address this, we introduce MS3D++, a self-training framework for multi-source unsupervised domain adaptation in 3D object detection. MS3D++ provides a straightforward approach to domain adaptation by generating high-quality pseudo-labels, enabling the adaptation of 3D detectors to a diverse range of lidar types, regardless of their density. Our approach effectively fuses predictions of an ensemble of multi-frame pre-trained detectors from different source domains to improve domain generalization. We subsequently refine the predictions temporally to ensure temporal consistency in box localization and object classification. Furthermore, we present an in-depth study into the performance and idiosyncrasies of various 3D detector components in a cross-domain context, providing valuable insights for improved cross-domain detector ensembling. Experimental results on Waymo, nuScenes and Lyft demonstrate that detectors trained with MS3D++ pseudo-labels achieve state-of-the-art performance, comparable to training with human-annotated labels in Bird's Eye View (BEV) evaluation for both low and high density lidar.

* Code is available at https://github.com/darrenjkt/MS3D

Via

Access Paper or Ask Questions

LightFormer: An End-to-End Model for Intersection Right-of-Way Recognition Using Traffic Light Signals and an Attention Mechanism

Jul 14, 2023

Zhenxing Ming, Julie Stephany Berrio, Mao Shan, Eduardo Nebot, Stewart Worrall

Abstract:For smart vehicles driving through signalised intersections, it is crucial to determine whether the vehicle has right of way given the state of the traffic lights. To address this issue, camera based sensors can be used to determine whether the vehicle has permission to proceed straight, turn left or turn right. This paper proposes a novel end to end intersection right of way recognition model called LightFormer to generate right of way status for available driving directions in complex urban intersections. The model includes a spatial temporal inner structure with an attention mechanism, which incorporates features from past image to contribute to the classification of the current frame right of way status. In addition, a modified, multi weight arcface loss is introduced to enhance the model classification performance. Finally, the proposed LightFormer is trained and tested on two public traffic light datasets with manually augmented labels to demonstrate its effectiveness.

Via

Access Paper or Ask Questions

Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection

Jul 09, 2023

Minh-Quan Dao, Julie Stephany Berrio, Vincent Frémont, Mao Shan, Elwan Héry, Stewart Worrall

Figure 1 for Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection

Figure 2 for Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection

Figure 3 for Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection

Figure 4 for Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection

Abstract:Occlusion is a major challenge for LiDAR-based object detection methods. This challenge becomes safety-critical in urban traffic where the ego vehicle must have reliable object detection to avoid collision while its field of view is severely reduced due to the obstruction posed by a large number of road users. Collaborative perception via Vehicle-to-Everything (V2X) communication, which leverages the diverse perspective thanks to the presence at multiple locations of connected agents to form a complete scene representation, is an appealing solution. State-of-the-art V2X methods resolve the performance-bandwidth tradeoff using a mid-collaboration approach where the Bird-Eye View images of point clouds are exchanged so that the bandwidth consumption is lower than communicating point clouds as in early collaboration, and the detection performance is higher than late collaboration, which fuses agents' output, thanks to a deeper interaction among connected agents. While achieving strong performance, the real-world deployment of most mid-collaboration approaches is hindered by their overly complicated architectures, involving learnable collaboration graphs and autoencoder-based compressor/ decompressor, and unrealistic assumptions about inter-agent synchronization. In this work, we devise a simple yet effective collaboration method that achieves a better bandwidth-performance tradeoff than prior state-of-the-art methods while minimizing changes made to the single-vehicle detection models and relaxing unrealistic assumptions on inter-agent synchronization. Experiments on the V2X-Sim dataset show that our collaboration method achieves 98\% of the performance of an early-collaboration method, while only consuming the equivalent bandwidth of a late-collaboration method.

* Work in progress

Via

Access Paper or Ask Questions