Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Stewart Worrall

Panoptic-CUDAL Technical Report: Rural Australia Point Cloud Dataset in Rainy Conditions

Mar 20, 2025

Tzu-Yun Tseng, Alexey Nekrasov, Malcolm Burdorf, Bastian Leibe, Julie Stephany Berrio, Mao Shan, Stewart Worrall

Abstract:Existing autonomous driving datasets are predominantly oriented towards well-structured urban settings and favorable weather conditions, leaving the complexities of rural environments and adverse weather conditions largely unaddressed. Although some datasets encompass variations in weather and lighting, bad weather scenarios do not appear often. Rainfall can significantly impair sensor functionality, introducing noise and reflections in LiDAR and camera data and reducing the system's capabilities for reliable environmental perception and safe navigation. We introduce the Panoptic-CUDAL dataset, a novel dataset purpose-built for panoptic segmentation in rural areas subject to rain. By recording high-resolution LiDAR, camera, and pose data, Panoptic-CUDAL offers a diverse, information-rich dataset in a challenging scenario. We present analysis of the recorded data and provide baseline results for panoptic and semantic segmentation methods on LiDAR point clouds. The dataset can be found here: https://robotics.sydney.edu.au/our-research/intelligent-transportation-systems/

Via

Access Paper or Ask Questions

Mixed Signals: A Diverse Point Cloud Dataset for Heterogeneous LiDAR V2X Collaboration

Feb 19, 2025

Katie Z Luo, Minh-Quan Dao, Zhenzhen Liu, Mark Campbell, Wei-Lun Chao, Kilian Q. Weinberger, Ezio Malis, Vincent Fremont, Bharath Hariharan, Mao Shan(+2 more)

Abstract:Vehicle-to-everything (V2X) collaborative perception has emerged as a promising solution to address the limitations of single-vehicle perception systems. However, existing V2X datasets are limited in scope, diversity, and quality. To address these gaps, we present Mixed Signals, a comprehensive V2X dataset featuring 45.1k point clouds and 240.6k bounding boxes collected from three connected autonomous vehicles (CAVs) equipped with two different types of LiDAR sensors, plus a roadside unit with dual LiDARs. Our dataset provides precisely aligned point clouds and bounding box annotations across 10 classes, ensuring reliable data for perception training. We provide detailed statistical analysis on the quality of our dataset and extensively benchmark existing V2X methods on it. Mixed Signals V2X Dataset is one of the highest quality, large-scale datasets publicly available for V2X perception research. Details on the website https://mixedsignalsdataset.cs.cornell.edu/.

Via

Access Paper or Ask Questions

Endangered Alert: A Field-Validated Self-Training Scheme for Detecting and Protecting Threatened Wildlife on Roads and Roadsides

Dec 16, 2024

Kunming Li, Mao Shan, Stephany Berrio Perez, Katie Luo, Stewart Worrall

Abstract:Traffic accidents are a global safety concern, resulting in numerous fatalities each year. A considerable number of these deaths are caused by animal-vehicle collisions (AVCs), which not only endanger human lives but also present serious risks to animal populations. This paper presents an innovative self-training methodology aimed at detecting rare animals, such as the cassowary in Australia, whose survival is threatened by road accidents. The proposed method addresses critical real-world challenges, including acquiring and labelling sensor data for rare animal species in resource-limited environments. It achieves this by leveraging cloud and edge computing, and automatic data labelling to improve the detection performance of the field-deployed model iteratively. Our approach introduces Label-Augmentation Non-Maximum Suppression (LA-NMS), which incorporates a vision-language model (VLM) to enable automated data labelling. During a five-month deployment, we confirmed the method's robustness and effectiveness, resulting in improved object detection accuracy and increased prediction confidence. The source code is available: https://github.com/acfr/CassDetect

* 8 pages, 8 figures

Via

Access Paper or Ask Questions

Robots in the Wild: Contextually-Adaptive Human-Robot Interactions in Urban Public Environments

Dec 10, 2024

Xinyan Yu, Yiyuan Wang, Tram Thi Minh Tran, Yi Zhao, Julie Stephany Berrio Perez, Marius Hoggenmuller, Justine Humphry, Lian Loke, Lynn Masuda, Callum Parker(+2 more)

Figure 1 for Robots in the Wild: Contextually-Adaptive Human-Robot Interactions in Urban Public Environments

Figure 2 for Robots in the Wild: Contextually-Adaptive Human-Robot Interactions in Urban Public Environments

Figure 3 for Robots in the Wild: Contextually-Adaptive Human-Robot Interactions in Urban Public Environments

Abstract:The increasing transition of human-robot interaction (HRI) context from controlled settings to dynamic, real-world public environments calls for enhanced adaptability in robotic systems. This can go beyond algorithmic navigation or traditional HRI strategies in structured settings, requiring the ability to navigate complex public urban systems containing multifaceted dynamics and various socio-technical needs. Therefore, our proposed workshop seeks to extend the boundaries of adaptive HRI research beyond predictable, semi-structured contexts and highlight opportunities for adaptable robot interactions in urban public environments. This half-day workshop aims to explore design opportunities and challenges in creating contextually-adaptive HRI within these spaces and establish a network of interested parties within the OzCHI research community. By fostering ongoing discussions, sharing of insights, and collaborations, we aim to catalyse future research that empowers robots to navigate the inherent uncertainties and complexities of real-world public interactions.

Via

Access Paper or Ask Questions

Performance Assessment of Lidar Odometry Frameworks: A Case Study at the Australian Botanic Garden Mount Annan

Nov 25, 2024

Mohamed Mourad Ouazghire, Julie Stephany Berrio, Mao Shan, Stewart Worrall

Figure 1 for Performance Assessment of Lidar Odometry Frameworks: A Case Study at the Australian Botanic Garden Mount Annan

Figure 2 for Performance Assessment of Lidar Odometry Frameworks: A Case Study at the Australian Botanic Garden Mount Annan

Figure 3 for Performance Assessment of Lidar Odometry Frameworks: A Case Study at the Australian Botanic Garden Mount Annan

Figure 4 for Performance Assessment of Lidar Odometry Frameworks: A Case Study at the Australian Botanic Garden Mount Annan

Abstract:Autonomous vehicles are being tested in diverse environments worldwide. However, a notable gap exists in evaluating datasets representing natural, unstructured environments such as forests or gardens. To address this, we present a study on localisation at the Australian Botanic Garden Mount Annan. This area encompasses open grassy areas, paved pathways, and densely vegetated sections with trees and other objects. The dataset was recorded using a 128-beam LiDAR sensor and GPS and IMU readings to track the ego-vehicle. This paper evaluates the performance of two state-of-the-art LiDARinertial odometry frameworks, COIN-LIO and LIO-SAM, on this dataset. We analyse trajectory estimates in both horizontal and vertical dimensions and assess relative translation and yaw errors over varying distances. Our findings reveal that while both frameworks perform adequately in the vertical plane, COINLIO demonstrates superior accuracy in the horizontal plane, particularly over extended trajectories. In contrast, LIO-SAM shows increased drift and yaw errors over longer distances.

* Redacted with respect to the Australian Conference on Robotics and Automoation 2024's writing criteria. Style file: acra.sty, template: acra.tex, bibtex: named.bst 10 pages, 18 figures

Via

Access Paper or Ask Questions

Cross-cultural analysis of pedestrian group behaviour influence on crossing decisions in interactions with autonomous vehicles

Aug 06, 2024

Sergio Martín Serrano, Óscar Méndez Blanco, Stewart Worrall, Miguel Ángel Sotelo, David Fernández-Llorca

Abstract:Understanding cultural backgrounds is crucial for the seamless integration of autonomous driving into daily life as it ensures that systems are attuned to diverse societal norms and behaviours, enhancing acceptance and safety in varied cultural contexts. In this work, we investigate the impact of co-located pedestrians on crossing behaviour, considering cultural and situational factors. To accomplish this, a full-scale virtual reality (VR) environment was created in the CARLA simulator, enabling the identical experiment to be replicated in both Spain and Australia. Participants (N=30) attempted to cross the road at an urban crosswalk alongside other pedestrians exhibiting conservative to more daring behaviours, while an autonomous vehicle (AV) approached with different driving styles. For the analysis of interactions, we utilized questionnaires and direct measures of the moment when participants entered the lane. Our findings indicate that pedestrians tend to cross the same traffic gap together, even though reckless behaviour by the group reduces confidence and makes the situation perceived as more complex. Australian participants were willing to take fewer risks than Spanish participants, adopting more cautious behaviour when it was uncertain whether the AV would yield.

* Paper accepted at the 27th IEEE International Conference on Intelligent Transportation Systems (ITSC 2024)

Via

Access Paper or Ask Questions

Label-Efficient 3D Object Detection For Road-Side Units

Apr 09, 2024

Minh-Quan Dao, Holger Caesar, Julie Stephany Berrio, Mao Shan, Stewart Worrall, Vincent Frémont, Ezio Malis

Figure 1 for Label-Efficient 3D Object Detection For Road-Side Units

Figure 2 for Label-Efficient 3D Object Detection For Road-Side Units

Figure 3 for Label-Efficient 3D Object Detection For Road-Side Units

Figure 4 for Label-Efficient 3D Object Detection For Road-Side Units

Abstract:Occlusion presents a significant challenge for safety-critical applications such as autonomous driving. Collaborative perception has recently attracted a large research interest thanks to the ability to enhance the perception of autonomous vehicles via deep information fusion with intelligent roadside units (RSU), thus minimizing the impact of occlusion. While significant advancement has been made, the data-hungry nature of these methods creates a major hurdle for their real-world deployment, particularly due to the need for annotated RSU data. Manually annotating the vast amount of RSU data required for training is prohibitively expensive, given the sheer number of intersections and the effort involved in annotating point clouds. We address this challenge by devising a label-efficient object detection method for RSU based on unsupervised object discovery. Our paper introduces two new modules: one for object discovery based on a spatial-temporal aggregation of point clouds, and another for refinement. Furthermore, we demonstrate that fine-tuning on a small portion of annotated data allows our object discovery models to narrow the performance gap with, or even surpass, fully supervised models. Extensive experiments are carried out in simulated and real-world datasets to evaluate our method.

* IV 2024

Via

Access Paper or Ask Questions

OccFusion: A Straightforward and Effective Multi-Sensor Fusion Framework for 3D Occupancy Prediction

Mar 13, 2024

Zhenxing Ming, Julie Stephany Berrio, Mao Shan, Stewart Worrall

Figure 1 for OccFusion: A Straightforward and Effective Multi-Sensor Fusion Framework for 3D Occupancy Prediction

Figure 2 for OccFusion: A Straightforward and Effective Multi-Sensor Fusion Framework for 3D Occupancy Prediction

Figure 3 for OccFusion: A Straightforward and Effective Multi-Sensor Fusion Framework for 3D Occupancy Prediction

Figure 4 for OccFusion: A Straightforward and Effective Multi-Sensor Fusion Framework for 3D Occupancy Prediction

Abstract:This paper introduces OccFusion, a straightforward and efficient sensor fusion framework for predicting 3D occupancy. A comprehensive understanding of 3D scenes is crucial in autonomous driving, and recent models for 3D semantic occupancy prediction have successfully addressed the challenge of describing real-world objects with varied shapes and classes. However, existing methods for 3D occupancy prediction heavily rely on surround-view camera images, making them susceptible to changes in lighting and weather conditions. By integrating features from additional sensors, such as lidar and surround view radars, our framework enhances the accuracy and robustness of occupancy prediction, resulting in top-tier performance on the nuScenes benchmark. Furthermore, extensive experiments conducted on the nuScenes dataset, including challenging night and rainy scenarios, confirm the superior performance of our sensor fusion strategy across various perception ranges. The code for this framework will be made available at https://github.com/DanielMing123/OCCFusion.

Via

Access Paper or Ask Questions

InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction

Jan 23, 2024

Zhenxing Ming, Julie Stephany Berrio, Mao Shan, Stewart Worrall

Figure 1 for InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction

Figure 2 for InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction

Figure 3 for InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction

Figure 4 for InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction

Abstract:This paper introduces InverseMatrixVT3D, an efficient method for transforming multi-view image features into 3D feature volumes for 3D semantic occupancy prediction. Existing methods for constructing 3D volumes often rely on depth estimation, device-specific operators, or transformer queries, which hinders the widespread adoption of 3D occupancy models. In contrast, our approach leverages two projection matrices to store the static mapping relationships and matrix multiplications to efficiently generate global Bird's Eye View (BEV) features and local 3D feature volumes. Specifically, we achieve this by performing matrix multiplications between multi-view image feature maps and two sparse projection matrices. We introduce a sparse matrix handling technique for the projection matrices to optimise GPU memory usage. Moreover, a global-local attention fusion module is proposed to integrate the global BEV features with the local 3D feature volumes to obtain the final 3D volume. We also employ a multi-scale supervision mechanism to further enhance performance. Comprehensive experiments on the nuScenes dataset demonstrate the simplicity and effectiveness of our method. The code will be made available at:https://github.com/DanielMing123/InverseMatrixVT3D

Via

Access Paper or Ask Questions

Knowledge-aware Graph Transformer for Pedestrian Trajectory Prediction

Jan 10, 2024

Yu Liu, Yuexin Zhang, Kunming Li, Yongliang Qiao, Stewart Worrall, You-Fu Li, He Kong

Figure 1 for Knowledge-aware Graph Transformer for Pedestrian Trajectory Prediction

Figure 2 for Knowledge-aware Graph Transformer for Pedestrian Trajectory Prediction

Figure 3 for Knowledge-aware Graph Transformer for Pedestrian Trajectory Prediction

Figure 4 for Knowledge-aware Graph Transformer for Pedestrian Trajectory Prediction

Abstract:Predicting pedestrian motion trajectories is crucial for path planning and motion control of autonomous vehicles. Accurately forecasting crowd trajectories is challenging due to the uncertain nature of human motions in different environments. For training, recent deep learning-based prediction approaches mainly utilize information like trajectory history and interactions between pedestrians, among others. This can limit the prediction performance across various scenarios since the discrepancies between training datasets have not been properly incorporated. To overcome this limitation, this paper proposes a graph transformer structure to improve prediction performance, capturing the differences between the various sites and scenarios contained in the datasets. In particular, a self-attention mechanism and a domain adaption module have been designed to improve the generalization ability of the model. Moreover, an additional metric considering cross-dataset sequences is introduced for training and performance evaluation purposes. The proposed framework is validated and compared against existing methods using popular public datasets, i.e., ETH and UCY. Experimental results demonstrate the improved performance of our proposed scheme.

* This paper was accepted to and presented at the 26th IEEE International Conference on Intelligent Transportation Systems (ITSC), September 2023

Via

Access Paper or Ask Questions