Recognizing places from an opposing viewpoint during a return trip is a common experience for human drivers. However, the analogous robotics capability, visual place recognition (VPR) with limited field of view cameras under 180 degree rotations, has proven to be challenging to achieve. To address this problem, this paper presents Same Place Opposing Trajectory (SPOT), a technique for opposing viewpoint VPR that relies exclusively on structure estimated through stereo visual odometry (VO). The method extends recent advances in lidar descriptors and utilizes a novel double (similar and opposing) distance matrix sequence matching method. We evaluate SPOT on a publicly available dataset with 6.7-7.6 km routes driven in similar and opposing directions under various lighting conditions. The proposed algorithm demonstrates remarkable improvement over the state-of-the-art, achieving up to 91.7% recall at 100% precision in opposing viewpoint cases, while requiring less storage than all baselines tested and running faster than all but one. Moreover, the proposed method assumes no a priori knowledge of whether the viewpoint is similar or opposing, and also demonstrates competitive performance in similar viewpoint cases.
Terrain classification is an important problem for mobile robots operating in extreme environments as it can aid downstream tasks such as autonomous navigation and planning. While RGB cameras are widely used for terrain identification, vision-based methods can suffer due to poor lighting conditions and occlusions. In this paper, we propose the novel use of Ground Penetrating Radar (GPR) for terrain characterization for mobile robot platforms. Our approach leverages machine learning for surface terrain classification from GPR data. We collect a new dataset consisting of four different terrain types, and present qualitative and quantitative results. Our results demonstrate that classification networks can learn terrain categories from GPR signals. Additionally, we integrate our GPR-based classification approach into a multimodal semantic mapping framework to demonstrate a practical use case of GPR for surface terrain classification on mobile robots. Overall, this work extends the usability of GPR sensors deployed on robots to enable terrain classification in addition to GPR's existing scientific use cases.
In the field of 3D object detection for autonomous driving, LiDAR-Camera (LC) fusion is the top-performing sensor configuration. Still, LiDAR is relatively high cost, which hinders adoption of this technology for consumer automobiles. Alternatively, camera and radar are commonly deployed on vehicles already on the road today, but performance of Camera-Radar (CR) fusion falls behind LC fusion. In this work, we propose Camera-Radar Knowledge Distillation (CRKD) to bridge the performance gap between LC and CR detectors with a novel cross-modality KD framework. We use the Bird's-Eye-View (BEV) representation as the shared feature space to enable effective knowledge distillation. To accommodate the unique cross-modality KD path, we propose four distillation losses to help the student learn crucial features from the teacher model. We present extensive evaluations on the nuScenes dataset to demonstrate the effectiveness of the proposed CRKD framework. The project page for CRKD is https://song-jingyu.github.io/CRKD.
We propose LiRaFusion to tackle LiDAR-radar fusion for 3D object detection to fill the performance gap of existing LiDAR-radar detectors. To improve the feature extraction capabilities from these two modalities, we design an early fusion module for joint voxel feature encoding, and a middle fusion module to adaptively fuse feature maps via a gated network. We perform extensive evaluation on nuScenes to demonstrate that LiRaFusion leverages the complementary information of LiDAR and radar effectively and achieves notable improvement over existing methods.
Autonomous underwater vehicles often perform surveys that capture multiple views of targets in order to provide more information for human operators or automatic target recognition algorithms. In this work, we address the problem of choosing the most informative views that minimize survey time while maximizing classifier accuracy. We introduce a novel active perception framework for multi-view adaptive surveying and reacquisition using side scan sonar imagery. Our framework addresses this challenge by using a graph formulation for the adaptive survey task. We then use Graph Neural Networks (GNNs) to both classify acquired sonar views and to choose the next best view based on the collected data. We evaluate our method using simulated surveys in a high-fidelity side scan sonar simulator. Our results demonstrate that our approach is able to surpass the state-of-the-art in classification accuracy and survey efficiency. This framework is a promising approach for more efficient autonomous missions involving side scan sonar, such as underwater exploration, marine archaeology, and environmental monitoring.
Open-source benchmark datasets have been a critical component for advancing machine learning for robot perception in terrestrial applications. Benchmark datasets enable the widespread development of state-of-the-art machine learning methods, which require large datasets for training, validation, and thorough comparison to competing approaches. Underwater environments impose several operational challenges that hinder efforts to collect large benchmark datasets for marine robot perception. Furthermore, a low abundance of targets of interest relative to the size of the search space leads to increased time and cost required to collect useful datasets for a specific task. As a result, there is limited availability of labeled benchmark datasets for underwater applications. We present the AI4Shipwrecks dataset, which consists of 24 distinct shipwreck sites totaling 286 high-resolution labeled side scan sonar images to advance the state-of-the-art in autonomous sonar image understanding. We leverage the unique abundance of targets in Thunder Bay National Marine Sanctuary in Lake Huron, MI, to collect and compile a sonar imagery benchmark dataset through surveys with an autonomous underwater vehicle (AUV). We consulted with expert marine archaeologists for the labeling of robotically gathered data. We then leverage this dataset to perform benchmark experiments for comparison of state-of-the-art supervised segmentation methods, and we present insights on opportunities and open challenges for the field. The dataset and benchmarking tools will be released as an open-source benchmark dataset to spur innovation in machine learning for Great Lakes and ocean exploration. The dataset and accompanying software are available at https://umfieldrobotics.github.io/ai4shipwrecks/.
Conventional cameras employed in autonomous vehicle (AV) systems support many perception tasks, but are challenged by low-light or high dynamic range scenes, adverse weather, and fast motion. Novel sensors, such as event and thermal cameras, offer capabilities with the potential to address these scenarios, but they remain to be fully exploited. This paper introduces the Novel Sensors for Autonomous Vehicle Perception (NSAVP) dataset to facilitate future research on this topic. The dataset was captured with a platform including stereo event, thermal, monochrome, and RGB cameras as well as a high precision navigation system providing ground truth poses. The data was collected by repeatedly driving two ~8 km routes and includes varied lighting conditions and opposing viewpoint perspectives. We provide benchmarking experiments on the task of place recognition to demonstrate challenges and opportunities for novel sensors to enhance critical AV perception tasks. To our knowledge, the NSAVP dataset is the first to include stereo thermal cameras together with stereo event and monochrome cameras. The dataset and supporting software suite is available at: https://umautobots.github.io/nsavp
Autonomous terrain classification is an important problem in planetary navigation, whether the goal is to identify scientific sites of interest or to traverse treacherous areas safely. Past Martian rovers have relied on human operators to manually identify a navigable path from transmitted imagery. Our goals on Mars in the next few decades will eventually require rovers that can autonomously move farther, faster, and through more dangerous landscapes--demonstrating a need for improved terrain classification for traversability. Autonomous navigation through extreme environments will enable the search for water on the Moon and Mars as well as preparations for human habitats. Advancements in machine learning techniques have demonstrated potential to improve terrain classification capabilities for ground vehicles on Earth. However, classification results for space applications are limited by the availability of training data suitable for supervised learning methods. This paper contributes an open source automatic data processing pipeline that uses camera geometry to co-locate Curiosity and Perseverance Mastcam image products with Mars overhead maps via ray projection over a terrain model. In future work, this automated data processing pipeline will be leveraged for development of machine learning methods for terrain classification.
In this paper, we address the problem of sim-to-real transfer for object segmentation when there is no access to real examples of an object of interest during training, i.e. zero-shot sim-to-real transfer for segmentation. We focus on the application of shipwreck segmentation in side scan sonar imagery. Our novel segmentation network, STARS, addresses this challenge by fusing a predicted deformation field and anomaly volume, allowing it to generalize better to real sonar images and achieve more effective zero-shot sim-to-real transfer for image segmentation. We evaluate the sim-to-real transfer capabilities of our method on a real, expert-labeled side scan sonar dataset of shipwrecks collected from field work surveys with an autonomous underwater vehicle (AUV). STARS is trained entirely in simulation and performs zero-shot shipwreck segmentation with no additional fine-tuning on real data. Our method provides a significant 20% increase in segmentation performance for the targeted shipwreck class compared to the best baseline.
This paper proposes LONER, the first real-time LiDAR SLAM algorithm that uses a neural implicit scene representation. Existing implicit mapping methods for LiDAR show promising results in large-scale reconstruction, but either require groundtruth poses or run slower than real-time. In contrast, LONER uses LiDAR data to train an MLP to estimate a dense map in real-time, while simultaneously estimating the trajectory of the sensor. To achieve real-time performance, this paper proposes a novel information-theoretic loss function that accounts for the fact that different regions of the map may be learned to varying degrees throughout online training. The proposed method is evaluated qualitatively and quantitatively on two open-source datasets. This evaluation illustrates that the proposed loss function converges faster and leads to more accurate geometry reconstruction than other loss functions used in depth-supervised neural implicit frameworks. Finally, this paper shows that LONER estimates trajectories competitively with state-of-the-art LiDAR SLAM methods, while also producing dense maps competitive with existing real-time implicit mapping methods that use groundtruth poses.