Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Wolfram Burgard

LetsMap: Unsupervised Representation Learning for Semantic BEV Mapping

May 29, 2024

Nikhil Gosala, Kürsat Petek, B Ravi Kiran, Senthil Yogamani, Paulo Drews-Jr, Wolfram Burgard, Abhinav Valada

Figure 1 for LetsMap: Unsupervised Representation Learning for Semantic BEV Mapping

Figure 2 for LetsMap: Unsupervised Representation Learning for Semantic BEV Mapping

Figure 3 for LetsMap: Unsupervised Representation Learning for Semantic BEV Mapping

Figure 4 for LetsMap: Unsupervised Representation Learning for Semantic BEV Mapping

Abstract:Semantic Bird's Eye View (BEV) maps offer a rich representation with strong occlusion reasoning for various decision making tasks in autonomous driving. However, most BEV mapping approaches employ a fully supervised learning paradigm that relies on large amounts of human-annotated BEV ground truth data. In this work, we address this limitation by proposing the first unsupervised representation learning approach to generate semantic BEV maps from a monocular frontal view (FV) image in a label-efficient manner. Our approach pretrains the network to independently reason about scene geometry and scene semantics using two disjoint neural pathways in an unsupervised manner and then finetunes it for the task of semantic BEV mapping using only a small fraction of labels in the BEV. We achieve label-free pretraining by exploiting spatial and temporal consistency of FV images to learn scene geometry while relying on a novel temporal masked autoencoder formulation to encode the scene representation. Extensive evaluations on the KITTI-360 and nuScenes datasets demonstrate that our approach performs on par with the existing state-of-the-art approaches while using only 1% of BEV labels and no additional labeled data.

* 23 pages, 5 figures

Via

Access Paper or Ask Questions

A Good Foundation is Worth Many Labels: Label-Efficient Panoptic Segmentation

May 29, 2024

Niclas Vödisch, Kürsat Petek, Markus Käppeler, Abhinav Valada, Wolfram Burgard

Abstract:A key challenge for the widespread application of learning-based models for robotic perception is to significantly reduce the required amount of annotated training data while achieving accurate predictions. This is essential not only to decrease operating costs but also to speed up deployment time. In this work, we address this challenge for PAnoptic SegmenTation with fEw Labels (PASTEL) by exploiting the groundwork paved by visual foundation models. We leverage descriptive image features from such a model to train two lightweight network heads for semantic segmentation and object boundary detection, using very few annotated training samples. We then merge their predictions via a novel fusion module that yields panoptic maps based on normalized cut. To further enhance the performance, we utilize self-training on unlabeled images selected by a feature-driven similarity scheme. We underline the relevance of our approach by employing PASTEL to important robot perception use cases from autonomous driving and agricultural robotics. In extensive experiments, we demonstrate that PASTEL significantly outperforms previous methods for label-efficient segmentation even when using fewer annotations. The code of our work is publicly available at http://pastel.cs.uni-freiburg.de.

Via

Access Paper or Ask Questions

Automatic Target-Less Camera-LiDAR Calibration From Motion and Deep Point Correspondences

Apr 26, 2024

Kürsat Petek, Niclas Vödisch, Johannes Meyer, Daniele Cattaneo, Abhinav Valada, Wolfram Burgard

Figure 1 for Automatic Target-Less Camera-LiDAR Calibration From Motion and Deep Point Correspondences

Figure 2 for Automatic Target-Less Camera-LiDAR Calibration From Motion and Deep Point Correspondences

Figure 3 for Automatic Target-Less Camera-LiDAR Calibration From Motion and Deep Point Correspondences

Figure 4 for Automatic Target-Less Camera-LiDAR Calibration From Motion and Deep Point Correspondences

Abstract:Sensor setups of robotic platforms commonly include both camera and LiDAR as they provide complementary information. However, fusing these two modalities typically requires a highly accurate calibration between them. In this paper, we propose MDPCalib which is a novel method for camera-LiDAR calibration that requires neither human supervision nor any specific target objects. Instead, we utilize sensor motion estimates from visual and LiDAR odometry as well as deep learning-based 2D-pixel-to-3D-point correspondences that are obtained without in-domain retraining. We represent the camera-LiDAR calibration as a graph optimization problem and minimize the costs induced by constraints from sensor motion and point correspondences. In extensive experiments, we demonstrate that our approach yields highly accurate extrinsic calibration parameters and is robust to random initialization. Additionally, our approach generalizes to a wide range of sensor setups, which we demonstrate by employing it on various robotic platforms including a self-driving perception car, a quadruped robot, and a UAV. To make our calibration method publicly accessible, we release the code on our project website at http://calibration.cs.uni-freiburg.de.

Via

Access Paper or Ask Questions

Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation

Mar 26, 2024

Abdelrhman Werby, Chenguang Huang, Martin Büchner, Abhinav Valada, Wolfram Burgard

Figure 1 for Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation

Figure 2 for Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation

Figure 3 for Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation

Figure 4 for Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation

Abstract:Recent open-vocabulary robot mapping methods enrich dense geometric maps with pre-trained visual-language features. While these maps allow for the prediction of point-wise saliency maps when queried for a certain language concept, large-scale environments and abstract queries beyond the object level still pose a considerable hurdle, ultimately limiting language-grounded robotic navigation. In this work, we present HOV-SG, a hierarchical open-vocabulary 3D scene graph mapping approach for language-grounded robot navigation. Leveraging open-vocabulary vision foundation models, we first obtain state-of-the-art open-vocabulary segment-level maps in 3D and subsequently construct a 3D scene graph hierarchy consisting of floor, room, and object concepts, each enriched with open-vocabulary features. Our approach is able to represent multi-story buildings and allows robotic traversal of those using a cross-floor Voronoi graph. HOV-SG is evaluated on three distinct datasets and surpasses previous baselines in open-vocabulary semantic accuracy on the object, room, and floor level while producing a 75% reduction in representation size compared to dense open-vocabulary maps. In order to prove the efficacy and generalization capabilities of HOV-SG, we showcase successful long-horizon language-conditioned robot navigation within real-world multi-storage environments. We provide code and trial video data at http://hovsg.github.io/.

* Code and video are available at http://hovsg.github.io/

Via

Access Paper or Ask Questions

Bayesian Optimization for Sample-Efficient Policy Improvement in Robotic Manipulation

Mar 21, 2024

Adrian Röfer, Iman Nematollahi, Tim Welschehold, Wolfram Burgard, Abhinav Valada

Abstract:Sample efficient learning of manipulation skills poses a major challenge in robotics. While recent approaches demonstrate impressive advances in the type of task that can be addressed and the sensing modalities that can be incorporated, they still require large amounts of training data. Especially with regard to learning actions on robots in the real world, this poses a major problem due to the high costs associated with both demonstrations and real-world robot interactions. To address this challenge, we introduce BOpt-GMM, a hybrid approach that combines imitation learning with own experience collection. We first learn a skill model as a dynamical system encoded in a Gaussian Mixture Model from a few demonstrations. We then improve this model with Bayesian optimization building on a small number of autonomous skill executions in a sparse reward setting. We demonstrate the sample efficiency of our approach on multiple complex manipulation skills in both simulations and real-world experiments. Furthermore, we make the code and pre-trained models publicly available at http://bopt-gmm. cs.uni-freiburg.de.

* 7 pages, 5 figures, 2 tables, submitted to IROS2024

Via

Access Paper or Ask Questions

Single-Agent Actor Critic for Decentralized Cooperative Driving

Mar 18, 2024

Shengchao Yan, Lukas König, Wolfram Burgard

Figure 1 for Single-Agent Actor Critic for Decentralized Cooperative Driving

Figure 2 for Single-Agent Actor Critic for Decentralized Cooperative Driving

Figure 3 for Single-Agent Actor Critic for Decentralized Cooperative Driving

Figure 4 for Single-Agent Actor Critic for Decentralized Cooperative Driving

Abstract:Active traffic management incorporating autonomous vehicles (AVs) promises a future with diminished congestion and enhanced traffic flow. However, developing algorithms for real-world application requires addressing the challenges posed by continuous traffic flow and partial observability. To bridge this gap and advance the field of active traffic management towards greater decentralization, we introduce a novel asymmetric actor-critic model aimed at learning decentralized cooperative driving policies for autonomous vehicles using single-agent reinforcement learning. Our approach employs attention neural networks with masking to handle the dynamic nature of real-world traffic flow and partial observability. Through extensive evaluations against baseline controllers across various traffic scenarios, our model shows great potential for improving traffic flow at diverse bottleneck locations within the road system. Additionally, we explore the challenge associated with the conservative driving behaviors of autonomous vehicles that adhere strictly to traffic regulations. The experiment results illustrate that our proposed cooperative policy can mitigate potential traffic slowdowns without compromising safety.

Via

Access Paper or Ask Questions

BEVCar: Camera-Radar Fusion for BEV Map and Object Segmentation

Mar 18, 2024

Jonas Schramm, Niclas Vödisch, Kürsat Petek, B Ravi Kiran, Senthil Yogamani, Wolfram Burgard, Abhinav Valada

Figure 1 for BEVCar: Camera-Radar Fusion for BEV Map and Object Segmentation

Figure 2 for BEVCar: Camera-Radar Fusion for BEV Map and Object Segmentation

Figure 3 for BEVCar: Camera-Radar Fusion for BEV Map and Object Segmentation

Figure 4 for BEVCar: Camera-Radar Fusion for BEV Map and Object Segmentation

Abstract:Semantic scene segmentation from a bird's-eye-view (BEV) perspective plays a crucial role in facilitating planning and decision-making for mobile robots. Although recent vision-only methods have demonstrated notable advancements in performance, they often struggle under adverse illumination conditions such as rain or nighttime. While active sensors offer a solution to this challenge, the prohibitively high cost of LiDARs remains a limiting factor. Fusing camera data with automotive radars poses a more inexpensive alternative but has received less attention in prior research. In this work, we aim to advance this promising avenue by introducing BEVCar, a novel approach for joint BEV object and map segmentation. The core novelty of our approach lies in first learning a point-based encoding of raw radar data, which is then leveraged to efficiently initialize the lifting of image features into the BEV space. We perform extensive experiments on the nuScenes dataset and demonstrate that BEVCar outperforms the current state of the art. Moreover, we show that incorporating radar information significantly enhances robustness in challenging environmental conditions and improves segmentation performance for distant objects. To foster future research, we provide the weather split of the nuScenes dataset used in our experiments, along with our code and trained models at http://bevcar.cs.uni-freiburg.de.

Via

Access Paper or Ask Questions

Evaluation of a Smart Mobile Robotic System for Industrial Plant Inspection and Supervision

Feb 12, 2024

Georg K. J. Fischer, Max Bergau, D. Adriana Gómez-Rosal, Andreas Wachaja, Johannes Gräter, Matthias Odenweller, Uwe Piechottka, Fabian Hoeflinger, Nikhil Gosala, Niklas Wetzel(+3 more)

Figure 1 for Evaluation of a Smart Mobile Robotic System for Industrial Plant Inspection and Supervision

Figure 2 for Evaluation of a Smart Mobile Robotic System for Industrial Plant Inspection and Supervision

Figure 3 for Evaluation of a Smart Mobile Robotic System for Industrial Plant Inspection and Supervision

Figure 4 for Evaluation of a Smart Mobile Robotic System for Industrial Plant Inspection and Supervision

Abstract:Automated and autonomous industrial inspection is a longstanding research field, driven by the necessity to enhance safety and efficiency within industrial settings. In addressing this need, we introduce an autonomously navigating robotic system designed for comprehensive plant inspection. This innovative system comprises a robotic platform equipped with a diverse array of sensors integrated to facilitate the detection of various process and infrastructure parameters. These sensors encompass optical (LiDAR, Stereo, UV/IR/RGB cameras), olfactory (electronic nose), and acoustic (microphone array) capabilities, enabling the identification of factors such as methane leaks, flow rates, and infrastructural anomalies. The proposed system underwent individual evaluation at a wastewater treatment site within a chemical plant, providing a practical and challenging environment for testing. The evaluation process encompassed key aspects such as object detection, 3D localization, and path planning. Furthermore, specific evaluations were conducted for optical methane leak detection and localization, as well as acoustic assessments focusing on pump equipment and gas leak localization.

* Submitted for publication in IEEE Sensors Journal

Via

Access Paper or Ask Questions

uPLAM: Robust Panoptic Localization and Mapping Leveraging Perception Uncertainties

Feb 08, 2024

Kshitij Sirohi, Daniel Büscher, Wolfram Burgard

Figure 1 for uPLAM: Robust Panoptic Localization and Mapping Leveraging Perception Uncertainties

Figure 2 for uPLAM: Robust Panoptic Localization and Mapping Leveraging Perception Uncertainties

Figure 3 for uPLAM: Robust Panoptic Localization and Mapping Leveraging Perception Uncertainties

Figure 4 for uPLAM: Robust Panoptic Localization and Mapping Leveraging Perception Uncertainties

Abstract:The availability of a reliable map and a robust localization system is critical for the operation of an autonomous vehicle. In a modern system, both mapping and localization solutions generally employ convolutional neural network (CNN) --based perception. Hence, any algorithm should consider potential errors in perception for safe and robust functioning. In this work, we present uncertainty-aware panoptic Localization and Mapping (uPLAM), which employs perception uncertainty as a bridge to fuse the perception information with classical localization and mapping approaches. We introduce an uncertainty-based map aggregation technique to create a long-term panoptic bird's eye view map and provide an associated mapping uncertainty. Our map consists of surface semantics and landmarks with unique IDs. Moreover, we present panoptic uncertainty-aware particle filter-based localization. To this end, we propose an uncertainty-based particle importance weight calculation for the adaptive incorporation of perception information into localization. We also present a new dataset for evaluating long-term panoptic mapping and map-based localization. Extensive evaluations showcase that our proposed uncertainty incorporation leads to better mapping with reliable uncertainty estimates and accurate localization. We make our dataset and code available at: \url{http://uplam.cs.uni-freiburg.de}

Via

Access Paper or Ask Questions

CenterGrasp: Object-Aware Implicit Representation Learning for Simultaneous Shape Reconstruction and 6-DoF Grasp Estimation

Dec 13, 2023

Eugenio Chisari, Nick Heppert, Tim Welschehold, Wolfram Burgard, Abhinav Valada

Abstract:Reliable object grasping is a crucial capability for autonomous robots. However, many existing grasping approaches focus on general clutter removal without explicitly modeling objects and thus only relying on the visible local geometry. We introduce CenterGrasp, a novel framework that combines object awareness and holistic grasping. CenterGrasp learns a general object prior by encoding shapes and valid grasps in a continuous latent space. It consists of an RGB-D image encoder that leverages recent advances to detect objects and infer their pose and latent code, and a decoder to predict shape and grasps for each object in the scene. We perform extensive experiments on simulated as well as real-world cluttered scenes and demonstrate strong scene reconstruction and 6-DoF grasp-pose estimation performance. Compared to the state of the art, CenterGrasp achieves an improvement of 38.5 mm in shape reconstruction and 33 percentage points on average in grasp success. We make the code and trained models publicly available at http://centergrasp.cs.uni-freiburg.de.

* Submitted to RA-L. Video, code and models available at http://centergrasp.cs.uni-freiburg.de

Via

Access Paper or Ask Questions