Abhinav Valada

AmodalSynthDrive: A Synthetic Amodal Perception Dataset for Autonomous Driving

Sep 12, 2023

Ahmed Rida Sekkat, Rohit Mohan, Oliver Sawade, Elmar Matthes, Abhinav Valada

Abstract: Unlike humans, who can effortlessly estimate the entirety of objects even when partially occluded, modern computer vision algorithms still find this aspect extremely challenging. Leveraging this amodal perception for autonomous driving remains largely untapped due to the lack of suitable datasets. The curation of these datasets is primarily hindered by significant annotation costs and the difficulty of mitigating annotator subjectivity in accurately labeling occluded regions. To address these limitations, we introduce AmodalSynthDrive, a synthetic multi-task multi-modal amodal perception dataset. The dataset provides multi-view camera images, 3D bounding boxes, LiDAR data, and odometry for 150 driving sequences with over 1M object annotations in diverse traffic, weather, and lighting conditions. AmodalSynthDrive supports multiple amodal scene understanding tasks including the introduced amodal depth estimation for enhanced spatial understanding. We evaluate several baselines for each of these tasks to illustrate the challenges and set up public benchmarking servers. The dataset is available at http://amodalsynthdrive.cs.uni-freiburg.de.

A Smart Robotic System for Industrial Plant Supervision

Sep 01, 2023

D. Adriana Gómez-Rosal, Max Bergau, Georg K. J. Fischer, Andreas Wachaja, Johannes Gräter, Matthias Odenweller, Uwe Piechottka, Fabian Hoeflinger, Nikhil Gosala, Niklas Wetzel (+3 more)

Abstract: In today's chemical plants, human field operators perform frequent integrity checks to guarantee high safety standards, and thus are possibly the first to encounter dangerous operating conditions. To alleviate their task, we present a system consisting of an autonomously navigating robot integrated with various sensors and intelligent data processing. It is able to detect methane leaks and estimate their flow rates, detect more general gas anomalies, recognize oil films, localize sound sources and detect failure cases, map the environment in 3D, and navigate autonomously, employing recognition and avoidance of dynamic obstacles. We evaluate our system at a wastewater facility in full working conditions. Our results demonstrate that the system is able to robustly navigate the plant and provide useful information about critical operating conditions.

* Final submission for IEEE Sensors 2023

Syn-Mediverse: A Multimodal Synthetic Dataset for Intelligent Scene Understanding of Healthcare Facilities

Aug 06, 2023

Rohit Mohan, José Arce, Sassan Mokhtar, Daniele Cattaneo, Abhinav Valada

Abstract: Safety and efficiency are paramount in healthcare facilities where the lives of patients are at stake. Despite the adoption of robots to assist medical staff in challenging tasks such as complex surgeries, human expertise is still indispensable. The next generation of autonomous healthcare robots hinges on their capacity to perceive and understand their complex and frenetic environments. While deep learning models are increasingly used for this purpose, they require extensive annotated training data which is impractical to obtain in real-world healthcare settings. To bridge this gap, we present Syn-Mediverse, the first hyper-realistic multimodal synthetic dataset of diverse healthcare facilities. Syn-Mediverse contains over 48,000 images from a simulated industry-standard optical tracking camera and provides more than 1.5M annotations spanning five different scene understanding tasks including depth estimation, object detection, semantic segmentation, instance segmentation, and panoptic segmentation. We demonstrate the complexity of our dataset by evaluating the performance on a broad range of state-of-the-art baselines for each task. To further advance research on scene understanding of healthcare facilities, along with the public dataset we provide an online evaluation benchmark available at http://syn-mediverse.cs.uni-freiburg.de.

Learning Hierarchical Interactive Multi-Object Search for Mobile Manipulation

Jul 12, 2023

Fabian Schmalstieg, Daniel Honerkamp, Tim Welschehold, Abhinav Valada

Abstract: Existing object-search approaches enable robots to search through free pathways; however, robots operating in unstructured human-centered environments frequently also have to manipulate the environment to their needs. In this work, we introduce a novel interactive multi-object search task in which a robot has to open doors to navigate rooms and search inside cabinets and drawers to find target objects. These new challenges require combining manipulation and navigation skills in unexplored environments. We present HIMOS, a hierarchical reinforcement learning approach that learns to compose exploration, navigation, and manipulation skills. To achieve this, we design an abstract high-level action space around a semantic map memory and leverage the explored environment as instance navigation points. We perform extensive experiments in simulation and the real world that demonstrate that HIMOS effectively transfers to new environments in a zero-shot manner. It shows robustness to unseen subpolicies, failures in their execution, and different robot kinematics. These capabilities open the door to a wide range of downstream tasks across embodied AI and real-world use cases.

* 10 pages, 5 figures

Efficient Learning of Urban Driving Policies Using Bird's-Eye-View State Representations

May 31, 2023

Raphael Trumpp, Martin Büchner, Abhinav Valada, Marco Caccamo

Abstract:Autonomous driving involves complex decision-making in highly interactive environments, requiring thoughtful negotiation with other traffic participants. While reinforcement learning provides a way to learn such interaction behavior, efficient learning critically depends on scalable state representations. Contrary to imitation learning methods, high-dimensional state representations still constitute a major bottleneck for deep reinforcement learning methods in autonomous driving. In this paper, we study the challenges of constructing bird's-eye-view representations for autonomous driving and propose a recurrent learning architecture for long-horizon driving. Our PPO-based approach, called RecurrDriveNet, is demonstrated on a simulated autonomous driving task in CARLA, where it outperforms traditional frame-stacking methods while only requiring one million experiences for training. RecurrDriveNet causes less than one infraction per driven kilometer by interacting safely with other road users.

The Treachery of Images: Bayesian Scene Keypoints for Deep Policy Learning in Robotic Manipulation

May 08, 2023

Jan Ole von Hartz, Eugenio Chisari, Tim Welschehold, Wolfram Burgard, Joschka Boedecker, Abhinav Valada

Abstract:In policy learning for robotic manipulation, sample efficiency is of paramount importance. Thus, learning and extracting more compact representations from camera observations is a promising avenue. However, current methods often assume full observability of the scene and struggle with scale invariance. In many tasks and settings, this assumption does not hold as objects in the scene are often occluded or lie outside the field of view of the camera, rendering the camera observation ambiguous with regard to their location. To tackle this problem, we present BASK, a Bayesian approach to tracking scale-invariant keypoints over time. Our approach successfully resolves inherent ambiguities in images, enabling keypoint tracking on symmetrical objects and occluded and out-of-view objects. We employ our method to learn challenging multi-object robot manipulation tasks from wrist camera observations and demonstrate superior utility for policy learning compared to other representation learning techniques. Furthermore, we show outstanding robustness towards disturbances such as clutter, occlusions, and noisy depth measurements, as well as generalization to unseen objects both in simulation and real-world robotic experiments.

Self-Supervised Multi-Object Tracking From Consistency Across Timescales

Apr 25, 2023

Christopher Lang, Alexander Braun, Lars Schillingmann, Abhinav Valada

Abstract: Self-supervised multi-object trackers have the potential to leverage the vast amounts of raw data recorded worldwide. However, they still fall short in re-identification accuracy compared to their supervised counterparts. We hypothesize that this deficiency results from restricting self-supervised objectives to single frames or frame pairs. Such designs lack sufficient visual appearance variations during training to learn consistent re-identification features. Therefore, we propose a training objective that learns re-identification features over a sequence of frames by enforcing consistent association scores across short and long timescales. Extensive evaluations on the BDD100K and MOT17 benchmarks demonstrate that our learned ReID features significantly reduce ID switches compared to other self-supervised methods, setting the new state of the art for self-supervised multi-object tracking and even performing on par with supervised methods on the BDD100K benchmark.

* 8 pages, 3 figures, 5 tables

INoD: Injected Noise Discriminator for Self-Supervised Representation Learning in Agricultural Fields

Mar 31, 2023

Julia Hindel, Nikhil Gosala, Kevin Bregler, Abhinav Valada

Abstract: Perception datasets for agriculture are limited both in quantity and diversity, which hinders effective training of supervised learning approaches. Self-supervised learning techniques alleviate this problem; however, existing methods are not optimized for dense prediction tasks in agriculture domains, which results in degraded performance. In this work, we address this limitation with our proposed Injected Noise Discriminator (INoD) which exploits principles of feature replacement and dataset discrimination for self-supervised representation learning. INoD interleaves feature maps from two disjoint datasets during their convolutional encoding and predicts the dataset affiliation of the resultant feature map as a pretext task. Our approach enables the network to learn unequivocal representations of objects seen in one dataset while observing them in conjunction with similar features from the disjoint dataset. This allows the network to reason about higher-level semantics of the entailed objects, thus improving its performance on various downstream tasks. Additionally, we introduce the novel Fraunhofer Potato 2022 dataset consisting of over 16,800 images for object detection in potato fields. Extensive evaluations of our proposed INoD pretraining strategy for the tasks of object detection, semantic segmentation, and instance segmentation on the Sugar Beets 2016 and our potato dataset demonstrate that it achieves state-of-the-art performance.

* 8 pages, 7 figures
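
The interleaving step described in the INoD abstract can be pictured with a small sketch. This is not the authors' implementation: the per-location replacement, the `inject_prob` parameter, and the function name `interleave_feature_maps` are all assumptions made for illustration. It mixes two feature maps of shape channels × height × width and returns the mask that a discriminator would be trained to predict.

```python
import random


def interleave_feature_maps(feat_a, feat_b, inject_prob=0.3, seed=None):
    """Replace random spatial locations of feat_a with those of feat_b.

    feat_a, feat_b: nested lists shaped [channels][height][width], standing
    in for feature maps computed from two different datasets.
    Returns (mixed, mask); mask[y][x] == 1 marks an injected location.
    Hypothetical illustration only, not the procedure from the paper.
    """
    rng = random.Random(seed)
    channels = len(feat_a)
    height = len(feat_a[0])
    width = len(feat_a[0][0])

    # One binary decision per spatial location, shared across channels.
    mask = [
        [1 if rng.random() < inject_prob else 0 for _ in range(width)]
        for _ in range(height)
    ]

    # Take the value from feat_b where the mask is set, else from feat_a.
    mixed = [
        [
            [
                feat_b[c][y][x] if mask[y][x] else feat_a[c][y][x]
                for x in range(width)
            ]
            for y in range(height)
        ]
        for c in range(channels)
    ]
    return mixed, mask
```

In a pretext task of this kind, `mask` would serve as the per-location target for a dataset-affiliation classifier applied to `mixed`.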

CARTO: Category and Joint Agnostic Reconstruction of ARTiculated Objects

Mar 28, 2023

Nick Heppert, Muhammad Zubair Irshad, Sergey Zakharov, Katherine Liu, Rares Andrei Ambrus, Jeannette Bohg, Abhinav Valada, Thomas Kollar

Abstract: We present CARTO, a novel approach for reconstructing multiple articulated objects from a single stereo RGB observation. We use implicit object-centric representations and learn a single geometry and articulation decoder for multiple object categories. Despite training on multiple categories, our decoder achieves a comparable reconstruction accuracy to methods that train bespoke decoders separately for each category. Combined with our stereo image encoder we infer the 3D shape, 6D pose, size, joint type, and the joint state of multiple unknown objects in a single forward pass. Our method achieves a 20.4% absolute improvement in mAP 3D IOU50 for novel instances when compared to a two-stage pipeline. Inference is fast and can run on an NVIDIA TITAN XP GPU at 1 Hz when eight or fewer objects are present. While only trained on simulated data, CARTO transfers to real-world object instances. Code and evaluation data are available at: http://carto.cs.uni-freiburg.de

* 20 pages, 11 figures, accepted at CVPR 2023

CoVIO: Online Continual Learning for Visual-Inertial Odometry

Mar 17, 2023

Niclas Vödisch, Daniele Cattaneo, Wolfram Burgard, Abhinav Valada

Abstract:Visual odometry is a fundamental task for many applications on mobile devices and robotic platforms. Since such applications are oftentimes not limited to predefined target domains and learning-based vision systems are known to generalize poorly to unseen environments, methods for continual adaptation during inference time are of significant interest. In this work, we introduce CoVIO for online continual learning of visual-inertial odometry. CoVIO effectively adapts to new domains while mitigating catastrophic forgetting by exploiting experience replay. In particular, we propose a novel sampling strategy to maximize image diversity in a fixed-size replay buffer that targets the limited storage capacity of embedded devices. We further provide an asynchronous version that decouples the odometry estimation from the network weight update step enabling continuous inference in real time. We extensively evaluate CoVIO on various real-world datasets demonstrating that it successfully adapts to new domains while outperforming previous methods. The code of our work is publicly available at http://continual-slam.cs.uni-freiburg.de.
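
To make the idea of a fixed-size, diversity-oriented replay buffer concrete, here is a minimal sketch. The abstract does not specify the sampling strategy, so everything below is an assumption: the class name `DiversityReplayBuffer`, the use of cosine similarity between per-image feature vectors, and the rule of evicting the older member of the most similar pair are hypothetical choices, not CoVIO's method.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class DiversityReplayBuffer:
    """Fixed-size buffer that evicts redundant samples when it overflows.

    Hypothetical eviction rule: find the two most similar stored feature
    vectors and drop the older of the pair, so the remaining samples stay
    spread out in feature space.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.items = []  # (feature_vector, sample) tuples, oldest first

    def add(self, feature, sample):
        self.items.append((feature, sample))
        if len(self.items) <= self.capacity:
            return
        # Buffer overflowed by one: search all pairs for the most similar.
        best_pair, best_sim = None, -2.0
        for i in range(len(self.items)):
            for j in range(i + 1, len(self.items)):
                sim = cosine_similarity(self.items[i][0], self.items[j][0])
                if sim > best_sim:
                    best_sim, best_pair = sim, (i, j)
        # Index i is always the older element of the pair.
        del self.items[best_pair[0]]

    def samples(self):
        return [sample for _, sample in self.items]
```

The pairwise search costs quadratic time in the buffer size, which is acceptable for the small buffers an embedded device would hold; a real system would likely cache similarities.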
