Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Matthias Rottmann

Benchmarking Vision Foundation Models for Input Monitoring in Autonomous Driving

Jan 14, 2025

Nert Keser, Halil Ibrahim Orhan, Niki Amini-Naieni, Gesina Schwalbe, Alois Knoll, Matthias Rottmann

Abstract:Deep neural networks (DNNs) remain challenged by distribution shifts in complex open-world domains like automated driving (AD): Absolute robustness against yet unknown novel objects (semantic shift) or styles like lighting conditions (covariate shift) cannot be guaranteed. Hence, reliable operation-time monitors for identification of out-of-training-data-distribution (OOD) scenarios are imperative. Current approaches for OOD classification are untested for complex domains like AD, are limited in the kinds of shifts they detect, or even require supervision with OOD samples. To prepare for unanticipated shifts, we instead establish a framework around a principled, unsupervised, and model-agnostic method that unifies detection of all kinds of shifts: Find a full model of the training data's feature distribution, to then use its density at new points as in-distribution (ID) score. To implement this, we propose to combine the newly available Vision Foundation Models (VFM) as feature extractors with one of four alternative density modeling techniques. In an extensive benchmark of 4 VFMs against 20 baselines, we show the superior performance of VFM feature encodings compared to shift-specific OOD monitors. Additionally, we find that sophisticated architectures outperform larger latent space dimensionality; and our method identifies samples with higher risk of errors on downstream tasks, despite being model-agnostic. This suggests that VFMs are promising to realize model-agnostic, unsupervised, reliable safety monitors in complex vision tasks.

Via

Access Paper or Ask Questions

On the Influence of Shape, Texture and Color for Learning Semantic Segmentation

Oct 18, 2024

Annika Mütze, Natalie Grabowsky, Edgar Heinert, Matthias Rottmann, Hanno Gottschalk

Figure 1 for On the Influence of Shape, Texture and Color for Learning Semantic Segmentation

Figure 2 for On the Influence of Shape, Texture and Color for Learning Semantic Segmentation

Figure 3 for On the Influence of Shape, Texture and Color for Learning Semantic Segmentation

Figure 4 for On the Influence of Shape, Texture and Color for Learning Semantic Segmentation

Abstract:In recent years, a body of works has emerged, studying shape and texture biases of off-the-shelf pre-trained deep neural networks (DNN) for image classification. These works study how much a trained DNN relies on image cues, predominantly shape and texture. In this work, we switch the perspective, posing the following questions: What can a DNN learn from each of the image cues, i.e., shape, texture and color, respectively? How much does each cue influence the learning success? And what are the synergy effects between different cues? Studying these questions sheds light upon cue influences on learning and thus the learning capabilities of DNNs. We study these questions on semantic segmentation which allows us to address our questions on pixel level. To conduct this study, we develop a generic procedure to decompose a given dataset into multiple ones, each of them only containing either a single cue or a chosen mixture. This framework is then applied to two real-world datasets, Cityscapes and PASCAL Context, and a synthetic data set based on the CARLA simulator. We learn the given semantic segmentation task from these cue datasets, creating cue experts. Early fusion of cues is performed by constructing appropriate datasets. This is complemented by a late fusion of experts which allows us to study cue influence location-dependent on pixel level. Our study on three datasets reveals that neither texture nor shape clearly dominate the learning success, however a combination of shape and color but without texture achieves surprisingly strong results. Our findings hold for convolutional and transformer backbones. In particular, qualitatively there is almost no difference in how both of the architecture types extract information from the different cues.

Via

Access Paper or Ask Questions

Unveiling Ontological Commitment in Multi-Modal Foundation Models

Sep 25, 2024

Mert Keser, Gesina Schwalbe, Niki Amini-Naieni, Matthias Rottmann, Alois Knoll

Figure 1 for Unveiling Ontological Commitment in Multi-Modal Foundation Models

Figure 2 for Unveiling Ontological Commitment in Multi-Modal Foundation Models

Figure 3 for Unveiling Ontological Commitment in Multi-Modal Foundation Models

Figure 4 for Unveiling Ontological Commitment in Multi-Modal Foundation Models

Abstract:Ontological commitment, i.e., used concepts, relations, and assumptions, are a corner stone of qualitative reasoning (QR) models. The state-of-the-art for processing raw inputs, though, are deep neural networks (DNNs), nowadays often based off from multimodal foundation models. These automatically learn rich representations of concepts and respective reasoning. Unfortunately, the learned qualitative knowledge is opaque, preventing easy inspection, validation, or adaptation against available QR models. So far, it is possible to associate pre-defined concepts with latent representations of DNNs, but extractable relations are mostly limited to semantic similarity. As a next step towards QR for validation and verification of DNNs: Concretely, we propose a method that extracts the learned superclass hierarchy from a multimodal DNN for a given set of leaf concepts. Under the hood we (1) obtain leaf concept embeddings using the DNN's textual input modality; (2) apply hierarchical clustering to them, using that DNNs encode semantic similarities via vector distances; and (3) label the such-obtained parent concepts using search in available ontologies from QR. An initial evaluation study shows that meaningful ontological class hierarchies can be extracted from state-of-the-art foundation models. Furthermore, we demonstrate how to validate and verify a DNN's learned representations against given ontologies. Lastly, we discuss potential future applications in the context of QR.

* Qualitative Reasoning Workshop 2024 (QR2024) colocated with ECAI2024, camera-ready submission; first two authors contributed equally; 10 pages, 4 figures, 3 tables

Via

Access Paper or Ask Questions

Uncertainty and Prediction Quality Estimation for Semantic Segmentation via Graph Neural Networks

Sep 17, 2024

Edgar Heinert, Stephan Tilgner, Timo Palm, Matthias Rottmann

Abstract:When employing deep neural networks (DNNs) for semantic segmentation in safety-critical applications like automotive perception or medical imaging, it is important to estimate their performance at runtime, e.g. via uncertainty estimates or prediction quality estimates. Previous works mostly performed uncertainty estimation on pixel-level. In a line of research, a connected-component-wise (segment-wise) perspective was taken, approaching uncertainty estimation on an object-level by performing so-called meta classification and regression to estimate uncertainty and prediction quality, respectively. In those works, each predicted segment is considered individually to estimate its uncertainty or prediction quality. However, the neighboring segments may provide additional hints on whether a given predicted segment is of high quality, which we study in the present work. On the basis of uncertainty indicating metrics on segment-level, we use graph neural networks (GNNs) to model the relationship of a given segment's quality as a function of the given segment's metrics as well as those of its neighboring segments. We compare different GNN architectures and achieve a notable performance improvement.

* 11 pages, 3 figures, submitted to BMVC "Workshop on Robust Recognition in the Open World" (https://rrow2024.github.io/call-for-papers)

Via

Access Paper or Ask Questions

On the Potential of Open-Vocabulary Models for Object Detection in Unusual Street Scenes

Aug 20, 2024

Sadia Ilyas, Ido Freeman, Matthias Rottmann

Figure 1 for On the Potential of Open-Vocabulary Models for Object Detection in Unusual Street Scenes

Figure 2 for On the Potential of Open-Vocabulary Models for Object Detection in Unusual Street Scenes

Figure 3 for On the Potential of Open-Vocabulary Models for Object Detection in Unusual Street Scenes

Figure 4 for On the Potential of Open-Vocabulary Models for Object Detection in Unusual Street Scenes

Abstract:Out-of-distribution (OOD) object detection is a critical task focused on detecting objects that originate from a data distribution different from that of the training data. In this study, we investigate to what extent state-of-the-art open-vocabulary object detectors can detect unusual objects in street scenes, which are considered as OOD or rare scenarios with respect to common street scene datasets. Specifically, we evaluate their performance on the OoDIS Benchmark, which extends RoadAnomaly21 and RoadObstacle21 from SegmentMeIfYouCan, as well as LostAndFound, which was recently extended to object level annotations. The objective of our study is to uncover short-comings of contemporary object detectors in challenging real-world, and particularly in open-world scenarios. Our experiments reveal that open vocabulary models are promising for OOD object detection scenarios, however far from perfect. Substantial improvements are required before they can be reliably deployed in real-world applications. We benchmark four state-of-the-art open-vocabulary object detection models on three different datasets. Noteworthily, Grounding DINO achieves the best results on RoadObstacle21 and LostAndFound in our study with an AP of 48.3% and 25.4% respectively. YOLO-World excels on RoadAnomaly21 with an AP of 21.2%.

Via

Access Paper or Ask Questions

SparseRadNet: Sparse Perception Neural Network on Subsampled Radar Data

Jun 18, 2024

Jialong Wu, Mirko Meuter, Markus Schoeler, Matthias Rottmann

Figure 1 for SparseRadNet: Sparse Perception Neural Network on Subsampled Radar Data

Figure 2 for SparseRadNet: Sparse Perception Neural Network on Subsampled Radar Data

Figure 3 for SparseRadNet: Sparse Perception Neural Network on Subsampled Radar Data

Figure 4 for SparseRadNet: Sparse Perception Neural Network on Subsampled Radar Data

Abstract:Radar-based perception has gained increasing attention in autonomous driving, yet the inherent sparsity of radars poses challenges. Radar raw data often contains excessive noise, whereas radar point clouds retain only limited information. In this work, we holistically treat the sparse nature of radar data by introducing an adaptive subsampling method together with a tailored network architecture that exploits the sparsity patterns to discover global and local dependencies in the radar signal. Our subsampling module selects a subset of pixels from range-doppler (RD) spectra that contribute most to the downstream perception tasks. To improve the feature extraction on sparse subsampled data, we propose a new way of applying graph neural networks on radar data and design a novel two-branch backbone to capture both global and local neighbor information. An attentive fusion module is applied to combine features from both branches. Experiments on the RADIal dataset show that our SparseRadNet exceeds state-of-the-art (SOTA) performance in object detection and achieves close to SOTA accuracy in freespace segmentation, meanwhile using sparse subsampled input data.

* 18 pages, 4 figures, 5 tables

Via

Access Paper or Ask Questions

OoDIS: Anomaly Instance Segmentation Benchmark

Jun 17, 2024

Alexey Nekrasov, Rui Zhou, Miriam Ackermann, Alexander Hermans, Bastian Leibe, Matthias Rottmann

Figure 1 for OoDIS: Anomaly Instance Segmentation Benchmark

Figure 2 for OoDIS: Anomaly Instance Segmentation Benchmark

Figure 3 for OoDIS: Anomaly Instance Segmentation Benchmark

Figure 4 for OoDIS: Anomaly Instance Segmentation Benchmark

Abstract:Autonomous vehicles require a precise understanding of their environment to navigate safely. Reliable identification of unknown objects, especially those that are absent during training, such as wild animals, is critical due to their potential to cause serious accidents. Significant progress in semantic segmentation of anomalies has been driven by the availability of out-of-distribution (OOD) benchmarks. However, a comprehensive understanding of scene dynamics requires the segmentation of individual objects, and thus the segmentation of instances is essential. Development in this area has been lagging, largely due to the lack of dedicated benchmarks. To address this gap, we have extended the most commonly used anomaly segmentation benchmarks to include the instance segmentation task. Our evaluation of anomaly instance segmentation methods shows that this challenge remains an unsolved problem. The benchmark website and the competition page can be found at: https://vision.rwth-aachen.de/oodis .

* Accepted at the VAND 2.0 Workshop at CVPR 2024. Project page: https://vision.rwth-aachen.de/oodis

Via

Access Paper or Ask Questions

Reducing Texture Bias of Deep Neural Networks via Edge Enhancing Diffusion

Feb 14, 2024

Edgar Heinert, Matthias Rottmann, Kira Maag, Karsten Kahl

Abstract:Convolutional neural networks (CNNs) for image processing tend to focus on localized texture patterns, commonly referred to as texture bias. While most of the previous works in the literature focus on the task of image classification, we go beyond this and study the texture bias of CNNs in semantic segmentation. In this work, we propose to train CNNs on pre-processed images with less texture to reduce the texture bias. Therein, the challenge is to suppress image texture while preserving shape information. To this end, we utilize edge enhancing diffusion (EED), an anisotropic image diffusion method initially introduced for image compression, to create texture reduced duplicates of existing datasets. Extensive numerical studies are performed with both CNNs and vision transformer models trained on original data and EED-processed data from the Cityscapes dataset and the CARLA driving simulator. We observe strong texture-dependence of CNNs and moderate texture-dependence of transformers. Training CNNs on EED-processed images enables the models to become completely ignorant with respect to texture, demonstrating resilience with respect to texture re-introduction to any degree. Additionally we analyze the performance reduction in depth on a level of connected components in the semantic segmentation and study the influence of EED pre-processing on domain generalization as well as adversarial robustness.

Via

Access Paper or Ask Questions

Uncertainty quantification for deep learning-based schemes for solving high-dimensional backward stochastic differential equations

Oct 05, 2023

Lorenc Kapllani, Long Teng, Matthias Rottmann

Figure 1 for Uncertainty quantification for deep learning-based schemes for solving high-dimensional backward stochastic differential equations

Figure 2 for Uncertainty quantification for deep learning-based schemes for solving high-dimensional backward stochastic differential equations

Figure 3 for Uncertainty quantification for deep learning-based schemes for solving high-dimensional backward stochastic differential equations

Figure 4 for Uncertainty quantification for deep learning-based schemes for solving high-dimensional backward stochastic differential equations

Abstract:Deep learning-based numerical schemes for solving high-dimensional backward stochastic differential equations (BSDEs) have recently raised plenty of scientific interest. While they enable numerical methods to approximate very high-dimensional BSDEs, their reliability has not been studied and is thus not understood. In this work, we study uncertainty quantification (UQ) for a class of deep learning-based BSDE schemes. More precisely, we review the sources of uncertainty involved in the schemes and numerically study the impact of different sources. Usually, the standard deviation (STD) of the approximate solutions obtained from multiple runs of the algorithm with different datasets is calculated to address the uncertainty. This approach is computationally quite expensive, especially for high-dimensional problems. Hence, we develop a UQ model that efficiently estimates the STD of the approximate solution using only a single run of the algorithm. The model also estimates the mean of the approximate solution, which can be leveraged to initialize the algorithm and improve the optimization process. Our numerical experiments show that the UQ model produces reliable estimates of the mean and STD of the approximate solution for the considered class of deep learning-based BSDE schemes. The estimated STD captures multiple sources of uncertainty, demonstrating its effectiveness in quantifying the uncertainty. Additionally, the model illustrates the improved performance when comparing different schemes based on the estimated STD values. Furthermore, it can identify hyperparameter values for which the scheme achieves good approximations.

* 41 pages, 23 figures and 15 tables

Via

Access Paper or Ask Questions

Deep Active Learning with Noisy Oracle in Object Detection

Sep 30, 2023

Marius Schubert, Tobias Riedlinger, Karsten Kahl, Matthias Rottmann

Figure 1 for Deep Active Learning with Noisy Oracle in Object Detection

Figure 2 for Deep Active Learning with Noisy Oracle in Object Detection

Figure 3 for Deep Active Learning with Noisy Oracle in Object Detection

Figure 4 for Deep Active Learning with Noisy Oracle in Object Detection

Abstract:Obtaining annotations for complex computer vision tasks such as object detection is an expensive and time-intense endeavor involving a large number of human workers or expert opinions. Reducing the amount of annotations required while maintaining algorithm performance is, therefore, desirable for machine learning practitioners and has been successfully achieved by active learning algorithms. However, it is not merely the amount of annotations which influences model performance but also the annotation quality. In practice, the oracles that are queried for new annotations frequently contain significant amounts of noise. Therefore, cleansing procedures are oftentimes necessary to review and correct given labels. This process is subject to the same budget as the initial annotation itself since it requires human workers or even domain experts. Here, we propose a composite active learning framework including a label review module for deep object detection. We show that utilizing part of the annotation budget to correct the noisy annotations partially in the active dataset leads to early improvements in model performance, especially when coupled with uncertainty-based query strategies. The precision of the label error proposals has a significant influence on the measured effect of the label review. In our experiments we achieve improvements of up to 4.5 mAP points of object detection performance by incorporating label reviews at equal annotation budget.

Via

Access Paper or Ask Questions