Daniel Cremers

DeepLab2: A TensorFlow Library for Deep Labeling

Jun 17, 2021

Mark Weber, Huiyu Wang, Siyuan Qiao, Jun Xie, Maxwell D. Collins, Yukun Zhu, Liangzhe Yuan, Dahun Kim, Qihang Yu, Daniel Cremers(+5 more)

Abstract: DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a state-of-the-art and easy-to-use TensorFlow codebase for general dense pixel prediction problems in computer vision. DeepLab2 includes all our recently developed DeepLab model variants with pretrained checkpoints as well as model training and evaluation code, allowing the community to reproduce and further improve upon the state-of-the-art systems. To showcase the effectiveness of DeepLab2, our Panoptic-DeepLab employing Axial-SWideRNet as network backbone achieves 68.0% PQ or 83.5% mIoU on the Cityscapes validation set, with only single-scale inference and ImageNet-1K pretrained checkpoints. We hope that publicly sharing our library could facilitate future research on dense pixel labeling tasks and envision new applications of this technology. Code is made publicly available at https://github.com/google-research/deeplab2.

* 4-page technical report. The first three authors contributed equally to this work

NeuroMorph: Unsupervised Shape Interpolation and Correspondence in One Go

Jun 17, 2021

Marvin Eisenberger, David Novotny, Gael Kerchenbaum, Patrick Labatut, Natalia Neverova, Daniel Cremers, Andrea Vedaldi

Abstract: We present NeuroMorph, a new neural network architecture that takes as input two 3D shapes and produces in one go, i.e. in a single feed forward pass, a smooth interpolation and point-to-point correspondences between them. The interpolation, expressed as a deformation field, changes the pose of the source shape to resemble the target, but leaves the object identity unchanged. NeuroMorph uses an elegant architecture combining graph convolutions with global feature pooling to extract local features. During training, the model is incentivized to create realistic deformations by approximating geodesics on the underlying shape space manifold. This strong geometric prior allows us to train our model end-to-end and in a fully unsupervised manner without requiring any manual correspondence annotations. NeuroMorph works well for a large variety of input shapes, including non-isometric pairs from different object categories. It obtains state-of-the-art results for both shape correspondence and interpolation tasks, matching or surpassing the performance of recent unsupervised and supervised methods on multiple benchmarks.

* Published at the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021

Joint Deep Multi-Graph Matching and 3D Geometry Learning from Inhomogeneous 2D Image Collections

Mar 31, 2021

Zhenzhang Ye, Tarun Yenamandra, Florian Bernard, Daniel Cremers

Abstract: Graph matching aims to establish correspondences between vertices of graphs such that both the node and edge attributes agree. Various learning-based methods were recently proposed for finding correspondences between image key points based on deep graph matching formulations. While these approaches mainly focus on learning node and edge attributes, they completely ignore the 3D geometry of the underlying 3D objects depicted in the 2D images. We fill this gap by proposing a trainable framework that takes advantage of graph neural networks for learning a deformable 3D geometry model from inhomogeneous image collections, i.e. a set of images that depict different instances of objects from the same category. Experimentally we demonstrate that our method outperforms recent learning-based approaches for graph matching considering both accuracy and cycle-consistency error, while we in addition obtain the underlying 3D geometry of the objects depicted in the 2D images.

Square Root Bundle Adjustment for Large-Scale Reconstruction

Mar 30, 2021

Nikolaus Demmel, Christiane Sommer, Daniel Cremers, Vladyslav Usenko

Abstract: We propose a new formulation for the bundle adjustment problem which relies on nullspace marginalization of landmark variables by QR decomposition. Our approach, which we call square root bundle adjustment, is algebraically equivalent to the commonly used Schur complement trick, improves the numeric stability of computations, and allows for solving large-scale bundle adjustment problems with single-precision floating-point numbers. We show in real-world experiments with the BAL datasets that even in single precision the proposed solver achieves on average equally accurate solutions compared to Schur complement solvers using double precision. It runs significantly faster, but can require larger amounts of memory on dense problems. The proposed formulation relies on simple linear algebra operations and opens the way for efficient implementations of bundle adjustment on hardware platforms optimized for single-precision linear algebra processing.

* Accepted to CVPR 2021. Updated version corresponding to CVPR camera-ready. Formatting changes and minor tweaks to fit page requirements
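
The algebraic equivalence claimed in this abstract can be checked on a toy linear least-squares problem. The following is a minimal sketch, not the authors' solver: the matrix sizes, the random Jacobians, and all variable names are assumptions made for illustration, and damping and the per-landmark block structure of real bundle adjustment are omitted. It projects the linearized residual `r + J_p dx_p + J_l dx_l` onto the left nullspace of the landmark Jacobian `J_l` using a full QR decomposition and compares the resulting reduced system with the Schur complement of the normal equations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: 8 residuals, 4 pose parameters, 3 landmark parameters.
J_p = rng.standard_normal((8, 4))  # Jacobian w.r.t. pose variables
J_l = rng.standard_normal((8, 3))  # Jacobian w.r.t. landmark variables
r = rng.standard_normal(8)         # residual vector

# Full QR of J_l: the trailing columns of Q span the left nullspace of J_l.
Q, _ = np.linalg.qr(J_l, mode="complete")
Q2 = Q[:, J_l.shape[1]:]

# Multiplying by Q2^T removes the landmark variables from the linear system.
J_red = Q2.T @ J_p
r_red = Q2.T @ r
H_qr = J_red.T @ J_red
b_qr = J_red.T @ r_red

# Reference: Schur complement on the normal equations.
H_pl = J_p.T @ J_l
H_ll = J_l.T @ J_l
H_schur = J_p.T @ J_p - H_pl @ np.linalg.solve(H_ll, H_pl.T)
b_schur = J_p.T @ r - H_pl @ np.linalg.solve(H_ll, J_l.T @ r)

print(np.allclose(H_qr, H_schur), np.allclose(b_qr, b_schur))
```

The two routes agree because `Q2 @ Q2.T` is the orthogonal projector onto the complement of the column space of `J_l`. The QR route never forms `J_l.T @ J_l`, which is one plausible reading of the numerical-stability claim in the abstract.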

Self-Supervised Steering Angle Prediction for Vehicle Control Using Visual Odometry

Mar 20, 2021

Qadeer Khan, Patrick Wenzel, Daniel Cremers

Abstract: Vision-based learning methods for self-driving cars have primarily used supervised approaches that require a large number of labels for training. However, those labels are usually difficult and expensive to obtain. In this paper, we demonstrate how a model can be trained to control a vehicle's trajectory using camera poses estimated through visual odometry methods in an entirely self-supervised fashion. We propose a scalable framework that leverages trajectory information from several different runs using a camera setup placed at the front of a car. Experimental results on the CARLA simulator demonstrate that our proposed approach performs on par with the model trained with supervision.

* Accepted at International Conference on Artificial Intelligence and Statistics (AISTATS), 2021

Vision-Based Mobile Robotics Obstacle Avoidance With Deep Reinforcement Learning

Mar 08, 2021

Patrick Wenzel, Torsten Schön, Laura Leal-Taixé, Daniel Cremers

Abstract: Obstacle avoidance is a fundamental and challenging problem for autonomous navigation of mobile robots. In this paper, we consider the problem of obstacle avoidance in simple 3D environments where the robot has to solely rely on a single monocular camera. In particular, we are interested in solving this problem without relying on localization, mapping, or planning techniques. Most existing work considers obstacle avoidance as two separate problems, namely obstacle detection and control. Inspired by the recent advances of deep reinforcement learning in Atari games and understanding highly complex situations in Go, we tackle the obstacle avoidance problem as a data-driven end-to-end deep learning approach. Our approach takes raw images as input and generates control commands as output. We show that discrete action spaces outperform continuous control commands in terms of expected average reward in maze-like environments. Furthermore, we show how to accelerate the learning and increase the robustness of the policy by incorporating depth maps predicted by a generative adversarial network.

* Accepted at 2021 IEEE International Conference on Robotics and Automation (ICRA)

Parameterized Temperature Scaling for Boosting the Expressive Power in Post-Hoc Uncertainty Calibration

Feb 24, 2021

Christian Tomani, Daniel Cremers, Florian Buettner

Abstract: We address the problem of uncertainty calibration and introduce a novel calibration method, Parameterized Temperature Scaling (PTS). Standard deep neural networks typically yield uncalibrated predictions, which can be transformed into calibrated confidence scores using post-hoc calibration methods. In this contribution, we demonstrate that the performance of accuracy-preserving state-of-the-art post-hoc calibrators is limited by their intrinsic expressive power. We generalize temperature scaling by computing prediction-specific temperatures, parameterized by a neural network. We show with extensive experiments that our novel accuracy-preserving approach consistently outperforms existing algorithms across a large number of model architectures, datasets and metrics.

* Technical report
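
The idea of a prediction-specific temperature can be illustrated with a few lines of plain Python. This is only a sketch under stated assumptions: `toy_temperature` is a hypothetical hand-written stand-in for the neural network that the abstract says parameterizes the temperature, and its formula is invented for illustration; it is not the PTS model. What the sketch does show is why dividing logits by any positive scalar is accuracy-preserving: the predicted class is unchanged while the confidence is rescaled.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a positive temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def toy_temperature(logits):
    """Hypothetical stand-in for a learned network: logits -> positive temperature."""
    spread = max(logits) - min(logits)
    return 1.0 + 0.5 * math.log1p(spread)


logits = [2.0, 0.5, -1.0]
t = toy_temperature(logits)   # temperature depends on this particular prediction
p_raw = softmax(logits)       # uncalibrated confidences
p_cal = softmax(logits, t)    # rescaled confidences

print(p_raw.index(max(p_raw)) == p_cal.index(max(p_cal)))
```

Standard temperature scaling would use one fixed `temperature` for every input; here the value is recomputed per prediction, which is the generalization the abstract describes.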

STEP: Segmenting and Tracking Every Pixel

Feb 23, 2021

Mark Weber, Jun Xie, Maxwell Collins, Yukun Zhu, Paul Voigtlaender, Hartwig Adam, Bradley Green, Andreas Geiger, Bastian Leibe, Daniel Cremers(+3 more)

Abstract: In this paper, we tackle video panoptic segmentation, a task that requires assigning semantic classes and track identities to all pixels in a video. To study this important problem in a setting that requires a continuous interpretation of sensory data, we present a new benchmark: Segmenting and Tracking Every Pixel (STEP), encompassing two datasets, KITTI-STEP and MOTChallenge-STEP, together with a new evaluation metric. Our work is the first that targets this task in a real-world setting that requires dense interpretation in both spatial and temporal domains. As the ground-truth for this task is difficult and expensive to obtain, existing datasets are either constructed synthetically or only sparsely annotated within short video clips. By contrast, our datasets contain long video sequences, providing challenging examples and a test-bed for studying long-term pixel-precise segmentation and tracking. For measuring the performance, we propose a novel evaluation metric Segmentation and Tracking Quality (STQ) that fairly balances semantic and tracking aspects of this task and is suitable for evaluating sequences of arbitrary length. We will make our datasets, metric, and baselines publicly available.

* Datasets, metric, and baselines will be made publicly available soon

Variational Data Assimilation with a Learned Inverse Observation Operator

Feb 22, 2021

Thomas Frerix, Dmitrii Kochkov, Jamie A. Smith, Daniel Cremers, Michael P. Brenner, Stephan Hoyer

Abstract: Variational data assimilation optimizes for an initial state of a dynamical system such that its evolution fits observational data. The physical model can subsequently be evolved into the future to make predictions. This principle is a cornerstone of large-scale forecasting applications such as numerical weather prediction. As such, it is implemented in current operational systems of weather forecasting agencies across the globe. However, finding a good initial state poses a difficult optimization problem in part due to the non-invertible relationship between physical states and their corresponding observations. We learn a mapping from observational data to physical states and show how it can be used to improve optimizability. We employ this mapping in two ways: to better initialize the non-convex optimization problem, and to reformulate the objective function in better-behaved physics space instead of observation space. Our experimental results for the Lorenz96 model and a two-dimensional turbulent fluid flow demonstrate that this procedure significantly improves forecast quality for chaotic systems.
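
To make the baseline objective concrete, the sketch below writes down a variational data assimilation loss for the Lorenz96 model in plain Python. It is an illustration only and rests on several assumptions that are not from the paper: the observation operator is the identity (the full state is observed), time stepping uses classical fourth-order Runge-Kutta, and the state size, step size, and all function names are hypothetical. The learned inverse observation operator that the abstract describes is not implemented here.

```python
def lorenz96_tendency(x, forcing=8.0):
    """Lorenz96: dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F, cyclic in i."""
    n = len(x)
    return [(x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + forcing for i in range(n)]


def rk4_step(x, dt, forcing=8.0):
    """One classical Runge-Kutta step of the Lorenz96 dynamics."""
    def shifted(base, k, scale):
        return [b + scale * ki for b, ki in zip(base, k)]

    k1 = lorenz96_tendency(x, forcing)
    k2 = lorenz96_tendency(shifted(x, k1, dt / 2), forcing)
    k3 = lorenz96_tendency(shifted(x, k2, dt / 2), forcing)
    k4 = lorenz96_tendency(shifted(x, k3, dt), forcing)
    return [xi + dt / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def rollout(x0, steps, dt=0.01, forcing=8.0):
    """Evolve the initial state and return the list of visited states."""
    states = [list(x0)]
    for _ in range(steps):
        states.append(rk4_step(states[-1], dt, forcing))
    return states


def assimilation_loss(x0, observations, dt=0.01, forcing=8.0):
    """Squared misfit between the trajectory started at x0 and the observations."""
    trajectory = rollout(x0, len(observations) - 1, dt, forcing)
    return sum((s - o) ** 2
               for state, obs in zip(trajectory, observations)
               for s, o in zip(state, obs))


truth = [8.01, 8.0, 8.0, 8.0, 8.0]   # hypothetical true initial state
observations = rollout(truth, 10)    # noise-free, fully observed (assumption)
guess = [8.0] * 5                    # initial guess to be improved by an optimizer

loss_true = assimilation_loss(truth, observations)
loss_guess = assimilation_loss(guess, observations)
print(loss_true, loss_guess > loss_true)
```

An optimizer would minimize `assimilation_loss` over the initial state. With a non-invertible observation operator in place of the identity, this objective becomes harder to optimize, which is the difficulty the abstract addresses.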

Rotation-Equivariant Deep Learning for Diffusion MRI

Feb 13, 2021

Philip Müller, Vladimir Golkov, Valentina Tomassini, Daniel Cremers

Abstract: Convolutional networks are successful, but they have recently been outperformed by new neural networks that are equivariant under rotations and translations. These new networks work better because they do not struggle with learning each possible orientation of each image feature separately. So far, they have been proposed for 2D and 3D data. Here we generalize them to 6D diffusion MRI data, ensuring joint equivariance under 3D roto-translations in image space and the matching 3D rotations in $q$-space, as dictated by the image formation. Such equivariant deep learning is appropriate for diffusion MRI, because microstructural and macrostructural features such as neural fibers can appear at many different orientations, and because even non-rotation-equivariant deep learning has so far been the best method for many diffusion MRI tasks. We validate our equivariant method on multiple-sclerosis lesion segmentation. Our proposed neural networks yield better results and require fewer scans for training compared to non-rotation-equivariant deep learning. They also inherit all the advantages of deep learning over classical diffusion MRI methods. Our implementation is available at https://github.com/philip-mueller/equivariant-deep-dmri and can be used off the shelf without understanding the mathematical background.

* 24 pages, 8 figures
