Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Eric Brachmann

Map-free Visual Relocalization: Metric Pose Relative to a Single Image

Oct 11, 2022

Eduardo Arnold, Jamie Wynn, Sara Vicente, Guillermo Garcia-Hernando, Áron Monszpart, Victor Adrian Prisacariu, Daniyar Turmukhambetov, Eric Brachmann

Figure 1 for Map-free Visual Relocalization: Metric Pose Relative to a Single Image

Figure 2 for Map-free Visual Relocalization: Metric Pose Relative to a Single Image

Figure 3 for Map-free Visual Relocalization: Metric Pose Relative to a Single Image

Figure 4 for Map-free Visual Relocalization: Metric Pose Relative to a Single Image

Abstract:Can we relocalize in a scene represented by a single reference image? Standard visual relocalization requires hundreds of images and scale calibration to build a scene-specific 3D map. In contrast, we propose Map-free Relocalization, i.e., using only one photo of a scene to enable instant, metric scaled relocalization. Existing datasets are not suitable to benchmark map-free relocalization, due to their focus on large scenes or their limited variability. Thus, we have constructed a new dataset of 655 small places of interest, such as sculptures, murals and fountains, collected worldwide. Each place comes with a reference image to serve as a relocalization anchor, and dozens of query images with known, metric camera poses. The dataset features changing conditions, stark viewpoint changes, high variability across places, and queries with low to no visual overlap with the reference image. We identify two viable families of existing methods to provide baseline results: relative pose regression, and feature matching combined with single-image depth prediction. While these methods show reasonable performance on some favorable scenes in our dataset, map-free relocalization proves to be a challenge that requires new, innovative solutions.

* ECCV2022 camera-ready. 14 pages + 4 reference pages

Via

Access Paper or Ask Questions

On the Limits of Pseudo Ground Truth in Visual Camera Re-localisation

Sep 01, 2021

Eric Brachmann, Martin Humenberger, Carsten Rother, Torsten Sattler

Figure 1 for On the Limits of Pseudo Ground Truth in Visual Camera Re-localisation

Figure 2 for On the Limits of Pseudo Ground Truth in Visual Camera Re-localisation

Figure 3 for On the Limits of Pseudo Ground Truth in Visual Camera Re-localisation

Figure 4 for On the Limits of Pseudo Ground Truth in Visual Camera Re-localisation

Abstract:Benchmark datasets that measure camera pose accuracy have driven progress in visual re-localisation research. To obtain poses for thousands of images, it is common to use a reference algorithm to generate pseudo ground truth. Popular choices include Structure-from-Motion (SfM) and Simultaneous-Localisation-and-Mapping (SLAM) using additional sensors like depth cameras if available. Re-localisation benchmarks thus measure how well each method replicates the results of the reference algorithm. This begs the question whether the choice of the reference algorithm favours a certain family of re-localisation methods. This paper analyzes two widely used re-localisation datasets and shows that evaluation outcomes indeed vary with the choice of the reference algorithm. We thus question common beliefs in the re-localisation literature, namely that learning-based scene coordinate regression outperforms classical feature-based methods, and that RGB-D-based methods outperform RGB-based methods. We argue that any claims on ranking re-localisation methods should take the type of the reference algorithm, and the similarity of the methods to the reference algorithm, into account.

* ICCV 2021

Via

Access Paper or Ask Questions

Cuboids Revisited: Learning Robust 3D Shape Fitting to Single RGB Images

May 05, 2021

Florian Kluger, Hanno Ackermann, Eric Brachmann, Michael Ying Yang, Bodo Rosenhahn

Figure 1 for Cuboids Revisited: Learning Robust 3D Shape Fitting to Single RGB Images

Figure 2 for Cuboids Revisited: Learning Robust 3D Shape Fitting to Single RGB Images

Figure 3 for Cuboids Revisited: Learning Robust 3D Shape Fitting to Single RGB Images

Figure 4 for Cuboids Revisited: Learning Robust 3D Shape Fitting to Single RGB Images

Abstract:Humans perceive and construct the surrounding world as an arrangement of simple parametric models. In particular, man-made environments commonly consist of volumetric primitives such as cuboids or cylinders. Inferring these primitives is an important step to attain high-level, abstract scene descriptions. Previous approaches directly estimate shape parameters from a 2D or 3D input, and are only able to reproduce simple objects, yet unable to accurately parse more complex 3D scenes. In contrast, we propose a robust estimator for primitive fitting, which can meaningfully abstract real-world environments using cuboids. A RANSAC estimator guided by a neural network fits these primitives to 3D features, such as a depth map. We condition the network on previously detected parts of the scene, thus parsing it one-by-one. To obtain 3D features from a single RGB image, we additionally optimise a feature extraction CNN in an end-to-end manner. However, naively minimising point-to-primitive distances leads to large or spurious cuboids occluding parts of the scene behind. We thus propose an occlusion-aware distance metric correctly handling opaque scenes. The proposed algorithm does not require labour-intensive labels, such as cuboid annotations, for training. Results on the challenging NYU Depth v2 dataset demonstrate that the proposed algorithm successfully abstracts cluttered real-world 3D scene layouts.

* CVPR 2021

Via

Access Paper or Ask Questions

Visual Camera Re-Localization Using Graph Neural Networks and Relative Pose Supervision

Apr 12, 2021

Mehmet Ozgur Turkoglu, Eric Brachmann, Konrad Schindler, Gabriel Brostow, Aron Monszpart

Figure 1 for Visual Camera Re-Localization Using Graph Neural Networks and Relative Pose Supervision

Figure 2 for Visual Camera Re-Localization Using Graph Neural Networks and Relative Pose Supervision

Figure 3 for Visual Camera Re-Localization Using Graph Neural Networks and Relative Pose Supervision

Figure 4 for Visual Camera Re-Localization Using Graph Neural Networks and Relative Pose Supervision

Abstract:Visual re-localization means using a single image as input to estimate the camera's location and orientation relative to a pre-recorded environment. The highest-scoring methods are "structure based," and need the query camera's intrinsics as an input to the model, with careful geometric optimization. When intrinsics are absent, methods vie for accuracy by making various other assumptions. This yields fairly good localization scores, but the models are "narrow" in some way, eg., requiring costly test-time computations, or depth sensors, or multiple query frames. In contrast, our proposed method makes few special assumptions, and is fairly lightweight in training and testing. Our pose regression network learns from only relative poses of training scenes. For inference, it builds a graph connecting the query image to training counterparts and uses a graph neural network (GNN) with image representations on nodes and image-pair representations on edges. By efficiently passing messages between them, both representation types are refined to produce a consistent camera pose estimate. We validate the effectiveness of our approach on both standard indoor (7-Scenes) and outdoor (Cambridge Landmarks) camera re-localization benchmarks. Our relative pose regression method matches the accuracy of absolute pose regression networks, while retaining the relative-pose models' test-time speed and ability to generalize to non-training scenes.

Via

Access Paper or Ask Questions

BOP Challenge 2020 on 6D Object Localization

Oct 13, 2020

Tomas Hodan, Martin Sundermeyer, Bertram Drost, Yann Labbe, Eric Brachmann, Frank Michel, Carsten Rother, Jiri Matas

Figure 1 for BOP Challenge 2020 on 6D Object Localization

Figure 2 for BOP Challenge 2020 on 6D Object Localization

Figure 3 for BOP Challenge 2020 on 6D Object Localization

Figure 4 for BOP Challenge 2020 on 6D Object Localization

Abstract:This paper presents the evaluation methodology, datasets, and results of the BOP Challenge 2020, the third in a series of public competitions organized with the goal to capture the status quo in the field of 6D object pose estimation from an RGB-D image. In 2020, to reduce the domain gap between synthetic training and real test RGB images, the participants were provided 350K photorealistic training images generated by BlenderProc4BOP, a new open-source and light-weight physically-based renderer (PBR) and procedural data generator. Methods based on deep neural networks have finally caught up with methods based on point pair features, which were dominating previous editions of the challenge. Although the top-performing methods rely on RGB-D image channels, strong results were achieved when only RGB channels were used at both training and test time - out of the 26 evaluated methods, the third method was trained on RGB channels of PBR and real images, while the fifth on RGB channels of PBR images only. Strong data augmentation was identified as a key component of the top-performing CosyPose method, and the photorealism of PBR images was demonstrated effective despite the augmentation. The online evaluation system stays open and is available on the project website: bop.felk.cvut.cz.

* In ECCV 2020 Workshops Proceedings

Via

Access Paper or Ask Questions

Visual Camera Re-Localization from RGB and RGB-D Images Using DSAC

Feb 27, 2020

Eric Brachmann, Carsten Rother

Figure 1 for Visual Camera Re-Localization from RGB and RGB-D Images Using DSAC

Figure 2 for Visual Camera Re-Localization from RGB and RGB-D Images Using DSAC

Figure 3 for Visual Camera Re-Localization from RGB and RGB-D Images Using DSAC

Figure 4 for Visual Camera Re-Localization from RGB and RGB-D Images Using DSAC

Abstract:We describe a learning-based system that estimates the camera position and orientation from a single input image relative to a known environment. The system is flexible w.r.t. the amount of information available at test and at training time, catering to different applications. Input images can be RGB-D or RGB, and a 3D model of the environment can be utilized for training but is not necessary. In the minimal case, our system requires only RGB images and ground truth poses at training time, and it requires only a single RGB image at test time. The framework consists of a deep neural network and fully differentiable pose optimization. The neural network predicts so called scene coordinates, i.e. dense correspondences between the input image and 3D scene space of the environment. The pose optimization implements robust fitting of pose parameters using differentiable RANSAC (DSAC) to facilitate end-to-end training. The system, an extension of DSAC++ and referred to as DSAC*, achieves state-of-the-art accuracy an various public datasets for RGB-based re-localization, and competitive accuracy for RGB-D based re-localization.

Via

Access Paper or Ask Questions

CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus

Jan 08, 2020

Florian Kluger, Eric Brachmann, Hanno Ackermann, Carsten Rother, Michael Ying Yang, Bodo Rosenhahn

Figure 1 for CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus

Figure 2 for CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus

Figure 3 for CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus

Figure 4 for CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus

Abstract:We present a robust estimator for fitting multiple parametric models of the same form to noisy measurements. Applications include finding multiple vanishing points in man-made scenes, fitting planes to architectural imagery, or estimating multiple rigid motions within the same sequence. In contrast to previous works, which resorted to hand-crafted search strategies for multiple model detection, we learn the search strategy from data. A neural network conditioned on previously detected models guides a RANSAC estimator to different subsets of all measurements, thereby finding model instances one after another. We train our method supervised as well as self-supervised. For supervised training of the search strategy, we contribute a new dataset for vanishing point estimation. Leveraging this dataset, the proposed algorithm is superior with respect to other robust estimators as well as to designated vanishing point estimation algorithms. For self-supervised learning of the search, we evaluate the proposed algorithm on multi-homography estimation and demonstrate an accuracy that is superior to state-of-the-art methods.

Via

Access Paper or Ask Questions

Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task

Dec 02, 2019

Aritra Bhowmik, Stefan Gumhold, Carsten Rother, Eric Brachmann

Figure 1 for Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task

Figure 2 for Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task

Figure 3 for Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task

Figure 4 for Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task

Abstract:We address a core problem of computer vision: Detection and description of 2D feature points for image matching. For a long time, hand-crafted designs, like the seminal SIFT algorithm, were unsurpassed in accuracy and efficiency. Recently, learned feature detectors emerged that implement detection and description using neural networks. Training these networks usually resorts to optimizing low-level matching scores, often pre-defining sets of image patches which should or should not match, or which should or should not contain key points. Unfortunately, increased accuracy for these low-level matching scores does not necessarily translate to better performance in high-level vision tasks. We propose a new training methodology which embeds the feature detector in a complete vision pipeline, and where the learnable parameters are trained in an end-to-end fashion. We overcome the discrete nature of key point selection and descriptor matching using principles from reinforcement learning. As an example, we address the task of relative pose estimation between a pair of images. We demonstrate that the accuracy of a state-of-the-art learning-based feature detector can be increased when trained for the task it is supposed to solve at test time. Our training methodology poses little restrictions on the task to learn, and works for any architecture which predicts key point heat maps, and descriptors for key point locations.

Via

Access Paper or Ask Questions

Expert Sample Consensus Applied to Camera Re-Localization

Aug 07, 2019

Eric Brachmann, Carsten Rother

Figure 1 for Expert Sample Consensus Applied to Camera Re-Localization

Figure 2 for Expert Sample Consensus Applied to Camera Re-Localization

Figure 3 for Expert Sample Consensus Applied to Camera Re-Localization

Figure 4 for Expert Sample Consensus Applied to Camera Re-Localization

Abstract:Fitting model parameters to a set of noisy data points is a common problem in computer vision. In this work, we fit the 6D camera pose to a set of noisy correspondences between the 2D input image and a known 3D environment. We estimate these correspondences from the image using a neural network. Since the correspondences often contain outliers, we utilize a robust estimator such as Random Sample Consensus (RANSAC) or Differentiable RANSAC (DSAC) to fit the pose parameters. When the problem domain, e.g. the space of all 2D-3D correspondences, is large or ambiguous, a single network does not cover the domain well. Mixture of Experts (MoE) is a popular strategy to divide a problem domain among an ensemble of specialized networks, so called experts, where a gating network decides which expert is responsible for a given input. In this work, we introduce Expert Sample Consensus (ESAC), which integrates DSAC in a MoE. Our main technical contribution is an efficient method to train ESAC jointly and end-to-end. We demonstrate experimentally that ESAC handles two real-world problems better than competing methods, i.e. scalability and ambiguity. We apply ESAC to fitting simple geometric models to synthetic images, and to camera re-localization for difficult, real datasets.

* ICCV 2019. Supplementary materials included

Via

Access Paper or Ask Questions

Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses

May 10, 2019

Eric Brachmann, Carsten Rother

Figure 1 for Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses

Figure 2 for Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses

Figure 3 for Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses

Figure 4 for Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses

Abstract:We present Neural-Guided RANSAC (NG-RANSAC), an extension to the classic RANSAC algorithm from robust optimization. NG-RANSAC uses prior information to improve model hypothesis search, increasing the chance of finding outlier-free minimal sets. Previous works use heuristic side-information like hand-crafted descriptor distance to guide hypothesis search. In contrast, we learn hypothesis search in a principled fashion that lets us optimize an arbitrary task loss during training, leading to large improvements on classic computer vision tasks. We present two further extensions to NG-RANSAC. Firstly, using the inlier count itself as training signal allows us to train neural guidance in a self-supervised fashion. Secondly, we combine neural guidance with differentiable RANSAC to build neural networks which focus on certain parts of the input data and make the output predictions as good as possible. We evaluate NG-RANSAC on a wide array of computer vision tasks, namely estimation of epipolar geometry, horizon line estimation and camera re-localization. We achieve superior or competitive results compared to state-of-the-art robust estimators, including very recent, learned ones.

Via

Access Paper or Ask Questions