Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Marc Pollefeys

Department of Computer Science, ETH Zurich, Switzerland and Microsoft Mixed Reality & AI Lab, Zurich, Switzerland

Nothing Stands Still: A Spatiotemporal Benchmark on 3D Point Cloud Registration Under Large Geometric and Temporal Change

Nov 15, 2023

Tao Sun, Yan Hao, Shengyu Huang, Silvio Savarese, Konrad Schindler, Marc Pollefeys, Iro Armeni

Abstract:Building 3D geometric maps of man-made spaces is a well-established and active field that is fundamental to computer vision and robotics. However, considering the evolving nature of built environments, it is essential to question the capabilities of current mapping efforts in handling temporal changes. In addition, spatiotemporal mapping holds significant potential for achieving sustainability and circularity goals. Existing mapping approaches focus on small changes, such as object relocation or self-driving car operation; in all cases where the main structure of the scene remains fixed. Consequently, these approaches fail to address more radical changes in the structure of the built environment, such as geometry and topology. To this end, we introduce the Nothing Stands Still (NSS) benchmark, which focuses on the spatiotemporal registration of 3D scenes undergoing large spatial and temporal change, ultimately creating one coherent spatiotemporal map. Specifically, the benchmark involves registering two or more partial 3D point clouds (fragments) from the same scene but captured from different spatiotemporal views. In addition to the standard pairwise registration, we assess the multi-way registration of multiple fragments that belong to any temporal stage. As part of NSS, we introduce a dataset of 3D point clouds recurrently captured in large-scale building indoor environments that are under construction or renovation. The NSS benchmark presents three scenarios of increasing difficulty, to quantify the generalization ability of point cloud registration methods over space (within one building and across buildings) and time. We conduct extensive evaluations of state-of-the-art methods on NSS. The results demonstrate the necessity for novel methods specifically designed to handle large spatiotemporal changes. The homepage of our benchmark is at http://nothing-stands-still.com.

* 27 pages, 29 figures. For the project page, see http://nothing-stands-still.com

Via

Access Paper or Ask Questions

Long-Term Invariant Local Features via Implicit Cross-Domain Correspondences

Nov 06, 2023

Zador Pataki, Mohammad Altillawi, Menelaos Kanakis, Rémi Pautrat, Fengyi Shen, Ziyuan Liu, Luc Van Gool, Marc Pollefeys

Figure 1 for Long-Term Invariant Local Features via Implicit Cross-Domain Correspondences

Figure 2 for Long-Term Invariant Local Features via Implicit Cross-Domain Correspondences

Figure 3 for Long-Term Invariant Local Features via Implicit Cross-Domain Correspondences

Figure 4 for Long-Term Invariant Local Features via Implicit Cross-Domain Correspondences

Abstract:Modern learning-based visual feature extraction networks perform well in intra-domain localization, however, their performance significantly declines when image pairs are captured across long-term visual domain variations, such as different seasonal and daytime variations. In this paper, our first contribution is a benchmark to investigate the performance impact of long-term variations on visual localization. We conduct a thorough analysis of the performance of current state-of-the-art feature extraction networks under various domain changes and find a significant performance gap between intra- and cross-domain localization. We investigate different methods to close this gap by improving the supervision of modern feature extractor networks. We propose a novel data-centric method, Implicit Cross-Domain Correspondences (iCDC). iCDC represents the same environment with multiple Neural Radiance Fields, each fitting the scene under individual visual domains. It utilizes the underlying 3D representations to generate accurate correspondences across different long-term visual conditions. Our proposed method enhances cross-domain localization performance, significantly reducing the performance gap. When evaluated on popular long-term localization benchmarks, our trained networks consistently outperform existing methods. This work serves as a substantial stride toward more robust visual localization pipelines for long-term deployments, and opens up research avenues in the development of long-term invariant descriptors.

* 14 pages + 5 pages appendix, 13 figures

Via

Access Paper or Ask Questions

Leveraging Neural Radiance Fields for Uncertainty-Aware Visual Localization

Oct 10, 2023

Le Chen, Weirong Chen, Rui Wang, Marc Pollefeys

Figure 1 for Leveraging Neural Radiance Fields for Uncertainty-Aware Visual Localization

Figure 2 for Leveraging Neural Radiance Fields for Uncertainty-Aware Visual Localization

Figure 3 for Leveraging Neural Radiance Fields for Uncertainty-Aware Visual Localization

Figure 4 for Leveraging Neural Radiance Fields for Uncertainty-Aware Visual Localization

Abstract:As a promising fashion for visual localization, scene coordinate regression (SCR) has seen tremendous progress in the past decade. Most recent methods usually adopt neural networks to learn the mapping from image pixels to 3D scene coordinates, which requires a vast amount of annotated training data. We propose to leverage Neural Radiance Fields (NeRF) to generate training samples for SCR. Despite NeRF's efficiency in rendering, many of the rendered data are polluted by artifacts or only contain minimal information gain, which can hinder the regression accuracy or bring unnecessary computational costs with redundant data. These challenges are addressed in three folds in this paper: (1) A NeRF is designed to separately predict uncertainties for the rendered color and depth images, which reveal data reliability at the pixel level. (2) SCR is formulated as deep evidential learning with epistemic uncertainty, which is used to evaluate information gain and scene coordinate quality. (3) Based on the three arts of uncertainties, a novel view selection policy is formed that significantly improves data efficiency. Experiments on public datasets demonstrate that our method could select the samples that bring the most information gain and promote the performance with the highest efficiency.

* 8 pages, 5 figures

Via

Access Paper or Ask Questions

Geometry Aware Field-to-field Transformations for 3D Semantic Segmentation

Oct 08, 2023

Dominik Hollidt, Clinton Wang, Polina Golland, Marc Pollefeys

Figure 1 for Geometry Aware Field-to-field Transformations for 3D Semantic Segmentation

Figure 2 for Geometry Aware Field-to-field Transformations for 3D Semantic Segmentation

Figure 3 for Geometry Aware Field-to-field Transformations for 3D Semantic Segmentation

Figure 4 for Geometry Aware Field-to-field Transformations for 3D Semantic Segmentation

Abstract:We present a novel approach to perform 3D semantic segmentation solely from 2D supervision by leveraging Neural Radiance Fields (NeRFs). By extracting features along a surface point cloud, we achieve a compact representation of the scene which is sample-efficient and conducive to 3D reasoning. Learning this feature space in an unsupervised manner via masked autoencoding enables few-shot segmentation. Our method is agnostic to the scene parameterization, working on scenes fit with any type of NeRF.

* 8 pages

Via

Access Paper or Ask Questions

Active Visual Localization for Multi-Agent Collaboration: A Data-Driven Approach

Oct 04, 2023

Matthew Hanlon, Boyang Sun, Marc Pollefeys, Hermann Blum

Figure 1 for Active Visual Localization for Multi-Agent Collaboration: A Data-Driven Approach

Figure 2 for Active Visual Localization for Multi-Agent Collaboration: A Data-Driven Approach

Figure 3 for Active Visual Localization for Multi-Agent Collaboration: A Data-Driven Approach

Figure 4 for Active Visual Localization for Multi-Agent Collaboration: A Data-Driven Approach

Abstract:Rather than having each newly deployed robot create its own map of its surroundings, the growing availability of SLAM-enabled devices provides the option of simply localizing in a map of another robot or device. In cases such as multi-robot or human-robot collaboration, localizing all agents in the same map is even necessary. However, localizing e.g. a ground robot in the map of a drone or head-mounted MR headset presents unique challenges due to viewpoint changes. This work investigates how active visual localization can be used to overcome such challenges of viewpoint changes. Specifically, we focus on the problem of selecting the optimal viewpoint at a given location. We compare existing approaches in the literature with additional proposed baselines and propose a novel data-driven approach. The result demonstrates the superior performance of the data-driven approach when compared to existing methods, both in controlled simulation experiments and real-world deployment.

Via

Access Paper or Ask Questions

A 3D Mixed Reality Interface for Human-Robot Teaming

Oct 03, 2023

Jiaqi Chen, Boyang Sun, Marc Pollefeys, Hermann Blum

Abstract:This paper presents a mixed-reality human-robot teaming system. It allows human operators to see in real-time where robots are located, even if they are not in line of sight. The operator can also visualize the map that the robots create of their environment and can easily send robots to new goal positions. The system mainly consists of a mapping and a control module. The mapping module is a real-time multi-agent visual SLAM system that co-localizes all robots and mixed-reality devices to a common reference frame. Visualizations in the mixed-reality device then allow operators to see a virtual life-sized representation of the cumulative 3D map overlaid onto the real environment. As such, the operator can effectively "see through" walls into other rooms. To control robots and send them to new locations, we propose a drag-and-drop interface. An operator can grab any robot hologram in a 3D mini map and drag it to a new desired goal pose. We validate the proposed system through a user study and real-world deployments. We make the mixed-reality application publicly available at https://github.com/cvg/HoloLens_ros.

Via

Access Paper or Ask Questions

HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World

Sep 29, 2023

Xin Wang, Taein Kwon, Mahdi Rad, Bowen Pan, Ishani Chakraborty, Sean Andrist, Dan Bohus, Ashley Feniello, Bugra Tekin, Felipe Vieira Frujeri(+2 more)

Figure 1 for HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World

Figure 2 for HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World

Figure 3 for HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World

Figure 4 for HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World

Abstract:Building an interactive AI assistant that can perceive, reason, and collaborate with humans in the real world has been a long-standing pursuit in the AI community. This work is part of a broader research effort to develop intelligent agents that can interactively guide humans through performing tasks in the physical world. As a first step in this direction, we introduce HoloAssist, a large-scale egocentric human interaction dataset, where two people collaboratively complete physical manipulation tasks. The task performer executes the task while wearing a mixed-reality headset that captures seven synchronized data streams. The task instructor watches the performer's egocentric video in real time and guides them verbally. By augmenting the data with action and conversational annotations and observing the rich behaviors of various participants, we present key insights into how human assistants correct mistakes, intervene in the task completion procedure, and ground their instructions to the environment. HoloAssist spans 166 hours of data captured by 350 unique instructor-performer pairs. Furthermore, we construct and present benchmarks on mistake detection, intervention type prediction, and hand forecasting, along with detailed analysis. We expect HoloAssist will provide an important resource for building AI assistants that can fluidly collaborate with humans in the real world. Data can be downloaded at https://holoassist.github.io/.

* ICCV 2023

Via

Access Paper or Ask Questions

Handbook on Leveraging Lines for Two-View Relative Pose Estimation

Sep 27, 2023

Petr Hruby, Shaohui Liu, Rémi Pautrat, Marc Pollefeys, Daniel Barath

Figure 1 for Handbook on Leveraging Lines for Two-View Relative Pose Estimation

Figure 2 for Handbook on Leveraging Lines for Two-View Relative Pose Estimation

Figure 3 for Handbook on Leveraging Lines for Two-View Relative Pose Estimation

Figure 4 for Handbook on Leveraging Lines for Two-View Relative Pose Estimation

Abstract:We propose an approach for estimating the relative pose between calibrated image pairs by jointly exploiting points, lines, and their coincidences in a hybrid manner. We investigate all possible configurations where these data modalities can be used together and review the minimal solvers available in the literature. Our hybrid framework combines the advantages of all configurations, enabling robust and accurate estimation in challenging environments. In addition, we design a method for jointly estimating multiple vanishing point correspondences in two images, and a bundle adjustment that considers all relevant data modalities. Experiments on various indoor and outdoor datasets show that our approach outperforms point-based methods, improving AUC@10$^\circ$ by 1-7 points while running at comparable speeds. The source code of the solvers and hybrid framework will be made public.

* 2 view relative pose from special configurations of line

Via

Access Paper or Ask Questions

Q-REG: End-to-End Trainable Point Cloud Registration with Surface Curvature

Sep 27, 2023

Shengze Jin, Daniel Barath, Marc Pollefeys, Iro Armeni

Figure 1 for Q-REG: End-to-End Trainable Point Cloud Registration with Surface Curvature

Figure 2 for Q-REG: End-to-End Trainable Point Cloud Registration with Surface Curvature

Figure 3 for Q-REG: End-to-End Trainable Point Cloud Registration with Surface Curvature

Figure 4 for Q-REG: End-to-End Trainable Point Cloud Registration with Surface Curvature

Abstract:Point cloud registration has seen recent success with several learning-based methods that focus on correspondence matching and, as such, optimize only for this objective. Following the learning step of correspondence matching, they evaluate the estimated rigid transformation with a RANSAC-like framework. While it is an indispensable component of these methods, it prevents a fully end-to-end training, leaving the objective to minimize the pose error nonserved. We present a novel solution, Q-REG, which utilizes rich geometric information to estimate the rigid pose from a single correspondence. Q-REG allows to formalize the robust estimation as an exhaustive search, hence enabling end-to-end training that optimizes over both objectives of correspondence matching and rigid pose estimation. We demonstrate in the experiments that Q-REG is agnostic to the correspondence matching method and provides consistent improvement both when used only in inference and in end-to-end training. It sets a new state-of-the-art on the 3DMatch, KITTI, and ModelNet benchmarks.

Via

Access Paper or Ask Questions

Volumetric Semantically Consistent 3D Panoptic Mapping

Sep 26, 2023

Yang Miao, Iro Armeni, Marc Pollefeys, Daniel Barath

Abstract:We introduce an online 2D-to-3D semantic instance mapping algorithm aimed at generating comprehensive, accurate, and efficient semantic 3D maps suitable for autonomous agents in unstructured environments. The proposed approach is based on a Voxel-TSDF representation used in recent algorithms. It introduces novel ways of integrating semantic prediction confidence during mapping, producing semantic and instance-consistent 3D regions. Further improvements are achieved by graph optimization-based semantic labeling and instance refinement. The proposed method achieves accuracy superior to the state of the art on public large-scale datasets, improving on a number of widely used metrics. We also highlight a downfall in the evaluation of recent studies: using the ground truth trajectory as input instead of a SLAM-estimated one substantially affects the accuracy, creating a large gap between the reported results and the actual performance on real-world data.

* 8 pages, 2 figures

Via

Access Paper or Ask Questions