Nikolay Savinov

Semi-parametric Topological Memory for Navigation

Mar 01, 2018

Nikolay Savinov, Alexey Dosovitskiy, Vladlen Koltun

Abstract: We introduce a new memory architecture for navigation in previously unseen environments, inspired by landmark-based navigation in animals. The proposed semi-parametric topological memory (SPTM) consists of a (non-parametric) graph with nodes corresponding to locations in the environment and a (parametric) deep network capable of retrieving nodes from the graph based on observations. The graph stores no metric information, only connectivity of locations corresponding to the nodes. We use SPTM as a planning module in a navigation system. Given only 5 minutes of footage of a previously unseen maze, an SPTM-based navigation agent can build a topological map of the environment and use it to confidently navigate towards goals. The average success rate of the SPTM agent in goal-directed navigation across test environments is higher than the best-performing baseline by a factor of three. A video of the agent is available at https://youtu.be/vRF7f4lhswo

* Published at International Conference on Learning Representations (ICLR) 2018. Project website at https://sites.google.com/view/SPTM
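The graph-plus-retrieval idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the `similar` function is a hypothetical stand-in for the deep retrieval network, the shortcut threshold and the breadth-first planner are assumptions, and the toy "observations" are 2D points used only to drive the similarity score (the graph itself stores adjacency and nothing else).

```python
import math
from collections import deque

def build_topological_graph(observations, similar, shortcut_threshold):
    """Connect consecutive observations, plus pairs the retrieval function deems close."""
    n = len(observations)
    graph = {i: set() for i in range(n)}
    for i in range(n - 1):                      # temporal edges along the footage
        graph[i].add(i + 1)
        graph[i + 1].add(i)
    for i in range(n):                          # assumed "shortcut" edges
        for j in range(i + 2, n):
            if similar(observations[i], observations[j]) >= shortcut_threshold:
                graph[i].add(j)
                graph[j].add(i)
    return graph                                # connectivity only, no metric data

def localize(observation, observations, similar):
    """Retrieve the stored node that best matches the current observation."""
    return max(range(len(observations)),
               key=lambda i: similar(observation, observations[i]))

def shortest_path(graph, start, goal):
    """Breadth-first search over the unweighted graph."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None

# Hypothetical similarity: a real system would compare images with a trained network.
def toy_similarity(a, b):
    return math.exp(-math.dist(a, b))

# A walk around a loop that ends close to where it started.
footage = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1), (0, 0.2)]
graph = build_topological_graph(footage, toy_similarity, shortcut_threshold=0.8)
start = localize((1.1, 0.0), footage, toy_similarity)
path = shortest_path(graph, start, goal=5)
print(start, path)
```

In this toy run the shortcut edge between the first and last frames lets the planner reach node 5 in three hops instead of four, which is the kind of benefit a connectivity-only map can provide.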

Matching neural paths: transfer from recognition to correspondence search

Nov 05, 2017

Nikolay Savinov, Lubor Ladicky, Marc Pollefeys

Abstract: Many machine learning tasks require finding per-part correspondences between objects. In this work we focus on low-level correspondences, a highly ambiguous matching problem. We propose to use a hierarchical semantic representation of the objects, coming from a convolutional neural network, to solve this ambiguity. Training it for low-level correspondence prediction directly might not be an option in some domains where the ground-truth correspondences are hard to obtain. We show how transfer from recognition can be used to avoid such training. Our idea is to mark parts as "matching" if their features are close to each other at all levels of the convolutional feature hierarchy (neural paths). Although the overall number of such paths is exponential in the number of layers, we propose a polynomial algorithm for aggregating all of them in a single backward pass. The empirical validation is done on the task of stereo correspondence and demonstrates that we achieve competitive results among the methods which do not use labeled target domain data.

* Accepted at NIPS 2017

Semantic3D.net: A new Large-scale Point Cloud Classification Benchmark

Apr 12, 2017

Timo Hackel, Nikolay Savinov, Lubor Ladicky, Jan D. Wegner, Konrad Schindler, Marc Pollefeys

Abstract: This paper presents a new 3D point cloud classification benchmark data set with over four billion manually labelled points, meant as input for data-hungry (deep) learning methods. We also discuss first submissions to the benchmark that use deep convolutional neural networks (CNNs) as a workhorse, which already show remarkable performance improvements over the state of the art. CNNs have become the de facto standard for many tasks in computer vision and machine learning, such as semantic segmentation or object detection in images, but have not yet led to a true breakthrough for 3D point cloud labelling tasks due to a lack of training data. With the massive data set presented in this paper, we aim at closing this data gap to help unleash the full potential of deep learning methods for 3D labelling tasks. Our semantic3D.net data set consists of dense point clouds acquired with static terrestrial laser scanners. It contains 8 semantic classes and covers a wide range of urban outdoor scenes: churches, streets, railroad tracks, squares, villages, soccer fields and castles. We describe our labelling interface and show that our data set provides denser and more complete point clouds, with a much higher overall number of labelled points, than those already available to the research community. We further provide baseline method descriptions and a comparison between methods submitted to our online system. We hope semantic3D.net will pave the way for deep learning methods in 3D point cloud labelling to learn richer, more general 3D representations, and first submissions after only a few months indicate that this might indeed be the case.

* Accepted to ISPRS Annals. The benchmark website is available at http://www.semantic3d.net/ . The baseline code is available at https://github.com/nsavinov/semantic3dnet

Quad-networks: unsupervised learning to rank for interest point detection

Apr 10, 2017

Nikolay Savinov, Akihito Seki, Lubor Ladicky, Torsten Sattler, Marc Pollefeys

Abstract: Several machine learning tasks require representing the data using only a sparse set of interest points. An ideal detector is able to find the corresponding interest points even if the data undergo a transformation typical for a given domain. Since the task is of high practical interest in computer vision, many hand-crafted solutions have been proposed. In this paper, we ask a fundamental question: can we learn such detectors from scratch? Since it is often unclear what points are "interesting", human labelling cannot be used to find a truly unbiased solution. Therefore, the task requires an unsupervised formulation. We are the first to propose such a formulation: training a neural network to rank points in a transformation-invariant manner. Interest points are then extracted from the top/bottom quantiles of this ranking. We validate our approach on two tasks: standard RGB image interest point detection and challenging cross-modal interest point detection between RGB and depth images. We quantitatively show that our unsupervised method performs better than or on par with baselines.

* Accepted at CVPR 2017
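The two steps named in the abstract above (a ranking that survives transformation, then quantile extraction) can be sketched roughly as follows. The hinge form of the loss is an assumption made for illustration and is not taken from the abstract; the function names and the example scores are hypothetical.

```python
def quad_hinge_loss(score_i, score_j, score_i_t, score_j_t):
    """Penalize a pair of points whose score order flips after a transformation.

    score_i, score_j: scores of two points in the original data.
    score_i_t, score_j_t: scores of the same two points after the transformation.
    The margin-1 hinge is an assumed form, used here only as an example.
    """
    agreement = (score_i - score_j) * (score_i_t - score_j_t)
    return max(0.0, 1.0 - agreement)

def select_interest_points(scores, quantile):
    """Return indices of points in the bottom and top quantiles of the ranking."""
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    count = max(1, int(len(scores) * quantile))
    return sorted(order[:count] + order[-count:])

# Order preserved under the transformation: no penalty.
consistent = quad_hinge_loss(2.0, 0.5, 1.8, 0.4)
# Order flipped under the transformation: positive penalty.
flipped = quad_hinge_loss(2.0, 0.5, 0.4, 1.8)

scores = [0.3, 2.0, -1.5, 0.1, 0.9, -0.2, 1.4, 0.0, -0.7, 0.5]
selected = select_interest_points(scores, quantile=0.2)
print(consistent, flipped, selected)
```

Taking both ends of the ranking reflects the "top/bottom quantiles" wording: a ranking learned without labels has no preferred direction, so extreme scores at either end are candidates.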

TI-POOLING: transformation-invariant pooling for feature learning in Convolutional Neural Networks

Sep 22, 2016

Dmitry Laptev, Nikolay Savinov, Joachim M. Buhmann, Marc Pollefeys

Abstract: In this paper we present a deep neural network topology that incorporates a simple-to-implement transformation-invariant pooling operator (TI-POOLING). This operator is able to efficiently handle prior knowledge on nuisance variations in the data, such as rotation or scale changes. Most current methods make use of dataset augmentation to address this issue, but this requires a larger number of model parameters and more training data, and results in significantly increased training time and a larger chance of under- or overfitting. The main reason for these drawbacks is that the learned model needs to capture adequate features for all the possible transformations of the input. In contrast, we formulate features in convolutional neural networks to be transformation-invariant. We achieve that using parallel siamese architectures for the considered transformation set and applying the TI-POOLING operator on their outputs before the fully-connected layers. We show that this topology internally finds the optimal "canonical" instance of the input image for training and therefore limits the redundancy in learned features. This more efficient use of training data results in better performance on popular benchmark datasets with a smaller number of parameters when compared to standard convolutional neural networks with dataset augmentation and to other baselines.

* Accepted at CVPR 2016. The first two authors assert equal contribution and joint first authorship
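The siamese-branches-then-pool arrangement described in the abstract above can be sketched in a few lines. This is a toy illustration under stated assumptions: the pooling operator is taken to be an element-wise maximum over the transformation set, the "network" is a single random linear layer with a ReLU, and the transformation set is the four 90-degree rotations. None of these choices come from the abstract itself.

```python
import random

def rotate90(grid):
    """Rotate a square grid (list of rows) by 90 degrees counterclockwise."""
    return [list(row) for row in zip(*grid)][::-1]

def transformed_versions(grid):
    """The assumed transformation set: rotations by 0, 90, 180 and 270 degrees."""
    versions = [grid]
    for _ in range(3):
        versions.append(rotate90(versions[-1]))
    return versions

def features(grid, weights):
    """Shared-weight branch: one linear layer followed by a ReLU (toy stand-in)."""
    flat = [value for row in grid for value in row]
    return [max(0.0, sum(w * v for w, v in zip(weight_row, flat)))
            for weight_row in weights]

def ti_pool(grid, weights):
    """Run every transformed input through the same branch, then take the
    element-wise maximum across branches (assumed pooling operator)."""
    branch_outputs = [features(version, weights)
                      for version in transformed_versions(grid)]
    return [max(column) for column in zip(*branch_outputs)]

rng = random.Random(0)
weights = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(8)]
image = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(4)]

pooled = ti_pool(image, weights)
pooled_rotated = ti_pool(rotate90(image), weights)
print(pooled == pooled_rotated)
```

Because the four rotations form a closed set, rotating the input only permutes the branches, so the pooled features are identical. Invariance comes from the architecture here rather than from showing the network augmented copies of each image.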

Semantic 3D Reconstruction with Continuous Regularization and Ray Potentials Using a Visibility Consistency Constraint

May 22, 2016

Nikolay Savinov, Christian Haene, Lubor Ladicky, Marc Pollefeys

Abstract: We propose an approach for dense semantic 3D reconstruction which uses a data term that is defined as potentials over viewing rays, combined with continuous surface area penalization. Our formulation is a convex relaxation which we augment with a crucial non-convex constraint that ensures exact handling of visibility. To tackle the non-convex minimization problem, we propose a majorize-minimize type strategy which converges to a critical point. We demonstrate the benefits of using the non-convex constraint experimentally. For the geometry-only case, we set a new state of the art on two datasets of the commonly used Middlebury multi-view stereo benchmark. Moreover, our general-purpose formulation directly reconstructs thin objects, which are usually treated with specialized algorithms. A qualitative evaluation on the dense semantic 3D reconstruction task shows that we improve significantly over previous methods.

* Accepted as a spotlight oral paper by CVPR 2016
