
Fredrik Kahl


Learning Structure-from-Motion with Graph Attention Networks

Aug 30, 2023
Lucas Brynte, José Pedro Iglesias, Carl Olsson, Fredrik Kahl

Figures 1-4 for Learning Structure-from-Motion with Graph Attention Networks

In this paper we tackle the problem of learning Structure-from-Motion (SfM) through the use of graph attention networks. SfM is a classic computer vision problem that is solved through iterative minimization of reprojection errors, referred to as Bundle Adjustment (BA), starting from a good initialization. To obtain a good enough initialization for BA, conventional methods rely on a sequence of sub-problems (such as pairwise pose estimation, pose averaging or triangulation) that provides an initial solution which can then be refined using BA. In this work we replace these sub-problems by learning a model that takes as input the 2D keypoints detected across multiple views and outputs the corresponding camera poses and 3D keypoint coordinates. Our model takes advantage of graph neural networks to learn SfM-specific primitives, and we show that it can be used for fast inference of the reconstruction for new and unseen sequences. The experimental results show that the proposed model outperforms competing learning-based methods, and challenges COLMAP while having lower runtime.
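As a concrete illustration of the reprojection error that BA minimizes, here is a minimal NumPy sketch. The pinhole model, the toy intrinsics and the function names are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def reproject(K, R, t, X):
    """Project 3D points X (N,3) with rotation R, translation t, intrinsics K."""
    Xc = X @ R.T + t             # world -> camera coordinates
    x = Xc @ K.T                 # apply intrinsics
    return x[:, :2] / x[:, 2:3]  # perspective division -> pixel coordinates

def reprojection_error(K, R, t, X, observed):
    """Mean Euclidean distance between projected and observed 2D keypoints."""
    return np.linalg.norm(reproject(K, R, t, X) - observed, axis=1).mean()

# Toy example: identity pose, two points in front of the camera.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
R, t = np.eye(3), np.zeros(3)
X = np.array([[0.0, 0.0, 2.0], [0.1, -0.1, 4.0]])
obs = reproject(K, R, t, X)
print(reprojection_error(K, R, t, X, obs))  # 0.0 for perfect observations
```

BA would minimize this residual jointly over (R, t, X) per camera and point; the learned model instead predicts the poses and points directly.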


Adjustable Visual Appearance for Generalizable Novel View Synthesis

Jun 02, 2023
Josef Bengtson, David Nilsson, Che-Tsung Lin, Marcel Büsching, Fredrik Kahl

Figures 1-4 for Adjustable Visual Appearance for Generalizable Novel View Synthesis

We present a generalizable novel view synthesis method where it is possible to modify the visual appearance of rendered views to match a target weather or lighting condition. Our method is based on a generalizable transformer architecture, trained on synthetically generated scenes under different appearance conditions. This allows for consistently rendering novel views of 3D scenes that were not included in the training set, along with the ability to (i) modify their appearance to match the target condition and (ii) smoothly interpolate between different conditions. Experiments on both real and synthetic scenes are provided, including both qualitative and quantitative evaluations. Please refer to our project page for video results: https://ava-nvs.github.io/


Investigating how ReLU-networks encode symmetries

May 26, 2023
Georg Bökman, Fredrik Kahl

Figures 1-4 for Investigating how ReLU-networks encode symmetries

Many data symmetries can be described in terms of group equivariance and the most common way of encoding group equivariances in neural networks is by building linear layers that are group equivariant. In this work we investigate whether equivariance of a network implies that all layers are equivariant. On the theoretical side we find cases where equivariance implies layerwise equivariance, but also demonstrate that this is not the case generally. Nevertheless, we conjecture that CNNs that are trained to be equivariant will exhibit layerwise equivariance and explain how this conjecture is a weaker version of the recent permutation conjecture by Entezari et al. [2022]. We perform quantitative experiments with VGG-nets on CIFAR10 and qualitative experiments with ResNets on ImageNet to illustrate and support our theoretical findings. These experiments are not only of interest for understanding how group equivariance is encoded in ReLU-networks, but they also give a new perspective on Entezari et al.'s permutation conjecture as we find that it is typically easier to merge a network with a group-transformed version of itself than merging two different networks.
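Layerwise equivariance can be checked numerically: a layer is equivariant when it commutes with the group action. The sketch below is a toy illustration with an assumed permutation action, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def is_equivariant(layer, g, x, tol=1e-8):
    """Check layerwise equivariance: layer(g(x)) == g(layer(x))."""
    return np.allclose(layer(g(x)), g(layer(x)), atol=tol)

# Group action: reverse the feature order (a simple permutation).
g = lambda x: x[::-1]

relu = lambda x: np.maximum(x, 0)   # pointwise, hence permutation-equivariant
x = rng.standard_normal(8)
print(is_equivariant(relu, g, x))   # True: ReLU commutes with any permutation

# A generic dense layer is typically NOT equivariant to this action.
W = rng.standard_normal((8, 8))
dense = lambda x: W @ x
print(is_equivariant(dense, g, x))  # almost surely False for random W
```

The paper's question is whether training an equivariant *network* forces each individual layer to pass a test of this kind.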


Privacy-Preserving Representations are not Enough -- Recovering Scene Content from Camera Poses

May 08, 2023
Kunal Chelani, Torsten Sattler, Fredrik Kahl, Zuzana Kukelova

Figures 1-4 for Privacy-Preserving Representations are not Enough -- Recovering Scene Content from Camera Poses

Visual localization is the task of estimating the camera pose from which a given image was taken and is central to several 3D computer vision applications. With the rapid growth in the popularity of AR/VR/MR devices and cloud-based applications, privacy issues are becoming a very important aspect of the localization process. Existing work on privacy-preserving localization aims to defend against an attacker who has access to a cloud-based service. In this paper, we show that an attacker can learn about details of a scene without any access by simply querying a localization service. The attack is based on the observation that modern visual localization algorithms are robust to variations in appearance and geometry. While this is in general a desired property, it also leads to algorithms localizing objects that are similar enough to those present in a scene. An attacker can thus query a server with a large enough set of images of objects, e.g., obtained from the Internet, and some of them will be localized. The attacker can thus learn about object placements from the camera poses returned by the service (which is the minimal information returned by such a service). In this paper, we develop a proof-of-concept version of this attack and demonstrate its practical feasibility. The attack does not place any requirements on the localization algorithm used, and thus also applies to privacy-preserving representations. Current work on privacy-preserving representations alone is thus insufficient.


Improving Open-Set Semi-Supervised Learning with Self-Supervision

Jan 24, 2023
Erik Wallin, Lennart Svensson, Fredrik Kahl, Lars Hammarstrand

Figures 1-4 for Improving Open-Set Semi-Supervised Learning with Self-Supervision

Open-set semi-supervised learning (OSSL) is a realistic setting of semi-supervised learning where the unlabeled training set contains classes that are not present in the labeled set. Many existing OSSL methods assume that these out-of-distribution data are harmful and put effort into excluding data from unknown classes from the training objective. In contrast, we propose an OSSL framework that facilitates learning from all unlabeled data through self-supervision. Additionally, we utilize an energy-based score to accurately recognize data belonging to the known classes, making our method well-suited for handling uncurated data in deployment. Extensive experimental evaluations on several datasets show that our method achieves overall unmatched robustness and performance in terms of closed-set accuracy and open-set recognition compared with the state of the art for OSSL. Our code will be released upon publication.
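The energy-based score mentioned in the abstract can be illustrated with the standard free-energy formulation E(x) = -T · logsumexp(logits / T); this is a common choice in the open-set literature, and the paper's exact score may differ:

```python
import numpy as np

def energy(logits, T=1.0):
    """Energy score E(x) = -T * logsumexp(logits / T).
    Lower energy suggests the sample belongs to a known class."""
    z = logits / T
    m = z.max(axis=-1, keepdims=True)  # shift for numerical stability
    return -T * (m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1)))

# A confident known-class prediction vs. a flat (uncertain) one.
known = np.array([10.0, 0.0, 0.0])
unknown = np.array([0.0, 0.0, 0.0])
print(energy(known) < energy(unknown))  # True: known-class sample has lower energy
```

Thresholding such a score gives a simple known-vs-unknown decision rule at deployment time.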

* Preprint 

In Search of Projectively Equivariant Neural Networks

Sep 29, 2022
Georg Bökman, Axel Flinth, Fredrik Kahl

Figures 1-4 for In Search of Projectively Equivariant Neural Networks

Equivariance of linear neural network layers is well studied. In this work, we relax the equivariance condition to only be true in a projective sense. In particular, we study the relation of projective and ordinary equivariance and show that for important examples, the problems are in fact equivalent. The rotation group in 3D acts projectively on the projective plane. We experimentally study the practical importance of rotation equivariance when designing networks for filtering 2D-2D correspondences. Fully equivariant models perform poorly, and while a simple addition of invariant features to a strong baseline yields improvements, this seems to not be due to improved equivariance.
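The relaxation described in the abstract can be stated compactly; the notation below is my own paraphrase, not taken from the paper. Ordinary equivariance of a layer $f$ under representations $\rho, \rho'$ of a group $G$ requires

```latex
f(\rho(g)\,x) = \rho'(g)\,f(x) \quad \text{for all } g \in G,
```

while projective equivariance only requires equality up to a nonzero scalar that may depend on $g$ and $x$:

```latex
f(\rho(g)\,x) = \lambda(g, x)\,\rho'(g)\,f(x), \qquad \lambda(g, x) \neq 0.
```

Identifying outputs that differ by a scalar is exactly what makes the action on the projective plane, mentioned above, well defined.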


DoubleMatch: Improving Semi-Supervised Learning with Self-Supervision

May 11, 2022
Erik Wallin, Lennart Svensson, Fredrik Kahl, Lars Hammarstrand

Figures 1-4 for DoubleMatch: Improving Semi-Supervised Learning with Self-Supervision

Following the success of supervised learning, semi-supervised learning (SSL) is now becoming increasingly popular. SSL is a family of methods, which in addition to a labeled training set, also use a sizable collection of unlabeled data for fitting a model. Most of the recent successful SSL methods are based on pseudo-labeling approaches: letting confident model predictions act as training labels. While these methods have shown impressive results on many benchmark datasets, a drawback of this approach is that not all unlabeled data are used during training. We propose a new SSL algorithm, DoubleMatch, which combines the pseudo-labeling technique with a self-supervised loss, enabling the model to utilize all unlabeled data in the training process. We show that this method achieves state-of-the-art accuracies on multiple benchmark datasets while also reducing training times compared to existing SSL methods. Code is available at https://github.com/walline/doublematch.
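A rough sketch of the core idea: a confidence-masked pseudo-label loss combined with a self-supervised term that sees all unlabeled data, so no sample is wasted. The squared-error self-supervised term and the threshold value here are illustrative stand-ins, not DoubleMatch's exact losses:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(p, q):
    return -(p * np.log(q + 1e-12)).sum(axis=-1)

def ssl_loss(logits_weak, logits_strong, feat_strong, feat_target,
             tau=0.95, lam=1.0):
    """Pseudo-label loss on confident samples + self-supervised loss on ALL samples."""
    probs = softmax(logits_weak)
    conf = probs.max(axis=-1)
    pseudo = np.eye(probs.shape[-1])[probs.argmax(axis=-1)]  # hard pseudo-labels
    mask = conf >= tau                                       # confidence threshold
    l_pseudo = (mask * cross_entropy(pseudo, softmax(logits_strong))).mean()
    # The self-supervised term is NOT masked: every unlabeled sample contributes.
    l_self = ((feat_strong - feat_target) ** 2).sum(axis=-1).mean()
    return l_pseudo + lam * l_self

# Toy batch: 2 unlabeled samples, 3 classes, 4-dim features.
lw = np.array([[5.0, 0.0, 0.0], [0.3, 0.2, 0.1]])  # weak-aug logits
ls = np.array([[4.0, 0.5, 0.0], [0.2, 0.3, 0.1]])  # strong-aug logits
fs, ft = np.zeros((2, 4)), np.zeros((2, 4))
print(ssl_loss(lw, ls, fs, ft))  # only the confident first sample is pseudo-labeled
```

The unmasked second term is what distinguishes this family of methods from pure pseudo-labeling, where low-confidence samples contribute nothing.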

* ICPR2022 

A case for using rotation invariant features in state of the art feature matchers

Apr 21, 2022
Georg Bökman, Fredrik Kahl

Figures 1-4 for A case for using rotation invariant features in state of the art feature matchers

The aim of this paper is to demonstrate that a state-of-the-art feature matcher (LoFTR) can be made more robust to rotations by simply replacing the backbone CNN with a steerable CNN which is equivariant to translations and image rotations. It is experimentally shown that this boost is obtained without reducing performance on ordinary illumination and viewpoint matching sequences.

* CVPRW 2022 camera ready 

Rigidity Preserving Image Transformations and Equivariance in Perspective

Jan 31, 2022
Lucas Brynte, Georg Bökman, Axel Flinth, Fredrik Kahl

Figures 1-4 for Rigidity Preserving Image Transformations and Equivariance in Perspective

We characterize the class of image plane transformations which realize rigid camera motions and call these transformations `rigidity preserving'. In particular, 2D translations of pinhole images are not rigidity preserving. Hence, when using CNNs for 3D inference tasks, it can be beneficial to modify the inductive bias from equivariance towards translations to equivariance towards rigidity preserving transformations. We investigate how equivariance with respect to rigidity preserving transformations can be approximated in CNNs, and test our ideas on both 6D object pose estimation and visual localization. Experimentally, we improve on several competitive baselines.
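The link between rigid camera motions and image transformations is concrete for pure rotations: a rotation R about the camera center induces the homography H = K R K⁻¹ on the image plane. The sketch below (intrinsics and angle chosen for illustration) shows the resulting warp is projective, which is why plain 2D pixel translations are not rigidity preserving:

```python
import numpy as np

def rotation_homography(K, R):
    """Homography induced on the image plane by a pure camera rotation R."""
    return K @ R @ np.linalg.inv(K)

def warp_point(H, uv):
    p = H @ np.array([uv[0], uv[1], 1.0])
    return p[:2] / p[2]  # back from homogeneous to pixel coordinates

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
theta = np.deg2rad(5)  # small rotation about the camera's y-axis
R = np.array([[np.cos(theta), 0, np.sin(theta)],
              [0, 1, 0],
              [-np.sin(theta), 0, np.cos(theta)]])
H = rotation_homography(K, R)
# The principal point moves by f*tan(theta), not by a fixed pixel offset:
# points at other depths of field warp differently, unlike a 2D translation.
print(warp_point(H, (320.0, 240.0)))
```

An equivariance bias toward warps of this form, rather than toward pixel shifts, is the inductive-bias change the abstract argues for.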


ZZ-Net: A Universal Rotation Equivariant Architecture for 2D Point Clouds

Nov 30, 2021
Georg Bökman, Fredrik Kahl, Axel Flinth

Figures 1-4 for ZZ-Net: A Universal Rotation Equivariant Architecture for 2D Point Clouds

In this paper, we are concerned with rotation equivariance on 2D point cloud data. We describe a particular set of functions able to approximate any continuous rotation equivariant and permutation invariant function. Based on this result, we propose a novel neural network architecture for processing 2D point clouds and we prove its universality for approximating functions exhibiting these symmetries. We also show how to extend the architecture to accept a set of 2D-2D correspondences as input, while maintaining similar equivariance properties. Experiments are presented on the estimation of essential matrices in stereo vision.
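The two symmetries in question, rotation equivariance and permutation invariance, are easy to sanity-check numerically. The toy map below (a plain centroid, chosen for simplicity, not ZZ-Net itself) satisfies both:

```python
import numpy as np

rng = np.random.default_rng(0)

def rot(theta):
    """2D rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def centroid(pts):
    """A toy rotation-equivariant, permutation-invariant map on 2D point clouds."""
    return pts.mean(axis=0)

pts = rng.standard_normal((10, 2))
R = rot(0.7)

# Rotation equivariance: f(R applied to each point) == R applied to f(points)
print(np.allclose(centroid(pts @ R.T), centroid(pts) @ R.T))  # True

# Permutation invariance: shuffling the points leaves the output unchanged
perm = rng.permutation(len(pts))
print(np.allclose(centroid(pts[perm]), centroid(pts)))        # True
```

The paper's universality result says its architecture can approximate not just this trivial example but *any* continuous function with these two properties.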

* 9 figures 