
Mark Matthews


MELON: NeRF with Unposed Images Using Equivalence Class Estimation

Mar 14, 2023
Axel Levy, Mark Matthews, Matan Sela, Gordon Wetzstein, Dmitry Lagun

Neural radiance fields enable novel-view synthesis and scene reconstruction with photorealistic quality from a few images, but require known and accurate camera poses. Conventional pose estimation algorithms fail on smooth or self-similar scenes, while methods performing inverse rendering from unposed views require a rough initialization of the camera orientations. The main difficulty of pose estimation lies in real-life objects being almost invariant under certain transformations, making the photometric distance between rendered views non-convex with respect to the camera parameters. Using an equivalence relation that matches the distribution of local minima in camera space, we reduce this space to its quotient set, in which pose estimation becomes a more convex problem. Using a neural network to regularize pose estimation, we demonstrate that our method, MELON, can reconstruct a neural radiance field from unposed images with state-of-the-art accuracy while requiring ten times fewer views than adversarial approaches.
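
The quotient-set idea lends itself to a short sketch. The following is a minimal illustration, not the authors' code: a predicted camera azimuth is replicated several times around the vertical axis and only the best-matching replica contributes to the photometric loss, so nearly symmetric objects no longer trap pose estimation in spurious minima. The function name, the M-fold azimuthal equivalence, and the callable render_fn (a stand-in for a differentiable radiance-field renderer) are all assumptions for illustration.

import math
import torch

def equivalence_class_loss(render_fn, image, azimuth, elevation, num_replicas=8):
    """Photometric loss over the quotient set of poses: min over rotated replicas."""
    losses = []
    for k in range(num_replicas):
        # Candidate pose equivalent to the prediction under an M-fold rotation.
        az_k = azimuth + 2.0 * math.pi * k / num_replicas
        rendered = render_fn(az_k, elevation)  # hypothetical differentiable renderer
        losses.append(((rendered - image) ** 2).mean())
    # Only the closest replica is penalized, i.e. the distance is measured in the
    # quotient space where near-symmetric poses are identified.
    return torch.stack(losses).min()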

CUF: Continuous Upsampling Filters

Oct 20, 2022
Cristina Vasconcelos, Cengiz Oztireli, Mark Matthews, Milad Hashemi, Kevin Swersky, Andrea Tagliasacchi

Neural fields have rapidly been adopted for representing 3D signals, but their application to more classical 2D image processing has been relatively limited. In this paper, we consider one of the most important operations in image processing: upsampling. In deep learning, learnable upsampling layers have been used extensively for single-image super-resolution. We propose to parameterize upsampling kernels as neural fields. This parameterization leads to a compact architecture that obtains a 40-fold reduction in the number of parameters when compared with competing arbitrary-scale super-resolution architectures. When upsampling images of size 256x256, we show that our architecture is 2x-10x more efficient than competing arbitrary-scale super-resolution architectures, and more efficient than sub-pixel convolutions when instantiated to a single-scale model. In the general setting, these gains grow polynomially with the square of the target scale. We validate our method on standard benchmarks, showing that such efficiency gains can be achieved without sacrificing super-resolution performance.
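
As a rough illustration of the parameterization (my reading of the abstract, not the paper's released code), a small MLP can act as a neural field over continuous filter coordinates, emitting a depthwise upsampling kernel for any fractional sub-pixel offset and scale. The class name, channel count, and 3x3 support below are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ContinuousUpsamplingFilter(nn.Module):
    """Neural field mapping (dx, dy, 1/scale) to a depthwise upsampling kernel."""
    def __init__(self, channels=64, support=3, hidden=64):
        super().__init__()
        self.channels, self.support = channels, support
        self.field = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, channels * support * support),
        )

    def forward(self, features, dx, dy, scale):
        # features: (B, C, H, W) low-resolution feature map.
        coord = torch.tensor([dx, dy, 1.0 / scale],
                             dtype=features.dtype, device=features.device)
        kernel = self.field(coord).view(self.channels, 1, self.support, self.support)
        # One predicted filter per channel, evaluated at this sub-pixel offset.
        return F.conv2d(features, kernel, padding=self.support // 2, groups=self.channels)

To upsample by a factor s, the field would be queried at the s*s fractional offsets of the output grid and the resulting maps interleaved, pixel-shuffle style; because the kernels are generated rather than stored per scale, the parameter count stays constant across target scales.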

LOLNeRF: Learn from One Look

Nov 19, 2021
Daniel Rebain, Mark Matthews, Kwang Moo Yi, Dmitry Lagun, Andrea Tagliasacchi

We present a method for learning a generative 3D model based on neural radiance fields, trained solely from data with only single views of each object. While generating realistic images is no longer a difficult task, producing the corresponding 3D structure such that it can be rendered from different views is non-trivial. We show that, unlike existing methods, one does not need multi-view data to achieve this goal. Specifically, we show that by reconstructing many images aligned to an approximate canonical pose with a single network conditioned on a shared latent space, one can learn a space of radiance fields that models shape and appearance for a class of objects. We demonstrate this by training models to reconstruct object categories using datasets that contain only one view of each subject, without depth or geometry information. Our experiments show that we achieve state-of-the-art results in novel-view synthesis and competitive results for monocular depth prediction.
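
A minimal sketch of this shared-latent setup, written as an auto-decoder illustration under stated assumptions rather than the released implementation: one radiance-field MLP is conditioned on a per-image latent code drawn from a learnable table, so reconstructing many singly-viewed, approximately canonically posed images carves out a shared space of shape and appearance. The class name, layer sizes, and the plain density/RGB head are assumptions.

import torch
import torch.nn as nn

class LatentConditionedNeRF(nn.Module):
    def __init__(self, num_images, latent_dim=64, hidden=128):
        super().__init__()
        # One learnable code per training image (auto-decoder-style latent table).
        self.latents = nn.Embedding(num_images, latent_dim)
        self.mlp = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # density + RGB
        )

    def forward(self, points, image_ids):
        # points: (N, 3) samples along rays cast from each image's approximate
        # canonical camera; image_ids: (N,) index of the image each ray belongs to.
        z = self.latents(image_ids)
        out = self.mlp(torch.cat([points, z], dim=-1))
        density = torch.relu(out[..., :1])
        rgb = torch.sigmoid(out[..., 1:])
        return density, rgb

Training would minimize a per-pixel reconstruction loss through volume rendering; because the network and latent space are shared across the dataset, shape and appearance generalize even though each object is observed only once.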
