Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ricardo Martin-Brualla

NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections

Aug 13, 2020

Ricardo Martin-Brualla, Noha Radwan, Mehdi S. M. Sajjadi, Jonathan T. Barron, Alexey Dosovitskiy, Daniel Duckworth

Figure 1 for NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections

Figure 2 for NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections

Figure 3 for NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections

Figure 4 for NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections

Abstract:We present a learning-based method for synthesizing novel views of complex outdoor scenes using only unstructured collections of in-the-wild photographs. We build on neural radiance fields (NeRF), which uses the weights of a multilayer perceptron to implicitly model the volumetric density and color of a scene. While NeRF works well on images of static subjects captured under controlled settings, it is incapable of modeling many ubiquitous, real-world phenomena in uncontrolled images, such as variable illumination or transient occluders. In this work, we introduce a series of extensions to NeRF to address these issues, thereby allowing for accurate reconstructions from unstructured image collections taken from the internet. We apply our system, which we dub NeRF-W, to internet photo collections of famous landmarks, thereby producing photorealistic, spatially consistent scene representations despite unknown and confounding factors, resulting in significant improvement over the state of the art.

* Project website: https://nerf-w.github.io. Ricardo Martin-Brualla, Noha Radwan, and Mehdi S. M. Sajjadi contributed equally to this work. Updated affiliations

Via

Access Paper or Ask Questions

GeLaTO: Generative Latent Textured Objects

Aug 11, 2020

Ricardo Martin-Brualla, Rohit Pandey, Sofien Bouaziz, Matthew Brown, Dan B Goldman

Figure 1 for GeLaTO: Generative Latent Textured Objects

Figure 2 for GeLaTO: Generative Latent Textured Objects

Figure 3 for GeLaTO: Generative Latent Textured Objects

Figure 4 for GeLaTO: Generative Latent Textured Objects

Abstract:Accurate modeling of 3D objects exhibiting transparency, reflections and thin structures is an extremely challenging problem. Inspired by billboards and geometric proxies used in computer graphics, this paper proposes Generative Latent Textured Objects (GeLaTO), a compact representation that combines a set of coarse shape proxies defining low frequency geometry with learned neural textures, to encode both medium and fine scale geometry as well as view-dependent appearance. To generate the proxies' textures, we learn a joint latent space allowing category-level appearance and geometry interpolation. The proxies are independently rasterized with their corresponding neural texture and composited using a U-Net, which generates an output photorealistic image including an alpha map. We demonstrate the effectiveness of our approach by reconstructing complex objects from a sparse set of views. We show results on a dataset of real images of eyeglasses frames, which are particularly challenging to reconstruct using classical methods. We also demonstrate that these coarse proxies can be handcrafted when the underlying object geometry is easy to model, like eyeglasses, or generated using a neural network for more complex categories, such as cars.

* European Conference on Computer Vision 2020
* ECCV 2020 Spotlight. Project website: https://gelato-paper.github.io

Via

Access Paper or Ask Questions

State of the Art on Neural Rendering

Apr 08, 2020

Ayush Tewari, Ohad Fried, Justus Thies, Vincent Sitzmann, Stephen Lombardi, Kalyan Sunkavalli, Ricardo Martin-Brualla, Tomas Simon, Jason Saragih, Matthias Nießner(+9 more)

Figure 1 for State of the Art on Neural Rendering

Figure 2 for State of the Art on Neural Rendering

Figure 3 for State of the Art on Neural Rendering

Figure 4 for State of the Art on Neural Rendering

Abstract:Efficient rendering of photo-realistic virtual worlds is a long standing effort of computer graphics. Modern graphics techniques have succeeded in synthesizing photo-realistic images from hand-crafted scene representations. However, the automatic generation of shape, materials, lighting, and other aspects of scenes remains a challenging problem that, if solved, would make photo-realistic computer graphics more widely accessible. Concurrently, progress in computer vision and machine learning have given rise to a new approach to image synthesis and editing, namely deep generative models. Neural rendering is a new and rapidly emerging field that combines generative machine learning techniques with physical knowledge from computer graphics, e.g., by the integration of differentiable rendering into network training. With a plethora of applications in computer graphics and vision, neural rendering is poised to become a new area in the graphics community, yet no survey of this emerging field exists. This state-of-the-art report summarizes the recent trends and applications of neural rendering. We focus on approaches that combine classic computer graphics techniques with deep generative models to obtain controllable and photo-realistic outputs. Starting with an overview of the underlying computer graphics and machine learning concepts, we discuss critical aspects of neural rendering approaches. This state-of-the-art report is focused on the many important use cases for the described algorithms such as novel view synthesis, semantic photo manipulation, facial and body reenactment, relighting, free-viewpoint video, and the creation of photo-realistic avatars for virtual and augmented reality telepresence. Finally, we conclude with a discussion of the social implications of such technology and investigate open research problems.

* Eurographics 2020 survey paper

Via

Access Paper or Ask Questions

KeystoneDepth: Visualizing History in 3D

Sep 19, 2019

Xuan Luo, Yanmeng Kong, Jason Lawrence, Ricardo Martin-Brualla, Steve Seitz

Figure 1 for KeystoneDepth: Visualizing History in 3D

Figure 2 for KeystoneDepth: Visualizing History in 3D

Figure 3 for KeystoneDepth: Visualizing History in 3D

Figure 4 for KeystoneDepth: Visualizing History in 3D

Abstract:This paper introduces the largest and most diverse collection of rectified stereo image pairs to the research community, KeystoneDepth, consisting of tens of thousands of stereographs of historical people, events, objects, and scenes between 1860 and 1963. Leveraging the Keystone-Mast raw scans from the California Museum of Photography, we apply multiple processing steps to produce clean stereo image pairs, complete with calibration data, rectification transforms, and depthmaps. A second contribution is a novel approach for view synthesis that runs at real-time rates on a mobile device, simulating the experience of looking through an open window into these historical scenes. We produce results for thousands of antique stereographs, capturing many important historical moments.

* Project website: http://roxanneluo.github.io/KeystoneDepth.html , Video: https://youtu.be/5JrX-KKisC8 , More results: http://roxanneluo.github.io/keystonedepth_supplementary/index.html

Via

Access Paper or Ask Questions

Volumetric Capture of Humans with a Single RGBD Camera via Semi-Parametric Learning

May 29, 2019

Rohit Pandey, Anastasia Tkach, Shuoran Yang, Pavel Pidlypenskyi, Jonathan Taylor, Ricardo Martin-Brualla, Andrea Tagliasacchi, George Papandreou, Philip Davidson, Cem Keskin(+2 more)

Figure 1 for Volumetric Capture of Humans with a Single RGBD Camera via Semi-Parametric Learning

Figure 2 for Volumetric Capture of Humans with a Single RGBD Camera via Semi-Parametric Learning

Figure 3 for Volumetric Capture of Humans with a Single RGBD Camera via Semi-Parametric Learning

Figure 4 for Volumetric Capture of Humans with a Single RGBD Camera via Semi-Parametric Learning

Abstract:Volumetric (4D) performance capture is fundamental for AR/VR content generation. Whereas previous work in 4D performance capture has shown impressive results in studio settings, the technology is still far from being accessible to a typical consumer who, at best, might own a single RGBD sensor. Thus, in this work, we propose a method to synthesize free viewpoint renderings using a single RGBD camera. The key insight is to leverage previously seen "calibration" images of a given user to extrapolate what should be rendered in a novel viewpoint from the data available in the sensor. Given these past observations from multiple viewpoints, and the current RGBD image from a fixed view, we propose an end-to-end framework that fuses both these data sources to generate novel renderings of the performer. We demonstrate that the method can produce high fidelity images, and handle extreme changes in subject pose and camera viewpoints. We also show that the system generalizes to performers not seen in the training data. We run exhaustive experiments demonstrating the effectiveness of the proposed semi-parametric model (i.e. calibration images available to the neural network) compared to other state of the art machine learned solutions. Further, we compare the method with more traditional pipelines that employ multi-view capture. We show that our framework is able to achieve compelling results, with substantially less infrastructure than previously required.

Via

Access Paper or Ask Questions

Neural Rerendering in the Wild

Apr 08, 2019

Moustafa Meshry, Dan B Goldman, Sameh Khamis, Hugues Hoppe, Rohit Pandey, Noah Snavely, Ricardo Martin-Brualla

Figure 1 for Neural Rerendering in the Wild

Figure 2 for Neural Rerendering in the Wild

Figure 3 for Neural Rerendering in the Wild

Figure 4 for Neural Rerendering in the Wild

Abstract:We explore total scene capture -- recording, modeling, and rerendering a scene under varying appearance such as season and time of day. Starting from internet photos of a tourist landmark, we apply traditional 3D reconstruction to register the photos and approximate the scene as a point cloud. For each photo, we render the scene points into a deep framebuffer, and train a neural network to learn the mapping of these initial renderings to the actual photos. This rerendering network also takes as input a latent appearance vector and a semantic mask indicating the location of transient objects like pedestrians. The model is evaluated on several datasets of publicly available images spanning a broad range of illumination conditions. We create short videos demonstrating realistic manipulation of the image viewpoint, appearance, and semantic labeling. We also compare results with prior work on scene reconstruction from internet photos.

* To be presented at CVPR 2019 (oral). Supplementary video available at http://youtu.be/E1crWQn_kmY

Via

Access Paper or Ask Questions

LookinGood: Enhancing Performance Capture with Real-time Neural Re-Rendering

Nov 12, 2018

Ricardo Martin-Brualla, Rohit Pandey, Shuoran Yang, Pavel Pidlypenskyi, Jonathan Taylor, Julien Valentin, Sameh Khamis, Philip Davidson, Anastasia Tkach, Peter Lincoln(+7 more)

Figure 1 for LookinGood: Enhancing Performance Capture with Real-time Neural Re-Rendering

Figure 2 for LookinGood: Enhancing Performance Capture with Real-time Neural Re-Rendering

Figure 3 for LookinGood: Enhancing Performance Capture with Real-time Neural Re-Rendering

Figure 4 for LookinGood: Enhancing Performance Capture with Real-time Neural Re-Rendering

Abstract:Motivated by augmented and virtual reality applications such as telepresence, there has been a recent focus in real-time performance capture of humans under motion. However, given the real-time constraint, these systems often suffer from artifacts in geometry and texture such as holes and noise in the final rendering, poor lighting, and low-resolution textures. We take the novel approach to augment such real-time performance capture systems with a deep architecture that takes a rendering from an arbitrary viewpoint, and jointly performs completion, super resolution, and denoising of the imagery in real-time. We call this approach neural (re-)rendering, and our live system "LookinGood". Our deep architecture is trained to produce high resolution and high quality images from a coarse rendering in real-time. First, we propose a self-supervised training method that does not require manual ground-truth annotation. We contribute a specialized reconstruction error that uses semantic information to focus on relevant parts of the subject, e.g. the face. We also introduce a salient reweighing scheme of the loss function that is able to discard outliers. We specifically design the system for virtual and augmented reality headsets where the consistency between the left and right eye plays a crucial role in the final user experience. Finally, we generate temporally stable results by explicitly minimizing the difference between two consecutive frames. We tested the proposed system in two different scenarios: one involving a single RGB-D sensor, and upper body reconstruction of an actor, the second consisting of full body 360 degree capture. Through extensive experimentation, we demonstrate how our system generalizes across unseen sequences and subjects. The supplementary video is available at http://youtu.be/Md3tdAKoLGU.

* The supplementary video is available at: http://youtu.be/Md3tdAKoLGU To be presented at SIGGRAPH Asia 2018

Via

Access Paper or Ask Questions

3D Time-lapse Reconstruction from Internet Photos

Nov 10, 2015

Ricardo Martin-Brualla, David Gallup, Steven M. Seitz

Figure 1 for 3D Time-lapse Reconstruction from Internet Photos

Figure 2 for 3D Time-lapse Reconstruction from Internet Photos

Figure 3 for 3D Time-lapse Reconstruction from Internet Photos

Figure 4 for 3D Time-lapse Reconstruction from Internet Photos

Abstract:Given an Internet photo collection of a landmark, we compute a 3D time-lapse video sequence where a virtual camera moves continuously in time and space. While previous work assumed a static camera, the addition of camera motion during the time-lapse creates a very compelling impression of parallax. Achieving this goal, however, requires addressing multiple technical challenges, including solving for time-varying depth maps, regularizing 3D point color profiles over time, and reconstructing high quality, hole-free images at every frame from the projected profiles. Our results show photorealistic time-lapses of skylines and natural scenes over many years, with dramatic parallax effects.

* To appear in ICCV'15. Supplementary video at: http://grail.cs.washington.edu/projects/timelapse3d/

Via

Access Paper or Ask Questions