Michael Niemeyer

TextMesh: Generation of Realistic 3D Meshes From Text Prompts

Apr 24, 2023
Christina Tsalicoglou, Fabian Manhardt, Alessio Tonioni, Michael Niemeyer, Federico Tombari

The generation of highly realistic 2D images from mere text prompts has recently made huge progress in terms of speed and quality, thanks to the advent of image diffusion models. Naturally, the question arises whether this can also be achieved for the generation of 3D content from such text prompts. To this end, a new line of methods has recently emerged that harnesses diffusion models, trained on 2D images, to supervise 3D model generation using view-dependent prompts. While achieving impressive results, these methods have two major drawbacks. First, rather than the commonly used 3D meshes, they generate neural radiance fields (NeRFs), making them impractical for most real applications. Second, these approaches tend to produce over-saturated models, giving the output a cartoonish look. Therefore, in this work we propose a novel method for the generation of highly realistic-looking 3D meshes. To this end, we extend NeRF to employ an SDF backbone, leading to improved 3D mesh extraction. In addition, we propose a novel way to fine-tune the mesh texture, removing the effect of high saturation and improving the details of the output 3D mesh.

* Project Website: https://fabi92.github.io/textmesh/ 
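
The abstract mentions replacing the NeRF density backbone with an SDF to ease mesh extraction. Below is a minimal sketch of one standard way to plug an SDF into volume rendering, converting signed distances to density via a Laplace CDF (a VolSDF-style choice; the exact conversion used in TextMesh is not specified here, and `beta`/`alpha` are illustrative parameters).

```python
import torch

def sdf_to_density(sdf: torch.Tensor, beta: float = 0.1, alpha: float = None) -> torch.Tensor:
    """Map signed distances to volume density via the Laplace CDF
    (a VolSDF-style conversion; a common choice, not necessarily the
    exact one used in TextMesh)."""
    if alpha is None:
        alpha = 1.0 / beta
    # CDF of a zero-mean Laplace distribution with scale beta, evaluated at -sdf:
    # density is high inside the surface (sdf < 0) and decays smoothly outside.
    return alpha * torch.where(
        sdf <= 0,
        1.0 - 0.5 * torch.exp(sdf / beta),
        0.5 * torch.exp(-sdf / beta),
    )

# Example: densities along a ray crossing the surface at sdf = 0.
sdf_samples = torch.linspace(0.5, -0.5, steps=11)
print(sdf_to_density(sdf_samples))
```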

NEWTON: Neural View-Centric Mapping for On-the-Fly Large-Scale SLAM

Mar 29, 2023
Hidenobu Matsuki, Keisuke Tateno, Michael Niemeyer, Federico Tombari

Neural field-based 3D representations have recently been adopted in many areas, including SLAM systems. Current neural SLAM or online mapping systems achieve impressive results for simple captures, but they rely on a world-centric map representation, as only a single neural field model is used. Defining such a world-centric representation requires accurate and static prior information about the scene, such as its boundaries and initial camera poses. However, in real-time and on-the-fly scene capture applications, this prior knowledge cannot be assumed to be fixed or static, since it dynamically changes and is subject to significant updates based on run-time observations. Particularly in the context of large-scale mapping, significant camera pose drift is inevitable, necessitating correction via loop closure. To overcome this limitation, we propose NEWTON, a view-centric mapping method that dynamically constructs neural fields based on run-time observations. In contrast to prior works, our method enables camera pose updates using loop closures and scene boundary updates by representing the scene with multiple neural fields, each defined in the local coordinate system of a selected keyframe. The experimental results demonstrate the superior performance of our method over existing world-centric neural field-based SLAM systems, in particular for large-scale scenes subject to camera pose updates.
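
To illustrate the view-centric idea, the sketch below keeps one (stubbed) neural field per keyframe, each defined in that keyframe's local frame, so a loop-closure pose correction only rewrites the keyframe pose and never the field itself. All class and method names are hypothetical, and the aggregation across fields is a placeholder average rather than NEWTON's actual fusion.

```python
import numpy as np

class LocalField:
    """Placeholder for a neural field defined in a keyframe's local frame.
    Here it is a stub returning zeros; in practice this would be a small
    MLP or feature grid."""
    def query(self, pts_local: np.ndarray) -> np.ndarray:
        return np.zeros(len(pts_local))

class ViewCentricMap:
    """Minimal sketch of a view-centric map: each keyframe owns a field in
    its own coordinate system (names and structure are hypothetical)."""
    def __init__(self):
        self.keyframe_poses = []   # 4x4 camera-to-world matrices
        self.fields = []           # one LocalField per keyframe

    def add_keyframe(self, pose_c2w: np.ndarray):
        self.keyframe_poses.append(pose_c2w)
        self.fields.append(LocalField())

    def update_pose(self, idx: int, corrected_pose: np.ndarray):
        # After loop closure, only the keyframe pose changes; the field
        # itself (stored in local coordinates) is left untouched.
        self.keyframe_poses[idx] = corrected_pose

    def query_world(self, pts_world: np.ndarray) -> np.ndarray:
        # Transform world points into each keyframe's local frame and
        # aggregate the per-field predictions (here: a simple average).
        pts_h = np.concatenate([pts_world, np.ones((len(pts_world), 1))], axis=1)
        preds = []
        for pose, field in zip(self.keyframe_poses, self.fields):
            pts_local = (np.linalg.inv(pose) @ pts_h.T).T[:, :3]
            preds.append(field.query(pts_local))
        return np.mean(preds, axis=0)

# Usage: two keyframes, correct the second one after a loop closure.
m = ViewCentricMap()
m.add_keyframe(np.eye(4))
m.add_keyframe(np.eye(4))
m.update_pose(1, np.eye(4))
print(m.query_world(np.random.rand(5, 3)))
```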


DreamBooth3D: Subject-Driven Text-to-3D Generation

Mar 27, 2023
Amit Raj, Srinivas Kaza, Ben Poole, Michael Niemeyer, Nataniel Ruiz, Ben Mildenhall, Shiran Zada, Kfir Aberman, Michael Rubinstein, Jonathan Barron, Yuanzhen Li, Varun Jampani

We present DreamBooth3D, an approach to personalize text-to-3D generative models from as few as 3-6 casually captured images of a subject. Our approach combines recent advances in personalizing text-to-image models (DreamBooth) with text-to-3D generation (DreamFusion). We find that naively combining these methods fails to yield satisfactory subject-specific 3D assets due to personalized text-to-image models overfitting to the input viewpoints of the subject. We overcome this through a 3-stage optimization strategy where we jointly leverage the 3D consistency of neural radiance fields together with the personalization capability of text-to-image models. Our method can produce high-quality, subject-specific 3D assets with text-driven modifications such as novel poses, colors and attributes that are not seen in any of the input images of the subject.

* Project page: https://dreambooth3d.github.io/ | Video summary: https://youtu.be/kKVDrbfvOoA
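
A rough control-flow sketch of the 3-stage optimization described above is given below. Every function is a hypothetical stub, and the exact content of each stage is a loose reading of the abstract rather than the paper's precise recipe.

```python
# Minimal control-flow sketch of a 3-stage DreamBooth + DreamFusion-style
# pipeline. All functions are hypothetical stubs standing in for the heavy
# lifting; stage boundaries follow the abstract only, not the exact method.

def finetune_dreambooth(base_model, images, steps):
    """Personalize a text-to-image model on subject images (stub)."""
    return {"model": base_model, "steps": steps, "num_images": len(images)}

def optimize_nerf_with_sds(t2i_model, prompt):
    """Score-distillation-style NeRF optimization (stub)."""
    return {"nerf": "radiance_field", "prompt": prompt}

def render_multiview(nerf, n_views=8):
    """Render pseudo multi-view images of the current 3D asset (stub)."""
    return [f"view_{i}" for i in range(n_views)]

def dreambooth3d(base_t2i, subject_images, prompt):
    # Stage 1: partially personalize the text-to-image model and use it to
    # optimize an initial NeRF (coarse 3D shape, limited subject fidelity).
    partial_db = finetune_dreambooth(base_t2i, subject_images, steps=400)
    initial_nerf = optimize_nerf_with_sds(partial_db, prompt)

    # Stage 2: render multi-view images from the initial NeRF and use them,
    # together with the input images, to personalize the model further.
    pseudo_views = render_multiview(initial_nerf)
    full_db = finetune_dreambooth(base_t2i, subject_images + pseudo_views, steps=800)

    # Stage 3: re-optimize the 3D asset with the fully personalized model.
    return optimize_nerf_with_sds(full_db, prompt)

print(dreambooth3d("t2i", ["img1", "img2", "img3"], "a [V] dog wearing a hat"))
```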

NeRFMeshing: Distilling Neural Radiance Fields into Geometrically-Accurate 3D Meshes

Mar 16, 2023
Marie-Julie Rakotosaona, Fabian Manhardt, Diego Martin Arroyo, Michael Niemeyer, Abhijit Kundu, Federico Tombari

With the introduction of Neural Radiance Fields (NeRFs), novel view synthesis has recently made a big leap forward. At its core, NeRF proposes that each 3D point can emit radiance, which allows view synthesis to be performed via differentiable volumetric rendering. While neural radiance fields can accurately represent 3D scenes for image rendering, 3D meshes are still the main scene representation supported by most computer graphics and simulation pipelines, enabling tasks such as real-time rendering and physics-based simulation. Obtaining 3D meshes from neural radiance fields remains an open challenge, since NeRFs are optimized for view synthesis and do not enforce an accurate underlying geometry on the radiance field. We thus propose a novel compact and flexible architecture that enables easy 3D surface reconstruction from any NeRF-driven approach. Once the radiance field is trained, we distill the volumetric 3D representation into a Signed Surface Approximation Network, allowing easy extraction of the 3D mesh and appearance. Our final 3D mesh is physically accurate and can be rendered in real time on an array of devices.
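
Once a signed-distance-style field is available (here, the Signed Surface Approximation Network), the mesh extraction step itself is standard. The sketch below samples a signed distance function on a dense grid and runs marching cubes; an analytic sphere stands in for the trained network, and the grid bounds and resolution are illustrative.

```python
import numpy as np
from skimage.measure import marching_cubes

def extract_mesh(sdf_fn, bounds=(-1.0, 1.0), resolution=128, level=0.0):
    """Sample a signed-distance function on a dense grid and extract its
    zero level set with marching cubes. `sdf_fn` stands in for a trained
    signed-surface network."""
    lo, hi = bounds
    xs = np.linspace(lo, hi, resolution)
    grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)  # (R, R, R, 3)
    sdf = sdf_fn(grid.reshape(-1, 3)).reshape(resolution, resolution, resolution)
    spacing = (hi - lo) / (resolution - 1)
    verts, faces, normals, _ = marching_cubes(sdf, level=level,
                                              spacing=(spacing,) * 3)
    verts += lo  # shift vertices back into world coordinates
    return verts, faces, normals

# Stand-in SDF: a sphere of radius 0.5 centered at the origin.
sphere_sdf = lambda p: np.linalg.norm(p, axis=-1) - 0.5
verts, faces, normals = extract_mesh(sphere_sdf, resolution=64)
print(verts.shape, faces.shape)
```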


VoxGRAF: Fast 3D-Aware Image Synthesis with Sparse Voxel Grids

Jun 17, 2022
Katja Schwarz, Axel Sauer, Michael Niemeyer, Yiyi Liao, Andreas Geiger

State-of-the-art 3D-aware generative models rely on coordinate-based MLPs to parameterize 3D radiance fields. While these models demonstrate impressive results, querying an MLP for every sample along each ray leads to slow rendering. Therefore, existing approaches often render low-resolution feature maps and process them with an upsampling network to obtain the final image. Albeit efficient, neural rendering often entangles viewpoint and content, such that changing the camera pose results in unwanted changes of geometry or appearance. Motivated by recent results in voxel-based novel view synthesis, in this paper we investigate the utility of sparse voxel grid representations for fast and 3D-consistent generative modeling. Our results demonstrate that monolithic MLPs can indeed be replaced by 3D convolutions when combining sparse voxel grids with progressive growing, free-space pruning, and appropriate regularization. To obtain a compact representation of the scene and allow for scaling to higher voxel resolutions, our model disentangles the foreground object (modeled in 3D) from the background (modeled in 2D). In contrast to existing approaches, our method requires only a single forward pass to generate a full 3D scene. It hence allows for efficient rendering from arbitrary viewpoints while yielding 3D-consistent results with high visual fidelity.
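
The sketch below illustrates the basic idea of replacing a monolithic MLP with a 3D convolutional generator that maps a latent code to a voxel grid of color and density. It is a dense toy version: sparsity, progressive growing, free-space pruning, and the 2D background branch from the paper are all omitted, and every layer choice is an assumption.

```python
import torch
import torch.nn as nn

class VoxelGenerator3D(nn.Module):
    """Minimal sketch: map a latent code to a dense voxel grid holding
    RGB + density. A real sparse-grid generator (as in VoxGRAF) would add
    progressive growing, free-space pruning, and sparse convolutions."""
    def __init__(self, z_dim=128, base_res=4, channels=(256, 128, 64, 32)):
        super().__init__()
        self.base_res = base_res
        self.fc = nn.Linear(z_dim, channels[0] * base_res ** 3)
        blocks = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            blocks += [
                nn.Upsample(scale_factor=2, mode="trilinear", align_corners=False),
                nn.Conv3d(c_in, c_out, kernel_size=3, padding=1),
                nn.LeakyReLU(0.2, inplace=True),
            ]
        self.blocks = nn.Sequential(*blocks)
        self.head = nn.Conv3d(channels[-1], 4, kernel_size=3, padding=1)  # RGB + density

    def forward(self, z):
        x = self.fc(z).view(z.shape[0], -1, self.base_res, self.base_res, self.base_res)
        x = self.blocks(x)
        rgb_sigma = self.head(x)
        rgb = torch.sigmoid(rgb_sigma[:, :3])
        sigma = torch.nn.functional.softplus(rgb_sigma[:, 3:])
        return rgb, sigma

gen = VoxelGenerator3D()
rgb, sigma = gen(torch.randn(2, 128))
print(rgb.shape, sigma.shape)  # (2, 3, 32, 32, 32) and (2, 1, 32, 32, 32)
```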


MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction

Jun 01, 2022
Zehao Yu, Songyou Peng, Michael Niemeyer, Torsten Sattler, Andreas Geiger

In recent years, neural implicit surface reconstruction methods have become popular for multi-view 3D reconstruction. In contrast to traditional multi-view stereo methods, these approaches tend to produce smoother and more complete reconstructions due to the inductive smoothness bias of neural networks. State-of-the-art neural implicit methods allow for high-quality reconstructions of simple scenes from many input views. Yet, their performance drops significantly for larger and more complex scenes, as well as for scenes captured from sparse viewpoints. This is caused primarily by the inherent ambiguity of the RGB reconstruction loss, which does not provide enough constraints, in particular in less-observed and textureless areas. Motivated by recent advances in monocular geometry prediction, we systematically explore the utility that these cues provide for improving neural implicit surface reconstruction. We demonstrate that depth and normal cues, predicted by general-purpose monocular estimators, significantly improve reconstruction quality and optimization time. Further, we analyse multiple design choices for representing neural implicit surfaces, ranging from monolithic MLP models to single-grid and multi-resolution grid representations. We observe that geometric monocular priors improve performance both for small-scale single-object and large-scale multi-object scenes, independent of the choice of representation.

* Project page: https://niujinshuchong.github.io/monosdf/ 
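
Monocular depth predictions are only defined up to an unknown scale and shift, so a common way to use them as supervision is to align them to the rendered depth with a per-image least-squares fit, and to compare normals with an L1 plus angular term. The sketch below shows losses of this kind; the exact formulation and weighting in MonoSDF may differ.

```python
import torch

def scale_shift_invariant_depth_loss(rendered_depth, mono_depth):
    """Align the monocular depth prediction to the rendered depth with a
    least-squares scale and shift, then penalize the residual. A minimal
    sketch of this kind of depth consistency term."""
    d = mono_depth.reshape(-1)
    r = rendered_depth.reshape(-1)
    # Solve  min_{w, q} || w * d + q - r ||^2  in closed form.
    A = torch.stack([d, torch.ones_like(d)], dim=-1)          # (N, 2)
    sol = torch.linalg.lstsq(A, r.unsqueeze(-1)).solution      # (2, 1)
    w, q = sol[0, 0], sol[1, 0]
    return ((w * d + q - r) ** 2).mean()

def normal_consistency_loss(rendered_normals, mono_normals):
    """L1 difference plus angular (1 - cosine) term between rendered and
    monocular normals, both assumed unit length with shape (N, 3)."""
    l1 = (rendered_normals - mono_normals).abs().sum(dim=-1).mean()
    cos = (rendered_normals * mono_normals).sum(dim=-1).mean()
    return l1 + (1.0 - cos)

# Toy usage with random tensors.
rd, md = torch.rand(1024), torch.rand(1024)
n1 = torch.nn.functional.normalize(torch.randn(1024, 3), dim=-1)
n2 = torch.nn.functional.normalize(torch.randn(1024, 3), dim=-1)
print(scale_shift_invariant_depth_loss(rd, md), normal_consistency_loss(n1, n2))
```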

RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs

Dec 01, 2021
Michael Niemeyer, Jonathan T. Barron, Ben Mildenhall, Mehdi S. M. Sajjadi, Andreas Geiger, Noha Radwan

Neural Radiance Fields (NeRF) have emerged as a powerful representation for the task of novel view synthesis due to their simplicity and state-of-the-art performance. Though NeRF can produce photorealistic renderings of unseen viewpoints when many input views are available, its performance drops significantly when this number is reduced. We observe that the majority of artifacts in sparse input scenarios are caused by errors in the estimated scene geometry, and by divergent behavior at the start of training. We address this by regularizing the geometry and appearance of patches rendered from unobserved viewpoints, and annealing the ray sampling space during training. We additionally use a normalizing flow model to regularize the color of unobserved viewpoints. Our model outperforms not only other methods that optimize over a single scene, but in many cases also conditional models that are extensively pre-trained on large multi-view datasets.

* Project page available at https://m-niemeyer.github.io/regnerf/index.html 
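
The geometry regularization and sampling-space annealing mentioned above can be illustrated with two small pieces: a depth-smoothness penalty on patches rendered from unobserved viewpoints, and a schedule that grows the ray sampling interval during training. Both are minimal sketches; the schedule shape and loss form are assumptions rather than the paper's exact formulas.

```python
import torch

def depth_smoothness_loss(depth_patches: torch.Tensor) -> torch.Tensor:
    """Penalize depth differences between neighboring pixels of patches
    rendered from unobserved viewpoints. `depth_patches` has shape
    (num_patches, H, W); a sketch in the spirit of RegNeRF's patch-based
    geometry regularizer."""
    dx = depth_patches[:, :, 1:] - depth_patches[:, :, :-1]
    dy = depth_patches[:, 1:, :] - depth_patches[:, :-1, :]
    return (dx ** 2).mean() + (dy ** 2).mean()

def annealed_sample_bounds(step, max_steps, near, far, start_fraction=0.5):
    """Linearly grow the ray sampling interval around its midpoint from a
    small fraction of [near, far] to the full range early in training
    (the schedule shape is an assumption, not the paper's exact formula)."""
    frac = min(1.0, start_fraction + (1.0 - start_fraction) * step / max_steps)
    mid = 0.5 * (near + far)
    half = 0.5 * (far - near) * frac
    return mid - half, mid + half

# Toy usage.
patches = torch.rand(16, 8, 8)
print(depth_smoothness_loss(patches))
print(annealed_sample_bounds(step=100, max_steps=1000, near=2.0, far=6.0))
```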

Shape As Points: A Differentiable Poisson Solver

Jun 07, 2021
Songyou Peng, Chiyu "Max" Jiang, Yiyi Liao, Michael Niemeyer, Marc Pollefeys, Andreas Geiger

In recent years, neural implicit representations have gained popularity in 3D reconstruction due to their expressiveness and flexibility. However, their implicit nature results in slow inference and requires careful initialization. In this paper, we revisit the classic yet ubiquitous point cloud representation and introduce a differentiable point-to-mesh layer using a differentiable formulation of Poisson Surface Reconstruction (PSR) that allows for a fast, GPU-accelerated solution of the indicator function given an oriented point cloud. The differentiable PSR layer allows us to efficiently and differentiably bridge the explicit 3D point representation and the 3D mesh via the implicit indicator field, enabling end-to-end optimization of surface reconstruction metrics such as the Chamfer distance. This duality between points and meshes hence allows us to represent shapes as oriented point clouds, which are explicit, lightweight, and expressive. Compared to neural implicit representations, our Shape-As-Points (SAP) model is more interpretable, lightweight, and reduces inference time by an order of magnitude. Compared to other explicit representations such as points, patches, and meshes, SAP produces topology-agnostic, watertight manifold surfaces. We demonstrate the effectiveness of SAP on surface reconstruction from unoriented point clouds and on learning-based reconstruction.
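
At the heart of the differentiable PSR layer is a spectral solve of the Poisson equation ∇²χ = ∇·V, where V is a grid onto which the oriented point normals are splatted. The sketch below shows this solve with FFTs in PyTorch; the nearest-voxel splatting is a simplified, non-differentiable stand-in for the smoother splatting used in Shape-As-Points, and the resolution is illustrative.

```python
import torch

def splat_normals(points, normals, res):
    """Nearest-voxel splatting of oriented point normals onto a grid,
    producing the vector field V with shape (3, res, res, res). Points are
    assumed to lie in [0, 1)^3. (Shape-As-Points uses a smoother,
    differentiable splatting; this is a simplified stand-in.)"""
    V = torch.zeros(3, res, res, res)
    idx = (points * res).long().clamp(0, res - 1)
    for c in range(3):
        V[c].index_put_((idx[:, 0], idx[:, 1], idx[:, 2]), normals[:, c],
                        accumulate=True)
    return V

def poisson_indicator(V):
    """Solve  laplacian(chi) = div(V)  in the Fourier domain and return the
    indicator-like field chi: the spectral solve at the core of
    differentiable PSR (smoothing omitted for brevity)."""
    res = V.shape[-1]
    k = 2 * torch.pi * torch.fft.fftfreq(res)
    kx, ky, kz = torch.meshgrid(k, k, k, indexing="ij")
    V_hat = torch.fft.fftn(V, dim=(-3, -2, -1))
    div_hat = 1j * (kx * V_hat[0] + ky * V_hat[1] + kz * V_hat[2])
    k2 = kx ** 2 + ky ** 2 + kz ** 2
    k2[0, 0, 0] = 1.0                      # avoid division by zero at DC
    chi_hat = -div_hat / k2
    chi_hat[0, 0, 0] = 0.0                 # fix the free constant
    return torch.fft.ifftn(chi_hat).real

# Toy usage: points on a sphere of radius 0.3 centered at (0.5, 0.5, 0.5).
n = torch.nn.functional.normalize(torch.randn(2048, 3), dim=-1)
pts = 0.5 + 0.3 * n
chi = poisson_indicator(splat_normals(pts, n, res=32))
print(chi.shape, chi.mean().item())
```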


CAMPARI: Camera-Aware Decomposed Generative Neural Radiance Fields

Mar 31, 2021
Michael Niemeyer, Andreas Geiger

Tremendous progress in deep generative models has led to photorealistic image synthesis. While achieving compelling results, most approaches operate in the two-dimensional image domain, ignoring the three-dimensional nature of our world. Several recent works therefore propose generative models that are 3D-aware, i.e., scenes are modeled in 3D and then rendered differentiably to the image plane. This leads to impressive 3D consistency, but incorporating such a bias comes at a price: the camera needs to be modeled as well. Current approaches assume fixed intrinsics and a predefined prior over camera pose ranges. As a result, parameter tuning is typically required for real-world data, and results degrade if the data distribution is not matched. Our key hypothesis is that learning a camera generator jointly with the image generator leads to a more principled approach to 3D-aware image synthesis. Further, we propose to decompose the scene into a background and a foreground model, leading to more efficient and disentangled scene representations. While training from raw, unposed image collections, we learn a 3D- and camera-aware generative model that faithfully recovers not only the image distribution but also the camera distribution. At test time, our model generates images with explicit control over the camera as well as the shape and appearance of the scene.
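
The camera generator idea can be sketched as a small network that maps a latent code to camera parameters, from which a camera-to-world pose is assembled. Below, the latent is decoded to azimuth, elevation, and radius of a look-at camera pointed at the origin; the parameterization, ranges, and pose convention are assumptions, not CAMPARI's exact design.

```python
import torch
import torch.nn as nn

class CameraGenerator(nn.Module):
    """Minimal sketch of a learned camera generator: a latent code is mapped
    to (azimuth, elevation, radius) and turned into a camera-to-world pose
    looking at the origin. Parameterization and ranges are assumptions."""
    def __init__(self, z_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, z):
        raw = self.mlp(z)
        azim = torch.pi * torch.tanh(raw[:, 0])           # in [-pi, pi]
        elev = 0.25 * torch.pi * torch.tanh(raw[:, 1])    # in [-pi/4, pi/4]
        radius = 2.0 + torch.sigmoid(raw[:, 2])           # in [2, 3]
        eye = torch.stack([radius * torch.cos(elev) * torch.cos(azim),
                           radius * torch.cos(elev) * torch.sin(azim),
                           radius * torch.sin(elev)], dim=-1)
        return look_at(eye)

def look_at(eye, up=torch.tensor([0.0, 0.0, 1.0])):
    """Build camera-to-world matrices for cameras at `eye` looking at the
    origin (camera looks down its -z axis; a common convention)."""
    forward = torch.nn.functional.normalize(-eye, dim=-1)
    right = torch.nn.functional.normalize(
        torch.cross(forward, up.expand_as(forward), dim=-1), dim=-1)
    true_up = torch.cross(right, forward, dim=-1)
    R = torch.stack([right, true_up, -forward], dim=-1)   # columns: x, y, z axes
    pose = torch.eye(4).repeat(eye.shape[0], 1, 1)
    pose[:, :3, :3] = R
    pose[:, :3, 3] = eye
    return pose

cams = CameraGenerator()(torch.randn(4, 64))
print(cams.shape)  # (4, 4, 4)
```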
