Shangzhe Wu

CGOF++: Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields

Nov 23, 2022

Keqiang Sun, Shangzhe Wu, Ning Zhang, Zhaoyang Huang, Quan Wang, Hongsheng Li

Abstract: Capitalizing on the recent advances in image generation models, existing controllable face image synthesis methods are able to generate high-fidelity images with some levels of controllability, e.g., controlling the shapes, expressions, textures, and poses of the generated face images. However, previous methods focus on controllable 2D image generative models, which are prone to producing inconsistent face images under large expression and pose changes. In this paper, we propose a new NeRF-based conditional 3D face synthesis framework, which enables 3D controllability over the generated face images by imposing explicit 3D conditions from 3D face priors. At its core is a conditional Generative Occupancy Field (cGOF++) that effectively enforces the shape of the generated face to conform to a given 3D Morphable Model (3DMM) mesh, built on top of EG3D [1], a recent tri-plane-based generative model. To achieve accurate control over fine-grained 3D face shapes of the synthesized images, we additionally incorporate a 3D landmark loss as well as a volume warping loss into our synthesis framework. Experiments validate the effectiveness of the proposed method, which is able to generate high-fidelity face images and shows more precise 3D controllability than state-of-the-art 2D-based controllable face synthesis methods.

* This article is an extension of the NeurIPS'22 paper arXiv:2206.08361

MagicPony: Learning Articulated 3D Animals in the Wild

Nov 22, 2022

Shangzhe Wu, Ruining Li, Tomas Jakab, Christian Rupprecht, Andrea Vedaldi

Abstract: We consider the problem of learning a function that can estimate the 3D shape, articulation, viewpoint, texture, and lighting of an articulated animal like a horse, given a single test image. We present a new method, dubbed MagicPony, that learns this function purely from in-the-wild single-view images of the object category, with minimal assumptions about the topology of deformation. At its core is an implicit-explicit representation of articulated shape and appearance, combining the strengths of neural fields and meshes. In order to help the model understand an object's shape and pose, we distil the knowledge captured by an off-the-shelf self-supervised vision transformer and fuse it into the 3D model. To overcome common local optima in viewpoint estimation, we further introduce a new viewpoint sampling scheme that comes at no added training cost. Compared to prior works, we show significant quantitative and qualitative improvements on this challenging task. The model also demonstrates excellent generalisation in reconstructing abstract drawings and artefacts, despite the fact that it is only trained on real images.

* Project Page: https://3dmagicpony.github.io/

ONeRF: Unsupervised 3D Object Segmentation from Multiple Views

Nov 22, 2022

Shengnan Liang, Yichen Liu, Shangzhe Wu, Yu-Wing Tai, Chi-Keung Tang

Abstract: We present ONeRF, a method that automatically segments and reconstructs object instances in 3D from multi-view RGB images without any additional manual annotations. The segmented 3D objects are represented using separate Neural Radiance Fields (NeRFs) which allow for various 3D scene editing and novel view rendering. At the core of our method is an unsupervised approach using the iterative Expectation-Maximization algorithm, which effectively aggregates 2D visual features and the corresponding 3D cues from multi-views for joint 3D object segmentation and reconstruction. Unlike existing approaches that can only handle simple objects, our method produces segmented full 3D NeRFs of individual objects with complex shapes, topologies and appearance. The segmented ONeRFs enable a range of 3D scene editing, such as object transformation, insertion and deletion.

Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields

Jun 16, 2022

Keqiang Sun, Shangzhe Wu, Zhaoyang Huang, Ning Zhang, Quan Wang, Hongsheng Li

Abstract: Capitalizing on the recent advances in image generation models, existing controllable face image synthesis methods are able to generate high-fidelity images with some levels of controllability, e.g., controlling the shapes, expressions, textures, and poses of the generated face images. However, these methods focus on 2D image generative models, which are prone to producing inconsistent face images under large expression and pose changes. In this paper, we propose a new NeRF-based conditional 3D face synthesis framework, which enables 3D controllability over the generated face images by imposing explicit 3D conditions from 3D face priors. At its core is a conditional Generative Occupancy Field (cGOF) that effectively enforces the shape of the generated face to commit to a given 3D Morphable Model (3DMM) mesh. To achieve accurate control over fine-grained 3D face shapes of the synthesized image, we additionally incorporate a 3D landmark loss as well as a volume warping loss into our synthesis algorithm. Experiments validate the effectiveness of the proposed method, which is able to generate high-fidelity face images and shows more precise 3D controllability than state-of-the-art 2D-based controllable face synthesis methods. Find code and demo at https://keqiangsun.github.io/projects/cgof.

De-rendering 3D Objects in the Wild

Jan 06, 2022

Felix Wimbauer, Shangzhe Wu, Christian Rupprecht

Abstract: With increasing focus on augmented and virtual reality applications (XR) comes the demand for algorithms that can lift objects from images and videos into representations that are suitable for a wide variety of related 3D tasks. Large-scale deployment of XR devices and applications means that we cannot solely rely on supervised learning, as collecting and annotating data for the unlimited variety of objects in the real world is infeasible. We present a weakly supervised method that is able to decompose a single image of an object into shape (depth and normals), material (albedo, reflectivity and shininess) and global lighting parameters. For training, the method only relies on a rough initial shape estimate of the training objects to bootstrap the learning process. This shape supervision can come, for example, from a pretrained depth network or, more generically, from a traditional structure-from-motion pipeline. In our experiments, we show that the method can successfully de-render 2D images into a decomposed 3D representation and generalizes to unseen object categories. Since in-the-wild evaluation is difficult due to the lack of ground truth data, we also introduce a photo-realistic synthetic test set that allows for quantitative evaluation.

DOVE: Learning Deformable 3D Objects by Watching Videos

Jul 22, 2021

Shangzhe Wu, Tomas Jakab, Christian Rupprecht, Andrea Vedaldi

Abstract: Learning deformable 3D objects from 2D images is an extremely ill-posed problem. Existing methods rely on explicit supervision to establish multi-view correspondences, such as template shape models and keypoint annotations, which restricts their applicability to objects "in the wild". In this paper, we propose to use monocular videos, which naturally provide correspondences across time, allowing us to learn 3D shapes of deformable object categories without explicit keypoints or template shapes. Specifically, we present DOVE, which learns to predict 3D canonical shape, deformation, viewpoint and texture from a single 2D image of a bird, given a bird video collection as well as automatically obtained silhouettes and optical flows as training data. Our method reconstructs temporally consistent 3D shape and deformation, which allows us to animate and re-render the bird from arbitrary viewpoints from a single image.

* Project Page: https://dove3d.github.io/

De-rendering the World's Revolutionary Artefacts

Apr 08, 2021

Shangzhe Wu, Ameesh Makadia, Jiajun Wu, Noah Snavely, Richard Tucker, Angjoo Kanazawa

Abstract: Recent works have shown exciting results in unsupervised image de-rendering: learning to decompose 3D shape, appearance, and lighting from single-image collections without explicit supervision. However, many of these assume simplistic material and lighting models. We propose a method, termed RADAR, that can recover environment illumination and surface materials from real single-image collections, relying neither on explicit 3D supervision, nor on multi-view or multi-light images. Specifically, we focus on rotationally symmetric artefacts that exhibit challenging surface properties including specular reflections, such as vases. We introduce a novel self-supervised albedo discriminator, which allows the model to recover plausible albedo without requiring any ground-truth during training. In conjunction with a shape reconstruction module exploiting rotational symmetry, we present an end-to-end learning framework that is able to de-render the world's revolutionary artefacts. We conduct experiments on a real vase dataset and demonstrate compelling decomposition results, allowing for applications including free-viewpoint rendering and relighting.

* CVPR 2021. Project page: https://sorderender.github.io/

NeRF--: Neural Radiance Fields Without Known Camera Parameters

Feb 19, 2021

Zirui Wang, Shangzhe Wu, Weidi Xie, Min Chen, Victor Adrian Prisacariu

Abstract: This paper tackles the problem of novel view synthesis (NVS) from 2D images without known camera poses and intrinsics. Among various NVS techniques, Neural Radiance Field (NeRF) has recently gained popularity due to its remarkable synthesis quality. Existing NeRF-based approaches assume that the camera parameters associated with each input image are either directly accessible at training, or can be accurately estimated with conventional techniques based on correspondences, such as Structure-from-Motion. In this work, we propose an end-to-end framework, termed NeRF--, for training NeRF models given only RGB images, without pre-computed camera parameters. Specifically, we show that the camera parameters, including both intrinsics and extrinsics, can be automatically discovered via joint optimisation during the training of the NeRF model. On the standard LLFF benchmark, our model achieves novel view synthesis results comparable to the baseline trained with COLMAP pre-computed camera parameters. We also conduct extensive analyses to understand the model behaviour under different camera trajectories, and show that in scenarios where COLMAP fails, our model still produces robust results.

* Project page: nerfmm.active.vision

Self-Supervised Localisation between Range Sensors and Overhead Imagery

Jun 03, 2020

Tim Y. Tang, Daniele De Martini, Shangzhe Wu, Paul Newman

Abstract: Publicly available satellite imagery can be a ubiquitous, cheap, and powerful tool for vehicle localisation when a prior sensor map is unavailable. However, satellite images are not directly comparable to data from ground range sensors because of their starkly different modalities. We present a learned metric localisation method that not only handles the modality difference, but is cheap to train, learning in a self-supervised fashion without metrically accurate ground truth. By evaluating across multiple real-world datasets, we demonstrate the robustness and versatility of our method for various sensor configurations. We pay particular attention to the use of millimetre wave radar, which, owing to its complex interaction with the scene and its immunity to weather and lighting, makes for a compelling and valuable use case.

* Accepted to Robotics: Science and Systems (RSS) 2020

Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild

Nov 25, 2019

Shangzhe Wu, Christian Rupprecht, Andrea Vedaldi

Abstract: We propose a method to learn 3D deformable object categories from raw single-view images, without external supervision. The method is based on an autoencoder that factors each input image into depth, albedo, viewpoint and illumination. In order to disentangle these components without supervision, we use the fact that many object categories have, at least in principle, a symmetric structure. We show that reasoning about illumination allows us to exploit the underlying object symmetry even if the appearance is not symmetric due to shading. Furthermore, we model objects that are probably, but not certainly, symmetric by predicting a symmetry probability map, learned end-to-end with the other components of the model. Our experiments show that this method can recover very accurately the 3D shape of human faces, cat faces and cars from single-view images, without any supervision or a prior shape model. On benchmarks, we demonstrate superior accuracy compared to another method that uses supervision at the level of 2D image correspondences.

* Appendix included, 17 pages. Project page: https://elliottwu.com/projects/unsup3d/
