Tengfei Wang

Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior

Apr 03, 2023
Junshu Tang, Tengfei Wang, Bo Zhang, Ting Zhang, Ran Yi, Lizhuang Ma, Dong Chen

In this work, we investigate the problem of creating high-fidelity 3D content from only a single image. This is inherently challenging: it essentially involves estimating the underlying 3D geometry while simultaneously hallucinating unseen textures. To address this challenge, we leverage the prior knowledge of a well-trained 2D diffusion model as 3D-aware supervision for 3D creation. Our approach, Make-It-3D, employs a two-stage optimization pipeline: the first stage optimizes a neural radiance field by combining constraints from the reference image at the frontal view with a diffusion prior at novel views; the second stage transforms the coarse model into textured point clouds and further raises realism with the diffusion prior while leveraging the high-quality textures of the reference image. Extensive experiments demonstrate that our method outperforms prior work by a large margin, yielding faithful reconstructions and impressive visual quality. Our method presents the first attempt to achieve high-quality 3D creation from a single image for general objects, and it enables applications such as text-to-3D creation and texture editing.

* 17 pages, 18 figures, Project page: https://make-it-3d.github.io/ 
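
To make the two-stage idea above concrete, here is a minimal, self-contained PyTorch sketch of a stage-one-style objective: an exact reconstruction loss at the reference view plus a score-distillation-style gradient from a frozen "denoiser" at novel views. The tiny MLP radiance field, the toy denoiser, the noise level, and the loss weighting are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRadianceField(nn.Module):
    """Toy stand-in for a NeRF: maps 3D ray/point coordinates to RGB."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, coords):                     # coords: (N, 3)
        return torch.sigmoid(self.mlp(coords))

def sds_like_grad(denoiser, rendered, sigma=0.5):
    """Score-distillation-style gradient: nudge the rendering toward what a
    frozen denoiser reconstructs from its noised version."""
    noised = rendered + sigma * torch.randn_like(rendered)
    with torch.no_grad():
        denoised = denoiser(noised)
    # d/d(rendered) of 0.5 * ||rendered - denoised||^2
    return (rendered - denoised).detach()

denoiser = nn.Linear(3, 3)                         # placeholder for a pretrained 2D diffusion prior
for p in denoiser.parameters():
    p.requires_grad_(False)

field = TinyRadianceField()
opt = torch.optim.Adam(field.parameters(), lr=1e-3)
ref_coords = torch.rand(1024, 3)                   # rays of the frontal (reference) view
ref_rgb = torch.rand(1024, 3)                      # pixels of the reference image

for step in range(100):
    opt.zero_grad()
    # Frontal view: match the reference image directly.
    loss_ref = F.mse_loss(field(ref_coords), ref_rgb)
    loss_ref.backward()
    # Novel view: no ground truth, so inject the diffusion-prior gradient instead.
    novel = field(torch.rand(1024, 3))
    novel.backward(gradient=0.1 * sds_like_grad(denoiser, novel))
    opt.step()
```

In the actual method the denoiser is a large pretrained 2D diffusion model and a second stage refines a textured point cloud; this sketch only shows how the two supervision signals can be combined in one optimization loop.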

Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion

Dec 12, 2022
Tengfei Wang, Bo Zhang, Ting Zhang, Shuyang Gu, Jianmin Bao, Tadas Baltrusaitis, Jingjing Shen, Dong Chen, Fang Wen, Qifeng Chen, Baining Guo

This paper presents a 3D generative model that uses diffusion models to automatically generate 3D digital avatars represented as neural radiance fields. A significant challenge in generating such avatars is that the memory and processing costs in 3D are prohibitive for producing the rich details required for high-quality avatars. To tackle this problem, we propose the roll-out diffusion network (Rodin), which represents a neural radiance field as multiple 2D feature maps and rolls out these maps into a single 2D feature plane within which we perform 3D-aware diffusion. The Rodin model brings much-needed computational efficiency while preserving the integrity of diffusion in 3D by using a 3D-aware convolution that attends to projected features in the 2D feature plane according to their original relationship in 3D. We also use latent conditioning to orchestrate the feature generation for global coherence, leading to high-fidelity avatars and enabling their semantic editing based on text prompts. Finally, we use hierarchical synthesis to further enhance details. The 3D avatars generated by our model compare favorably with those produced by existing generative techniques: we can generate highly detailed avatars with realistic hairstyles and facial hair such as beards. We also demonstrate 3D avatar generation from an image or text, as well as text-guided editability.

* Project Webpage: https://3d-avatar-diffusion.microsoft.com/ 
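
As a rough illustration of the roll-out described above, the sketch below lays three tri-plane feature maps side by side into one 2D canvas and applies a simple "3D-aware" mixing step in which each plane also sees axis-pooled summaries of the other two. The shapes, the pooling rule, and the module sizes are assumptions for illustration only; the paper's actual diffusion network and attention scheme are more involved.

```python
import torch
import torch.nn as nn

def roll_out(planes):                      # planes: (B, 3, C, H, W)
    b, _, c, h, w = planes.shape
    return planes.permute(0, 2, 3, 1, 4).reshape(b, c, h, 3 * w)

def roll_in(canvas):                       # canvas: (B, C, H, 3W)
    b, c, h, w3 = canvas.shape
    w = w3 // 3
    return canvas.reshape(b, c, h, 3, w).permute(0, 3, 1, 2, 4)

class ThreeDAwareConv(nn.Module):
    """Each plane is convolved together with axis-pooled summaries of the
    other two planes: one simple way to realize '3D-aware' mixing."""
    def __init__(self, c):
        super().__init__()
        self.conv = nn.Conv2d(3 * c, c, 3, padding=1)

    def forward(self, planes):             # (B, 3, C, H, W)
        outs = []
        for i in range(3):
            others = [planes[:, j] for j in range(3) if j != i]
            # Pool the other planes along one spatial axis and broadcast back.
            pooled = [o.mean(dim=-1, keepdim=True).expand_as(o) for o in others]
            outs.append(self.conv(torch.cat([planes[:, i]] + pooled, dim=1)))
        return torch.stack(outs, dim=1)

planes = torch.randn(2, 3, 8, 16, 16)      # a batch of tri-plane features
mixed = ThreeDAwareConv(8)(planes)
canvas = roll_out(mixed)                   # (2, 8, 16, 48): a single 2D plane
print(canvas.shape, roll_in(canvas).shape)
```

A diffusion model would then operate on the rolled-out (B, C, H, 3W) canvas rather than on a full 3D volume, which is where the computational savings come from.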

3D GAN Inversion with Facial Symmetry Prior

Nov 30, 2022
Fei Yin, Yong Zhang, Xuan Wang, Tengfei Wang, Xiaoyu Li, Yuan Gong, Yanbo Fan, Xiaodong Cun, Ying Shan, Cengiz Oztireli, Yujiu Yang

Recently, a surge of high-quality 3D-aware GANs has been proposed, leveraging the generative power of neural rendering. It is natural to combine 3D GANs with GAN inversion methods, which project a real image into the generator's latent space, allowing free-view consistent synthesis and editing; this is referred to as 3D GAN inversion. Although the facial prior is preserved in pre-trained 3D GANs, reconstructing a 3D portrait from only one monocular image is still an ill-posed problem. The straightforward application of 2D GAN inversion methods focuses on texture similarity only while ignoring the correctness of the 3D geometry, which may cause geometry collapse, especially when reconstructing a side face under an extreme pose. Besides, the synthesized results in novel views are prone to be blurry. In this work, we propose a novel method to improve 3D GAN inversion by introducing a facial symmetry prior. We design a pipeline and constraints to make full use of the pseudo auxiliary view obtained via image flipping, which helps obtain a robust and reasonable geometry shape during the inversion process. To enhance texture fidelity in unobserved viewpoints, pseudo labels from depth-guided 3D warping provide extra supervision. We also design constraints to filter out conflicting areas during optimization in asymmetric situations. Comprehensive quantitative and qualitative evaluations on image reconstruction and editing demonstrate the superiority of our method.

* Project Page is at https://feiiyin.github.io/SPI/ 
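
Below is a toy PyTorch sketch of the symmetry idea from the abstract: a horizontally flipped copy of the portrait serves as a pseudo auxiliary view while optimizing the latent code, and pixels where the two views conflict strongly are masked out. The toy generator, the conflict threshold, and the loss weights are hypothetical; the real pipeline inverts a pretrained 3D-aware GAN and renders the mirrored camera pose rather than flipping images.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

generator = nn.Sequential(nn.Linear(64, 3 * 32 * 32))     # frozen toy "3D GAN" generator
for p in generator.parameters():
    p.requires_grad_(False)

def render(latent, flip=False):
    img = generator(latent).view(1, 3, 32, 32)
    # Flipping the rendering stands in for rendering the mirrored camera pose.
    return torch.flip(img, dims=[-1]) if flip else img

target = torch.rand(1, 3, 32, 32)                          # real portrait
pseudo = torch.flip(target, dims=[-1])                     # pseudo auxiliary view via flipping
conflict = (target - pseudo).abs().mean(1, keepdim=True)   # where the face is clearly asymmetric
mask = (conflict < 0.3).float()                            # keep only roughly symmetric regions

latent = torch.zeros(1, 64, requires_grad=True)
opt = torch.optim.Adam([latent], lr=1e-2)
for step in range(200):
    opt.zero_grad()
    main = render(latent)
    aux = render(latent, flip=True)
    # Main-view reconstruction plus masked supervision from the pseudo view.
    loss = F.mse_loss(main, target) + 0.5 * (mask * (aux - pseudo) ** 2).mean()
    loss.backward()
    opt.step()
```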

Pretraining is All You Need for Image-to-Image Translation

May 25, 2022
Tengfei Wang, Ting Zhang, Bo Zhang, Hao Ouyang, Dong Chen, Qifeng Chen, Fang Wen

We propose to use pretraining to boost general image-to-image translation. Prior image-to-image translation methods usually require dedicated architectural designs and train individual translation models from scratch, and they struggle to generate high-quality images of complex scenes, especially when paired training data are scarce. In this paper, we regard each image-to-image translation problem as a downstream task and introduce a simple and generic framework that adapts a pretrained diffusion model to accommodate various kinds of image-to-image translation. We also propose adversarial training to enhance texture synthesis during diffusion model training, in conjunction with normalized guidance sampling to improve generation quality. We present extensive empirical comparisons across various tasks on challenging benchmarks such as ADE20K, COCO-Stuff, and DIODE, showing that the proposed pretraining-based image-to-image translation (PITI) is capable of synthesizing images of unprecedented realism and faithfulness.

* Project Page: https://tengfei-wang.github.io/PITI/index.html 
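
The sketch below illustrates, under stated assumptions, the adaptation recipe described above: keep a "pretrained" backbone frozen (here a toy denoiser) and train only a small task-specific encoder that maps the conditioning input into the backbone's conditioning space. The `guided_step` shows one plausible reading of normalized guidance, rescaling the guided prediction to the norm of the conditional one. None of the modules, names, or hyperparameters are taken from the released PITI code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(nn.Linear(3 * 16 * 16 + 32, 256), nn.ReLU(),
                         nn.Linear(256, 3 * 16 * 16))       # frozen "pretrained" denoiser
for p in backbone.parameters():
    p.requires_grad_(False)

task_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 32))  # trained per task

def denoise(noisy, cond_vec):
    inp = torch.cat([noisy.flatten(1), cond_vec], dim=1)
    return backbone(inp).view_as(noisy)

def guided_step(noisy, cond_vec, scale=3.0):
    """Classifier-free-style guidance, with the guided prediction rescaled to
    the norm of the conditional one (one reading of 'normalized guidance')."""
    uncond = denoise(noisy, torch.zeros_like(cond_vec))
    cond = denoise(noisy, cond_vec)
    guided = uncond + scale * (cond - uncond)
    return guided * (cond.norm() / (guided.norm() + 1e-8))

seg_map = torch.rand(4, 3, 16, 16)            # conditioning input for the downstream task
target = torch.rand(4, 3, 16, 16)             # paired ground-truth image
opt = torch.optim.Adam(task_encoder.parameters(), lr=1e-3)
for step in range(100):
    opt.zero_grad()
    noisy = target + 0.3 * torch.randn_like(target)
    pred = denoise(noisy, task_encoder(seg_map))
    F.mse_loss(pred, target).backward()        # only the task encoder is updated
    opt.step()

sample = guided_step(torch.randn(4, 3, 16, 16), task_encoder(seg_map))
```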

High-Fidelity GAN Inversion for Image Attribute Editing

Sep 15, 2021
Tengfei Wang, Yong Zhang, Yanbo Fan, Jue Wang, Qifeng Chen

We present a novel high-fidelity generative adversarial network (GAN) inversion framework that enables attribute editing while preserving image-specific details (e.g., background, appearance, and illumination). We first formulate GAN inversion as a lossy data compression problem and carefully discuss the Rate-Distortion-Edit trade-off. Due to this trade-off, previous works fail to achieve high-fidelity reconstruction while maintaining compelling editing ability with only a low bit-rate latent code. In this work, we propose a distortion consultation approach that employs the distortion map as a reference for reconstruction. In distortion consultation inversion (DCI), the distortion map is first projected to a high-rate latent map, which then complements the basic low-rate latent code with the (lost) details via consultation fusion. To achieve high-fidelity editing, we propose an adaptive distortion alignment (ADA) module with a self-supervised training scheme. Extensive experiments in the face and car domains show a clear improvement in both inversion and editing quality.

* Project Page is at https://tengfei-wang.github.io/HFGI/ 
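
The following toy sketch mirrors the distortion-consultation flow described above: invert to a low-rate latent code, compute the distortion map as the residual between the input and that coarse reconstruction, project it to a high-rate map, and fuse it back in (additively here). All modules are small linear stand-ins rather than the StyleGAN-based architecture used in the paper, and the ADA alignment step is omitted.

```python
import torch
import torch.nn as nn

class ToyConsultationInversion(nn.Module):
    def __init__(self, d=32, img=3 * 16 * 16):
        super().__init__()
        self.encoder = nn.Linear(img, d)           # produces the low-rate latent code
        self.generator = nn.Linear(d, img)         # frozen "pretrained" generator (toy)
        self.consult = nn.Linear(img, img)         # projects the distortion to a high-rate map
        for p in self.generator.parameters():
            p.requires_grad_(False)

    def forward(self, x):
        flat = x.flatten(1)
        w = self.encoder(flat)                      # basic inversion
        coarse = self.generator(w)                  # coarse reconstruction
        distortion = flat - coarse                  # details the low-rate code lost
        detail = self.consult(distortion)           # high-rate latent map
        return (coarse + detail).view_as(x)         # consultation fusion (additive here)

x = torch.rand(2, 3, 16, 16)
recon = ToyConsultationInversion()(x)
print(recon.shape)
```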

Dual-Camera Super-Resolution with Aligned Attention Modules

Sep 06, 2021
Tengfei Wang, Jiaxin Xie, Wenxiu Sun, Qiong Yan, Qifeng Chen

We present a novel approach to reference-based super-resolution (RefSR), with a focus on dual-camera super-resolution (DCSR), which utilizes reference images for high-quality and high-fidelity results. Our proposed method generalizes standard patch-based feature matching with spatial alignment operations. We further explore dual-camera super-resolution as one promising application of RefSR and build a dataset of 146 image pairs from the main and telephoto cameras of a smartphone. To bridge the domain gap between real-world images and the training images, we propose a self-supervised domain adaptation strategy for real-world images. Extensive experiments on our dataset and a public benchmark demonstrate the clear improvement achieved by our method over the state of the art in both quantitative evaluation and visual comparisons.

* Accepted to ICCV 2021 (oral) 
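
As a small illustration of the patch-based feature matching the abstract mentions, the function below finds, for every low-resolution feature patch, the most similar patch in the telephoto reference features and folds the selected patches back into an aligned reference map. The patch size, cosine similarity, and hard argmax selection are simplifying assumptions; the paper generalizes this kind of matching with spatial alignment operations and attention.

```python
import torch
import torch.nn.functional as F

def match_and_warp(lr_feat, ref_feat, patch=3):
    # Unfold both feature maps into overlapping patches: (B, C*p*p, L)
    lr_p = F.unfold(lr_feat, patch, padding=patch // 2)
    ref_p = F.unfold(ref_feat, patch, padding=patch // 2)
    # Cosine similarity between every LR patch and every reference patch.
    sim = torch.einsum('bcl,bck->blk',
                       F.normalize(lr_p, dim=1), F.normalize(ref_p, dim=1))
    idx = sim.argmax(dim=-1)                                   # best reference patch per location
    gathered = torch.gather(ref_p, 2, idx.unsqueeze(1).expand_as(lr_p))
    # Fold the selected reference patches back and average the overlaps.
    out = F.fold(gathered, lr_feat.shape[-2:], patch, padding=patch // 2)
    norm = F.fold(torch.ones_like(gathered), lr_feat.shape[-2:], patch, padding=patch // 2)
    return out / norm

lr = torch.rand(1, 8, 16, 16)        # features of the wide-angle (low-resolution) view
ref = torch.rand(1, 8, 16, 16)       # features of the telephoto reference
warped = match_and_warp(lr, ref)
print(warped.shape)                   # (1, 8, 16, 16)
```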

Internal Video Inpainting by Implicit Long-range Propagation

Aug 17, 2021
Hao Ouyang, Tengfei Wang, Qifeng Chen

We propose a novel framework for video inpainting that adopts an internal learning strategy. Unlike previous methods that use optical flow for cross-frame context propagation to inpaint unknown regions, we show that this can be achieved implicitly by fitting a convolutional neural network to the known regions. Moreover, to handle challenging sequences with ambiguous backgrounds or long-term occlusion, we design two regularization terms to preserve high-frequency details and long-term temporal consistency. Extensive experiments on the DAVIS dataset demonstrate that the proposed method achieves state-of-the-art inpainting quality both quantitatively and qualitatively. We further extend the proposed method to another challenging task: learning to remove an object from a 4K video given a single object mask in only one frame.

* ICCV 2021 
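
A minimal sketch of the internal-learning idea described above: overfit a single small convolutional network to the known pixels of all frames of one clip and read its prediction inside the hole, so cross-frame propagation happens implicitly through the shared weights. The tiny network, mask, and training schedule are illustrative, and the paper's two regularization terms for high-frequency detail and temporal consistency are not included.

```python
import torch
import torch.nn as nn

frames = torch.rand(8, 3, 32, 32)                 # one short video clip
masks = (torch.rand(8, 1, 32, 32) > 0.2).float()  # 1 = known pixel, 0 = hole

net = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 3, 3, padding=1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

corrupted = frames * masks                         # the network only ever sees known content
for step in range(300):
    opt.zero_grad()
    pred = net(corrupted)
    # Fit known regions only; the hole is filled by what the network generalizes.
    loss = (masks * (pred - frames) ** 2).sum() / masks.sum().clamp(min=1)
    loss.backward()
    opt.step()

inpainted = masks * frames + (1 - masks) * net(corrupted)
```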