"photo": models, code, and papers

Towards a Pipeline for Real-Time Visualization of Faces for VR-based Telepresence and Live Broadcasting Utilizing Neural Rendering

Jan 04, 2023
Philipp Ladwig, Rene Ebertowski, Alexander Pech, Ralf Dörner, Christian Geiger

While head-mounted displays (HMDs) for Virtual Reality (VR) have become widely available in the consumer market, they pose a considerable obstacle to realistic face-to-face conversation in VR, since HMDs hide a significant portion of the participants' faces. Even with image streams from cameras directly attached to an HMD, stitching together a convincing image of an entire face remains challenging because of extreme capture angles and strong lens distortions caused by a wide field of view. Compared to the long line of research in VR, reconstruction of faces hidden beneath an HMD is a very recent research topic. While current state-of-the-art solutions demonstrate photo-realistic 3D reconstruction results, they require high-cost laboratory equipment and substantial computational resources. We present an approach that focuses on low-cost hardware and can be used on a commodity gaming computer with a single GPU. We leverage the benefits of an end-to-end pipeline built on Generative Adversarial Networks (GANs). Our GAN produces a frontal-facing 2.5D point cloud based on a training dataset captured with an RGBD camera. Training is performed offline, while reconstruction runs in real time. Our results show adequate reconstruction quality within the 'learned' expressions. Expressions not learned by the network produce artifacts and can trigger the Uncanny Valley effect.
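A 2.5D point cloud is essentially a single-view depth map lifted into 3D. The sketch below shows that back-projection under a pinhole camera model; the intrinsics `fx`, `fy`, `cx`, `cy` and the function name are illustrative assumptions, and this is not the authors' GAN pipeline, only a picture of the output representation.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def depth_to_points(depth: List[List[float]], fx: float, fy: float,
                    cx: float, cy: float) -> List[Point]:
    """Back-project a depth image into camera-space 3D points (pinhole model).

    Pixels with depth <= 0 are treated as missing and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):      # v: pixel row index
        for u, z in enumerate(row):      # u: pixel column index
            if z <= 0:
                continue                 # no valid depth at this pixel
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x2 depth map with one missing pixel and hypothetical unit intrinsics.
print(depth_to_points([[1.0, 0.0], [2.0, 2.0]], fx=1.0, fy=1.0, cx=0.0, cy=0.0))
```

Because every point comes from one viewpoint, surfaces hidden from that view are simply absent, which is why such data is called 2.5D rather than full 3D.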

Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model

Dec 01, 2022
Yinhuai Wang, Jiwen Yu, Jian Zhang

Most existing Image Restoration (IR) models are task-specific and cannot generalize to different degradation operators. In this work, we propose the Denoising Diffusion Null-Space Model (DDNM), a novel zero-shot framework for arbitrary linear IR problems, including but not limited to image super-resolution, colorization, inpainting, compressed sensing, and deblurring. DDNM needs only a pre-trained off-the-shelf diffusion model as the generative prior, without any extra training or network modifications. By refining only the null-space contents during the reverse diffusion process, it yields diverse results that satisfy both data consistency and realness. We further propose an enhanced and robust version, dubbed DDNM+, to support noisy restoration and improve restoration quality for hard tasks. Our experiments on several IR tasks show that DDNM outperforms other state-of-the-art zero-shot IR methods. We also demonstrate that DDNM+ can solve complex real-world applications, e.g., old photo restoration.
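The null-space idea can be written as a range-null space decomposition: given a linear degradation A with pseudo-inverse A⁺, measurement y, and the diffusion model's current clean estimate x̄, the corrected estimate is x̂ = A⁺y + (I − A⁺A)x̄, so A x̂ = y holds while the null-space detail comes from x̄. A minimal sketch, assuming a toy 1D signal and 2× average pooling as A (for which A⁺ simply replicates samples); the diffusion model that would produce x̄ at each step is omitted, and the function names are hypothetical.

```python
from typing import List


def degrade(x: List[float], s: int) -> List[float]:
    """Operator A: average-pool a 1D signal by factor s (toy super-resolution)."""
    return [sum(x[i:i + s]) / s for i in range(0, len(x), s)]


def pseudo_inverse(y: List[float], s: int) -> List[float]:
    """A^+ for average pooling: replicate each low-res sample s times."""
    return [v for v in y for _ in range(s)]


def range_null_correct(x_bar: List[float], y: List[float], s: int) -> List[float]:
    """x_hat = A^+ y + (I - A^+ A) x_bar.

    The range-space part comes from the measurement y; the null-space part
    (detail that A cannot observe) is kept from the current estimate x_bar.
    """
    if len(x_bar) % s != 0:
        raise ValueError("signal length must be a multiple of s")
    range_part = pseudo_inverse(y, s)
    projected = pseudo_inverse(degrade(x_bar, s), s)  # A^+ A x_bar
    return [r + xb - p for r, xb, p in zip(range_part, x_bar, projected)]


x_hat = range_null_correct([1.0, 3.0, 5.0, 9.0], y=[4.0, 4.0], s=2)
print(x_hat, degrade(x_hat, 2))  # degrading x_hat gives back y
```

Inside each 2-sample block, x̂ keeps the local variation of x̄ but its block mean is forced to match y, which is the data-consistency half of the "consistency and realness" trade-off.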

High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization

Nov 29, 2022
Jiaxin Xie, Hao Ouyang, Jingtan Piao, Chenyang Lei, Qifeng Chen

We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views while preserving specific details of the input image. High-fidelity 3D GAN inversion is inherently challenging due to the geometry-texture trade-off in 3D inversion, where overfitting to a single view input image often damages the estimated geometry during the latent optimization. To solve this challenge, we propose a novel pipeline that builds on the pseudo-multi-view estimation with visibility analysis. We keep the original textures for the visible parts and utilize generative priors for the occluded parts. Extensive experiments show that our approach achieves advantageous reconstruction and novel view synthesis quality over state-of-the-art methods, even for images with out-of-distribution textures. The proposed pipeline also enables image attribute editing with the inverted latent code and 3D-aware texture modification. Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.

* Project website: https://ken-ouyang.github.io/HFGI3D/index.html ; Github link: https://github.com/jiaxinxie97/HFGI3D
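The phrase "keep the original textures for the visible parts and utilize generative priors for the occluded parts" amounts to a visibility-weighted blend. A minimal per-texel sketch, assuming a visibility weight in [0, 1] is already available for each texel (how HFGI3D estimates visibility and warps pseudo-views is not shown, and the function name is hypothetical):

```python
from typing import List


def blend_by_visibility(input_tex: List[float], prior_tex: List[float],
                        visibility: List[float]) -> List[float]:
    """Per-texel blend: visible texels (weight 1) keep the input texture,
    occluded texels (weight 0) fall back to the generative prior."""
    if not (len(input_tex) == len(prior_tex) == len(visibility)):
        raise ValueError("inputs must have equal length")
    return [w * a + (1.0 - w) * b
            for a, b, w in zip(input_tex, prior_tex, visibility)]


print(blend_by_visibility([0.9, 0.2, 0.5], [0.1, 0.8, 0.3], [1.0, 0.0, 0.5]))
```

A soft weight (such as 0.5 above) avoids visible seams at the boundary between observed and hallucinated regions.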

EditableNeRF: Editing Topologically Varying Neural Radiance Fields by Key Points

Dec 07, 2022
Chengwei Zheng, Wenbin Lin, Feng Xu

Neural radiance fields (NeRF) achieve highly photo-realistic novel-view synthesis, but editing scenes modeled by NeRF-based methods is challenging, especially for dynamic scenes. We propose editable neural radiance fields that enable end-users to easily edit dynamic scenes and even support topological changes. Given an image sequence from a single camera as input, our network is trained fully automatically and models topologically varying dynamics using selected surface key points. End-users can then edit the scene by dragging the key points to desired new positions. To achieve this, we propose a scene analysis method that detects and initializes key points by considering the dynamics in the scene, and a weighted key points strategy that models topologically varying dynamics by jointly optimizing key points and weights. Our method supports intuitive multi-dimensional (up to 3D) editing and can generate novel scenes that are unseen in the input sequence. Experiments demonstrate that our method achieves high-quality editing on various dynamic scenes and outperforms the state of the art. We will release our code and captured data.
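To illustrate how dragging key points can drive a deformation, here is a toy weighted key-point scheme: each key point carries an offset (how far the user dragged it), and a query point moves by a weighted sum of those offsets. This is an assumption-laden sketch, not EditableNeRF's formulation; in particular the weights here come from a fixed normalized Gaussian of distance, whereas the paper optimizes its weights jointly with the key points, and `deform` and `sigma` are hypothetical names.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def deform(x: Vec3, key_points: List[Vec3], offsets: List[Vec3],
           sigma: float = 0.5) -> Vec3:
    """Move x by a weighted sum of the offsets applied to each key point.

    Weights are a normalized Gaussian of the distance to each key point, an
    illustrative stand-in for learned per-point weights.
    """
    raw = [math.exp(-math.dist(x, k) ** 2 / (2 * sigma ** 2)) for k in key_points]
    total = sum(raw)
    if total == 0.0:
        return x  # far from every key point: leave the point unchanged
    weights = [r / total for r in raw]
    return tuple(x[i] + sum(w * o[i] for w, o in zip(weights, offsets))
                 for i in range(3))


# Dragging the only key point by +0.5 in x drags a nearby point with it.
print(deform((0.1, 0.0, 0.0), [(0.0, 0.0, 0.0)], [(0.5, 0.0, 0.0)]))
```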

Instant Volumetric Head Avatars

Nov 22, 2022
Wojciech Zielonka, Timo Bolkart, Justus Thies

We present Instant Volumetric Head Avatars (INSTA), a novel approach for reconstructing photo-realistic digital avatars instantaneously. INSTA models a dynamic neural radiance field based on neural graphics primitives embedded around a parametric face model. Our pipeline is trained on a single monocular RGB portrait video that observes the subject under different expressions and views. While state-of-the-art methods take up to several days to train an avatar, our method can reconstruct a digital avatar in less than 10 minutes on modern GPU hardware, which is orders of magnitude faster than previous solutions. In addition, it allows for the interactive rendering of novel poses and expressions. By leveraging the geometry prior of the underlying parametric face model, we demonstrate that INSTA extrapolates to unseen poses. In quantitative and qualitative studies on various subjects, INSTA outperforms state-of-the-art methods regarding rendering quality and training time.

* Website: https://zielon.github.io/insta/ Video: https://youtu.be/HOgaeWTih7Q
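INSTA builds on neural graphics primitives, whose speed comes largely from a multiresolution hash encoding: a 3D point is located in grids of increasing resolution, and each grid cell's integer coordinates are hashed into a small table of learned feature vectors. The sketch below shows only the indexing step, using the per-dimension primes from the Instant-NGP spatial hash; the number of levels, base resolution, growth factor, and table size are illustrative assumptions, and corner interpolation, the learned features, and INSTA's embedding around a parametric face model are omitted.

```python
import math
from typing import List, Tuple

# Per-dimension primes used by the Instant-NGP spatial hash.
PRIMES = (1, 2654435761, 805459861)


def spatial_hash(cell: Tuple[int, int, int], table_size: int) -> int:
    """XOR the integer cell coordinates, each scaled by a prime, then wrap."""
    h = 0
    for c, p in zip(cell, PRIMES):
        h ^= c * p
    return h % table_size


def hash_indices(x: Tuple[float, float, float], levels: int = 4,
                 base_res: int = 16, growth: float = 1.5,
                 table_size: int = 2 ** 14) -> List[int]:
    """Hash-table index of the lower grid corner enclosing x at each level.

    A full encoding would look up all 8 cell corners per level and trilinearly
    interpolate their learned feature vectors; that part is omitted here.
    """
    out = []
    for level in range(levels):
        res = math.floor(base_res * growth ** level)  # grid resolution grows per level
        cell = tuple(math.floor(xi * res) for xi in x)
        out.append(spatial_hash(cell, table_size))
    return out


print(hash_indices((0.5, 0.25, 0.75)))
```

Hash collisions are tolerated by design: the small MLP that consumes the concatenated features learns to disambiguate them, which keeps memory fixed regardless of resolution.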

Memories are One-to-Many Mapping Alleviators in Talking Face Generation

Dec 12, 2022
Anni Tang, Tianyu He, Xu Tan, Jun Ling, Runnan Li, Sheng Zhao, Li Song, Jiang Bian

Talking face generation aims to generate photo-realistic video portraits of a target person driven by input audio. Because the mapping from input audio to output video is one-to-many (e.g., one speech content may have multiple feasible visual appearances), learning a deterministic mapping, as previous works do, introduces ambiguity during training and thus causes inferior visual results. Although this one-to-many mapping could be alleviated in part by a two-stage framework (i.e., an audio-to-expression model followed by a neural-rendering model), it is still insufficient since the prediction is produced without enough information (e.g., emotions, wrinkles, etc.). In this paper, we propose MemFace to complement the missing information with an implicit memory and an explicit memory that correspond to the two stages, respectively. More specifically, the implicit memory is employed in the audio-to-expression model to capture high-level semantics in the audio-expression shared space, while the explicit memory is employed in the neural-rendering model to help synthesize pixel-level details. Our experimental results show that MemFace consistently and significantly surpasses the state-of-the-art results across multiple scenarios.

* Project page: see https://memoryface.github.io
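A common way to implement a learned memory is a key-value attention read: the query (for example, an audio feature) is compared against stored keys, and the output is a softmax-weighted average of the stored values. The sketch below shows that generic read under the assumption that MemFace's memories behave similarly; the abstract does not specify the exact mechanism, and `memory_read` and the toy keys and values are hypothetical.

```python
import math
from typing import List


def memory_read(query: List[float], keys: List[List[float]],
                values: List[List[float]]) -> List[float]:
    """Scaled dot-product attention over a fixed set of memory slots."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)                                # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]                # softmax over memory slots
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]


# A query aligned with the first key retrieves (mostly) the first value.
print(memory_read([100.0, 0.0], [[1.0, 0.0], [-1.0, 0.0]], [[1.0], [0.0]]))
```

The retrieved vector can then be concatenated with, or added to, the query features, supplying information (such as emotion or wrinkle detail) that the audio alone does not determine.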

High-Res Facial Appearance Capture from Polarized Smartphone Images

Dec 02, 2022
Dejan Azinović, Olivier Maury, Christophe Hery, Mathias Nießner, Justus Thies

We propose a novel method for high-quality facial texture reconstruction from RGB images using a novel capturing routine based on a single smartphone which we equip with an inexpensive polarization foil. Specifically, we turn the flashlight into a polarized light source and add a polarization filter on top of the camera. Leveraging this setup, we capture the face of a subject with cross-polarized and parallel-polarized light. For each subject, we record two short sequences in a dark environment under flash illumination with different light polarization using the modified smartphone. Based on these observations, we reconstruct an explicit surface mesh of the face using structure from motion. We then exploit the camera and light co-location within a differentiable renderer to optimize the facial textures using an analysis-by-synthesis approach. Our method optimizes for high-resolution normal textures, diffuse albedo, and specular albedo using a coarse-to-fine optimization scheme. We show that the optimized textures can be used in a standard rendering pipeline to synthesize high-quality photo-realistic 3D digital humans in novel environments.

* Project page: https://dazinovic.github.io/polface/ Video: https://www.youtube.com/watch?v=jnb4V0qURtc
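The reason cross- and parallel-polarized captures are informative is a classic observation: specular surface reflection largely preserves the light's polarization, while subsurface (diffuse) scattering depolarizes it. A crossed filter therefore blocks the specular component, and the parallel image contains diffuse plus specular. The sketch below shows that per-pixel separation, assuming perfectly aligned images and ignoring constant filter scale factors; the paper itself goes further and optimizes textures in a differentiable renderer, so this is only the underlying principle, with hypothetical names.

```python
from typing import List, Tuple


def separate(parallel: List[float], cross: List[float]
             ) -> Tuple[List[float], List[float]]:
    """Split per-pixel intensities into diffuse and specular estimates.

    Cross-polarized capture blocks the (polarization-preserving) specular
    reflection, so it approximates the diffuse term; the parallel capture
    contains diffuse plus specular. Constant filter scale factors are ignored.
    """
    if len(parallel) != len(cross):
        raise ValueError("images must have the same number of pixels")
    diffuse = list(cross)
    specular = [max(p - c, 0.0) for p, c in zip(parallel, cross)]  # clamp noise
    return diffuse, specular


print(separate([0.8, 0.3], [0.5, 0.4]))
```

Clamping at zero handles pixels where sensor noise or slight misalignment makes the cross-polarized value exceed the parallel one.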

HumanGen: Generating Human Radiance Fields with Explicit Priors

Dec 10, 2022
Suyi Jiang, Haoran Jiang, Ziyu Wang, Haimin Luo, Wenzheng Chen, Lan Xu

Recent years have witnessed tremendous progress in 3D GANs for generating view-consistent, photo-realistic radiance fields. Yet high-quality generation of human radiance fields remains challenging, partially due to the limited human-related priors adopted in existing methods. We present HumanGen, a novel 3D human generation scheme with detailed geometry and 360° realistic free-view rendering. It explicitly marries 3D human generation with various priors from the 2D generator and 3D reconstructor of humans through the design of an "anchor image". We introduce a hybrid feature representation using the anchor image to bridge the latent space of HumanGen with the existing 2D generator. We then adopt a pronged design to disentangle the generation of geometry and appearance. With the aid of the anchor image, we adapt a 3D reconstructor for fine-grained detail synthesis and propose a two-stage blending scheme to boost appearance generation. Extensive experiments demonstrate that HumanGen achieves state-of-the-art 3D human generation in terms of geometry details, texture quality, and free-view performance. Notably, HumanGen can also incorporate various off-the-shelf 2D latent editing methods, seamlessly lifting them into 3D.

HARP: Personalized Hand Reconstruction from a Monocular RGB Video

Dec 30, 2022
Korrawe Karunratanakul, Sergey Prokudin, Otmar Hilliges, Siyu Tang

We present HARP (HAnd Reconstruction and Personalization), a personalized hand avatar creation approach that takes a short monocular RGB video of a human hand as input and reconstructs a faithful hand avatar exhibiting high-fidelity appearance and geometry. In contrast to the major trend of neural implicit representations, HARP models a hand with a mesh-based parametric hand model, a vertex displacement map, a normal map, and an albedo without any neural components. As validated by our experiments, the explicit nature of our representation enables a truly scalable, robust, and efficient approach to hand avatar creation. HARP is optimized via gradient descent from a short sequence captured by a hand-held mobile phone and can be directly used in AR/VR applications with real-time rendering capability. To enable this, we carefully design and implement a shadow-aware differentiable rendering scheme that is robust to the high-degree articulations and self-shadowing regularly present in hand motion sequences, as well as to challenging lighting conditions. It also generalizes to unseen poses and novel viewpoints, producing photo-realistic renderings of hand animations performing highly articulated motions. Furthermore, the learned HARP representation can be used to improve 3D hand pose estimation quality in challenging viewpoints. The key advantages of HARP are validated by in-depth analyses of appearance reconstruction, novel-view and novel-pose synthesis, and 3D hand pose refinement. It is an AR/VR-ready personalized hand representation that shows superior fidelity and scalability.

* Project page: https://korrawe.github.io/harp-project/
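Since HARP's representation is explicit (geometry, normals, albedo), its appearance can be evaluated with a conventional shading model. The sketch below shows the simplest shadow-aware variant: Lambertian shading in which the direct-light term is scaled by a visibility factor, so self-shadowed points receive only ambient light. The ambient constant, the scalar visibility input, and the function name are assumptions for illustration; HARP's actual differentiable renderer and its shadow computation are not reproduced here.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def shade(albedo: Vec3, normal: Vec3, light_dir: Vec3,
          visibility: float, ambient: float = 0.1) -> Vec3:
    """Lambertian shading with a shadow (visibility) factor in [0, 1].

    normal and light_dir are assumed to be unit vectors; visibility = 0
    means the point is fully shadowed from the light.
    """
    n_dot_l = max(0.0, sum(n * l for n, l in zip(normal, light_dir)))
    intensity = ambient + visibility * n_dot_l
    return tuple(a * intensity for a in albedo)


# A fully lit surface facing the light vs. the same surface in shadow.
print(shade((0.8, 0.6, 0.5), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0), visibility=1.0))
print(shade((0.8, 0.6, 0.5), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0), visibility=0.0))
```

Making visibility a continuous value rather than a hard 0/1 test is what keeps such a shading model differentiable with respect to pose and geometry.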

BAD-NeRF: Bundle Adjusted Deblur Neural Radiance Fields

Nov 23, 2022
Peng Wang, Lingzhe Zhao, Ruijie Ma, Peidong Liu

Neural Radiance Fields (NeRF) have received considerable attention recently due to their impressive capability in photo-realistic 3D reconstruction and novel view synthesis, given a set of posed camera images. Earlier work usually assumes that the input images are of good quality. However, image degradation (e.g., motion blur in low-light conditions) can easily occur in real-world scenarios, which further degrades the rendering quality of NeRF. In this paper, we present a novel bundle adjusted deblur Neural Radiance Fields (BAD-NeRF), which is robust to severely motion-blurred images and inaccurate camera poses. Our approach models the physical image formation process of a motion-blurred image, jointly learning the parameters of NeRF and recovering the camera motion trajectories during exposure time. In experiments, we show that by directly modeling the real physical image formation process, BAD-NeRF achieves superior performance over prior works on both synthetic and real datasets.
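The physical image formation model for motion blur is that the sensor integrates light while the camera moves, so a blurred image is approximately the average of sharp images rendered at poses sampled along the exposure-time trajectory. A minimal sketch, assuming the pose is a single scalar translation interpolated linearly between start and end (BAD-NeRF interpolates full camera poses) and that `render` is any callable producing a sharp image for a pose; both are simplifying assumptions.

```python
from typing import Callable, List


def synthesize_blur(render: Callable[[float], List[float]],
                    t_start: float, t_end: float, n: int) -> List[float]:
    """Average n sharp renderings at poses interpolated across the exposure."""
    if n < 1:
        raise ValueError("need at least one virtual sharp frame")
    frames = []
    for i in range(n):
        alpha = i / (n - 1) if n > 1 else 0.0   # position within the exposure
        frames.append(render(t_start + alpha * (t_end - t_start)))
    return [sum(px) / n for px in zip(*frames)]  # per-pixel mean over frames


def step_edge(shift: float) -> List[float]:
    """Toy 1D 'scene': a sharp edge at pixel 2 + shift in a 5-pixel row."""
    return [1.0 if i >= 2 + shift else 0.0 for i in range(5)]


print(synthesize_blur(step_edge, 0.0, 2.0, n=3))
```

In a BAD-NeRF-style setup, the same averaging is applied to NeRF renderings, and the difference between this synthesized blurry image and the captured one is backpropagated into both the radiance field and the trajectory endpoints.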
