Youyi Zheng

VDN-NeRF: Resolving Shape-Radiance Ambiguity via View-Dependence Normalization

Mar 31, 2023
Bingfan Zhu, Yanchao Yang, Xulong Wang, Youyi Zheng, Leonidas Guibas

We propose VDN-NeRF, a method to train neural radiance fields (NeRFs) for better geometry under non-Lambertian surfaces and dynamic lighting conditions that cause significant variation in the radiance of a point when viewed from different angles. Instead of explicitly modeling the underlying factors that produce the view-dependent phenomenon, which could be complex yet still not exhaustive, we develop a simple and effective technique that normalizes the view-dependence by distilling invariant information already encoded in the learned NeRFs. We then jointly train NeRFs for view synthesis with view-dependence normalization to attain quality geometry. Our experiments show that even though shape-radiance ambiguity is inevitable, the proposed normalization can minimize its effect on geometry, as it essentially aligns the optimal capacity needed for explaining view-dependent variations. Our method applies to various baselines and significantly improves geometry without changing the volume rendering pipeline, even if the data is captured under a moving light source. Code is available at: https://github.com/BoifZ/VDN-NeRF.
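
The abstract does not spell out the normalization objective, so the following is only a loose, hypothetical illustration of what penalizing view-dependent variation at fixed surface points could look like (the paper's actual technique distills invariant features from the learned NeRF rather than penalizing raw radiance variance; `radiance_fn` and its signature are assumptions):

```python
import torch

def view_dependence_penalty(radiance_fn, points, n_dirs=8):
    """Toy proxy for view-dependence normalization: penalize the variance
    of predicted radiance over random view directions at the same points.
    radiance_fn(points: (P,3), dir: (3,)) -> (P,3) RGB is an assumed API."""
    dirs = torch.randn(n_dirs, 3)
    dirs = dirs / dirs.norm(dim=-1, keepdim=True)               # unit view directions
    rgb = torch.stack([radiance_fn(points, d) for d in dirs])   # (n_dirs, P, 3)
    return rgb.var(dim=0).mean()
```

In the paper, such a term would be one part of a joint objective alongside the standard photometric rendering loss.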

OrthoGAN: High-Precision Image Generation for Teeth Orthodontic Visualization

Dec 29, 2022
Feihong Shen, Jingjing Liu, Haizhen Li, Bing Fang, Chenglong Ma, Jin Hao, Yang Feng, Youyi Zheng

Patients care about what their teeth will look like after orthodontic treatment. Orthodontists usually describe the expected tooth movement based on the patient's original smile images, which can be unconvincing. The growth of deep generative models is changing this situation: they can visualize the outcome of orthodontic treatment and help patients foresee their future teeth and facial appearance. While previous studies mainly focus on 2D or 3D virtual treatment outcomes (VTO) at a profile level, the problem of simulating the treatment outcome in a frontal facial image is poorly explored. In this paper, we build an efficient and accurate system for simulating virtual teeth-alignment effects in a frontal facial image. Our system takes a frontal face image of a patient with visible malpositioned teeth and the patient's 3D scanned teeth model as input, and progressively generates visual results of the patient's teeth given the specific orthodontic planning steps from the doctor (i.e., the specified translations and rotations of each individual tooth). We design a multi-modal encoder-decoder based generative model to synthesize identity-preserving frontal facial images with aligned teeth. In addition, the original image's color information is used to optimize the orthodontic outcomes, making the results more natural. We conduct extensive qualitative and clinical experiments, as well as a pilot study, to validate our method.
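
As a concrete illustration of a planning step (per-tooth rigid motion applied to the scanned model), here is a minimal sketch; the data layout and function names are hypothetical, not the paper's interface:

```python
import numpy as np

def apply_planning_step(teeth, step):
    """teeth: {tooth_id: (N, 3) vertex array}; step: {tooth_id: (R, t)} with
    R a 3x3 rotation matrix and t a 3-vector translation."""
    moved = {}
    for tid, verts in teeth.items():
        R, t = step.get(tid, (np.eye(3), np.zeros(3)))  # identity if unmoved
        centroid = verts.mean(axis=0)
        # Rotate each tooth about its own centroid, then translate.
        moved[tid] = (verts - centroid) @ R.T + centroid + t
    return moved
```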

NeuralHDHair: Automatic High-fidelity Hair Modeling from a Single Image Using Implicit Neural Representations

May 09, 2022
Keyu Wu, Yifan Ye, Lingchen Yang, Hongbo Fu, Kun Zhou, Youyi Zheng

Undoubtedly, high-fidelity 3D hair plays an indispensable role in digital humans. However, existing monocular hair modeling methods are either tricky to deploy in digital systems (e.g., due to their dependence on complex user interactions or large databases) or can produce only a coarse geometry. In this paper, we introduce NeuralHDHair, a flexible, fully automatic system for modeling high-fidelity hair from a single image. The key enablers of our system are two carefully designed neural networks: IRHairNet (Implicit representation for hair using a neural network) for hierarchically inferring high-fidelity 3D hair geometric features (a 3D orientation field and a 3D occupancy field), and GrowingNet (Growing hair strands using a neural network) for efficiently generating 3D hair strands in parallel. Specifically, we adopt a coarse-to-fine strategy and propose a novel voxel-aligned implicit function (VIFu) to represent the global hair features, which are further enhanced by local details extracted from a hair luminance map. To improve the efficiency of the traditional hair growth algorithm, we adopt a local neural implicit function to grow strands based on the estimated 3D hair geometric features. Extensive experiments show that our method can construct a high-fidelity 3D hair model from a single image, both efficiently and effectively, and achieves state-of-the-art performance.
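
One way to read a voxel-aligned implicit function is: interpolate a per-point feature from a 3D feature volume and decode it with a small MLP. The sketch below captures that generic pattern under assumed shapes; it is not the authors' implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VoxelAlignedImplicit(nn.Module):
    """Minimal voxel-aligned implicit query: trilinear feature lookup + MLP."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim + 3, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, feat_volume, pts):
        # feat_volume: (1, C, D, H, W); pts: (P, 3) in [-1, 1] (x, y, z order).
        grid = pts.view(1, 1, 1, -1, 3)                      # (1, 1, 1, P, 3)
        f = F.grid_sample(feat_volume, grid, align_corners=True)
        f = f.view(feat_volume.shape[1], -1).t()             # (P, C)
        return self.decoder(torch.cat([f, pts], dim=-1))     # occupancy logits
```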

* Accepted by IEEE CVPR 2022 

NeuralReshaper: Single-image Human-body Retouching with Deep Neural Networks

Apr 12, 2022
Beijia Chen, Hongbo Fu, Xiang Chen, Kun Zhou, Youyi Zheng

In this paper, we present NeuralReshaper, a novel method for the semantic reshaping of human bodies in single images using deep generative networks. To achieve globally coherent reshaping effects, our approach follows a fit-then-reshape pipeline, which first fits a parametric 3D human model to a source human image and then reshapes the fitted 3D model with respect to user-specified semantic attributes. Previous methods rely on image warping to transfer 3D reshaping effects to the entire image domain and thus often cause distortions in both the foreground and the background. In contrast, we resort to generative adversarial networks conditioned on the source image and a 2D warping field induced by the reshaped 3D model to achieve more realistic reshaping results. Specifically, we separately encode the foreground and background information in the source image using a two-headed UNet-like generator, and guide the information flow from the foreground branch to the background branch via feature-space warping. Furthermore, to deal with the lack of paired data (i.e., images of the same human body in varying shapes), we introduce a novel self-supervised strategy to train our network. Unlike previous methods, which often require manual effort to correct undesirable artifacts caused by incorrect body-to-image fitting, our method is fully automatic. Extensive experiments on both indoor and outdoor datasets demonstrate the superiority of our method over previous approaches.
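
Feature-space warping of the kind described (moving foreground features along a dense 2D warp field before fusing them into the background branch) is commonly implemented with grid sampling; a hedged sketch with illustrative tensor names:

```python
import torch
import torch.nn.functional as F

def warp_features(fg_feat, flow):
    """fg_feat: (N, C, H, W) foreground features; flow: (N, 2, H, W) pixel
    offsets induced by the reshaped 3D model (assumed convention)."""
    n, _, h, w = fg_feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0)  # (1, 2, H, W)
    coords = base + flow                                      # absolute coords
    # Normalize to [-1, 1] as grid_sample expects (x first, then y).
    coords[:, 0] = 2 * coords[:, 0] / (w - 1) - 1
    coords[:, 1] = 2 * coords[:, 1] / (h - 1) - 1
    grid = coords.permute(0, 2, 3, 1)                         # (N, H, W, 2)
    return F.grid_sample(fg_feat, grid, align_corners=True)
```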

AI-enabled Automatic Multimodal Fusion of Cone-Beam CT and Intraoral Scans for Intelligent 3D Tooth-Bone Reconstruction and Clinical Applications

Mar 11, 2022
Jin Hao, Jiaxiang Liu, Jin Li, Wei Pan, Ruizhe Chen, Huimin Xiong, Kaiwei Sun, Hangzheng Lin, Wanlu Liu, Wanghui Ding, Jianfei Yang, Haoji Hu, Yueling Zhang, Yang Feng, Zeyu Zhao, Huikai Wu, Youyi Zheng, Bing Fang, Zuozhu Liu, Zhihe Zhao

A critical step in virtual dental treatment planning is to accurately delineate all tooth-bone structures from CBCT with high fidelity and accurate anatomical information. Previous studies have established several deep-learning methods for CBCT segmentation. However, the inherent resolution discrepancy of CBCT and the loss of occlusal and dentition information have largely limited their clinical applicability. Here, we present a Deep Dental Multimodal Analysis (DDMA) framework consisting of a CBCT segmentation model, an intraoral scan (IOS) segmentation model (the IOS being the most accurate digital dental model), and a fusion model that generates 3D fused crown-root-bone structures with high fidelity and accurate occlusal and dentition information. Our model was trained on a large-scale dataset of 503 CBCT scans and 28,559 IOS meshes manually annotated by experienced human experts. For CBCT segmentation, we use five-fold cross-validation, each fold with 50 CBCT scans, and our model achieves an average Dice coefficient and IoU of 93.99% and 88.68%, respectively, significantly outperforming the baselines. For IOS segmentation, our model achieves an mIoU of 93.07% and 95.70% on the maxilla and mandible, respectively, on a test set of 200 IOS meshes, which is 1.77% and 3.52% higher than the state-of-the-art method. Our DDMA framework takes about 20 to 25 minutes to generate the fused 3D mesh model following the sequential processing order, compared to over 5 hours by human experts. Notably, our framework has been incorporated into software by a clear-aligner manufacturer, and real-world clinical cases demonstrate that our model can visualize crown-root-bone structures throughout the entire orthodontic treatment and can predict risks like dehiscence and fenestration. These findings demonstrate the potential of multimodal deep learning to improve the quality of digital dental models and help dentists make better clinical decisions.
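
The reported Dice coefficient and IoU follow their standard definitions; for reference, a minimal computation over binary masks:

```python
import numpy as np

def dice_iou(pred, gt, eps=1e-8):
    """pred, gt: binary masks of the same shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum() + eps)
    iou = inter / (union + eps)
    return dice, iou
```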

* 30 pages, 6 figures, 3 tables 

Can We Use Neural Regularization to Solve Depth Super-Resolution?

Dec 21, 2021
Milena Gazdieva, Oleg Voynov, Alexey Artemov, Youyi Zheng, Luiz Velho, Evgeny Burnaev

Depth maps captured with commodity sensors often require super-resolution before they can be used in applications. In this work, we study a super-resolution approach based on a variational problem statement with Tikhonov regularization, where the regularizer is parametrized with a deep neural network. This approach was previously applied successfully in photoacoustic tomography. We experimentally show that its application to depth map super-resolution is difficult, and we suggest possible reasons for this.
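
The variational formulation can be summarized as minimizing a data-fidelity term plus a learned Tikhonov-style prior over the high-resolution depth map. A minimal sketch, assuming a pretrained regularizer network `R_theta` that maps a depth map to a scalar penalty (the downsampling operator and optimizer settings here are illustrative):

```python
import torch
import torch.nn.functional as F

def super_resolve(y, R_theta, scale=4, lam=0.1, steps=200, lr=1e-2):
    """Minimize ||downsample(x) - y||^2 + lam * R_theta(x) over x.
    y: (N, 1, h, w) low-resolution depth map."""
    x = F.interpolate(y, scale_factor=scale, mode="bilinear",
                      align_corners=False).clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        data = F.mse_loss(F.avg_pool2d(x, scale), y)   # data-fidelity term
        loss = data + lam * R_theta(x)                 # learned regularizer
        loss.backward()
        opt.step()
    return x.detach()
```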

* 9 pages 

Domain Adaptation on Point Clouds via Geometry-Aware Implicits

Dec 17, 2021
Yuefan Shen, Yanchao Yang, Mi Yan, He Wang, Youyi Zheng, Leonidas Guibas

As a popular geometric representation, point clouds have attracted much attention in 3D vision, leading to many applications in autonomous driving and robotics. One important yet unsolved issue for learning on point clouds is that point clouds of the same object can have significant geometric variations if generated using different procedures or captured using different sensors. These inconsistencies induce domain gaps, such that neural networks trained on one domain may fail to generalize to others. A typical technique to reduce the domain gap is adversarial training, which aligns point clouds in the feature space. However, adversarial training easily falls into degenerate local minima, resulting in negative adaptation gains. Here we propose a simple yet effective method for unsupervised domain adaptation on point clouds that employs a self-supervised task of learning geometry-aware implicits, which plays two critical roles in one shot. First, the geometric information in the point clouds is preserved through the implicit representations for downstream tasks. More importantly, the domain-specific variations can be effectively learned away in the implicit space. Since shape models are typically unavailable in practice, we also propose an adaptive strategy to compute unsigned distance fields for arbitrary point clouds. When combined with a task loss, the proposed method outperforms state-of-the-art unsupervised domain adaptation methods that rely on adversarial domain alignment and more complicated self-supervised tasks. Our method is evaluated on both the PointDA-10 and GraspNet datasets. The code and trained models will be made publicly available.
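
A hedged sketch of the self-supervised implicit task as described: regress the unsigned distance from perturbed query points to the raw cloud (a plain nearest-neighbor target standing in for the paper's adaptive strategy; `encoder` and `udf_head` are assumed modules):

```python
import torch
import torch.nn.functional as F

def udf_loss(encoder, udf_head, cloud, sigma=0.05, n_query=1024):
    """cloud: (N, 3) point cloud from either domain."""
    idx = torch.randint(len(cloud), (n_query,))
    queries = cloud[idx] + sigma * torch.randn(n_query, 3)  # off-surface points
    # Target unsigned distance: nearest neighbor in the raw cloud.
    d_gt = torch.cdist(queries, cloud).min(dim=1).values
    feat = encoder(cloud)                # shared, domain-transferable features
    d_pred = udf_head(feat, queries).squeeze(-1)
    return F.l1_loss(d_pred, d_gt)
```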

SketchHairSalon: Deep Sketch-based Hair Image Synthesis

Sep 21, 2021
Chufeng Xiao, Deng Yu, Xiaoguang Han, Youyi Zheng, Hongbo Fu

Recent deep generative models allow real-time generation of hair images from sketch inputs. Existing solutions often require a user-provided binary mask to specify a target hair shape. This not only costs users extra labor but also fails to capture complicated hair boundaries. Those solutions usually encode hair structures via orientation maps, which, however, are not very effective at encoding complex structures. We observe that colored hair sketches already implicitly define target hair shapes as well as hair appearance, and are more flexible for depicting hair structures than orientation maps. Based on these observations, we present SketchHairSalon, a two-stage framework for generating realistic hair images directly from freehand sketches depicting the desired hair structure and appearance. At the first stage, we train a network to predict a hair matte from an input hair sketch, with an optional set of non-hair strokes. At the second stage, another network is trained to synthesize the structure and appearance of hair images from the input sketch and the generated matte. To make the networks in the two stages aware of long-range dependencies among strokes, we apply self-attention modules to them. To train these networks, we present a new dataset containing thousands of annotated hair sketch-image pairs and corresponding hair mattes. Two efficient methods for sketch completion are proposed to automatically complete repetitive braided parts and hair strokes, respectively, thus reducing the users' workload. Based on the trained networks and the two sketch completion strategies, we build an intuitive interface that allows even novice users to design visually pleasing hair images exhibiting various hair structures and appearances via freehand sketches. Qualitative and quantitative evaluations show the advantages of the proposed system over existing or alternative solutions.
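
The self-attention modules mentioned for capturing long-range stroke dependencies are, in generic form, the familiar SAGAN-style layer; a minimal version (not necessarily the authors' exact module):

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """Generic 2D self-attention block with a learned residual scale."""
    def __init__(self, c):
        super().__init__()
        self.q = nn.Conv2d(c, c // 8, 1)
        self.k = nn.Conv2d(c, c // 8, 1)
        self.v = nn.Conv2d(c, c, 1)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        n, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)   # (N, HW, C//8)
        k = self.k(x).flatten(2)                   # (N, C//8, HW)
        attn = torch.softmax(q @ k, dim=-1)        # (N, HW, HW)
        v = self.v(x).flatten(2)                   # (N, C, HW)
        out = (v @ attn.transpose(1, 2)).view(n, c, h, w)
        return x + self.gamma * out
```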

* SIGGRAPH Asia 2021 (https://chufengxiao.github.io/SketchHairSalon/) 

ADeLA: Automatic Dense Labeling with Attention for Viewpoint Adaptation in Semantic Segmentation

Jul 29, 2021
Yanchao Yang, Hanxiang Ren, He Wang, Bokui Shen, Qingnan Fan, Youyi Zheng, C. Karen Liu, Leonidas Guibas

We describe an unsupervised domain adaptation method for semantic segmentation under the image content shift caused by viewpoint changes. Most existing methods perform domain alignment in a shared space and assume that the mapping from the aligned space to the output is transferable. However, the novel content induced by viewpoint changes may nullify such a space for effective alignment, resulting in negative adaptation. Our method works without aligning any statistics of the images between the two domains. Instead, it utilizes a view transformation network trained only on color images to hallucinate the semantic images for the target. Despite the lack of supervision, the view transformation network can still generalize to semantic images thanks to the inductive bias introduced by the attention mechanism. Furthermore, to resolve ambiguities in converting the semantic images to semantic labels, we treat the view transformation network as a functional representation of an unknown mapping implied by the color images and propose functional label hallucination to generate pseudo-labels in the target domain. Our method surpasses baselines built on state-of-the-art correspondence estimation and view synthesis methods. Moreover, it outperforms state-of-the-art unsupervised domain adaptation methods that utilize self-training and adversarial domain alignment. Our code and dataset will be made publicly available.
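
One plausible reading of the label hallucination step, sketched with assumed interfaces: run one-hot semantic maps through the frozen, color-trained view transformation network and take the argmax as target-view pseudo-labels:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def hallucinate_labels(view_net, src_labels, n_classes):
    """src_labels: (N, H, W) integer class map in the source viewpoint;
    view_net: frozen view transformation network (assumed to accept
    multi-channel inputs and preserve spatial size)."""
    onehot = F.one_hot(src_labels, n_classes).permute(0, 3, 1, 2).float()
    warped = view_net(onehot)     # transport each class channel to the target view
    return warped.argmax(dim=1)   # (N, H, W) pseudo-labels
```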

DCL: Differential Contrastive Learning for Geometry-Aware Depth Synthesis

Jul 27, 2021
Yanchao Yang, Yuefan Shen, Youyi Zheng, C. Karen Liu, Leonidas Guibas

We describe a method for realistic depth synthesis that learns diverse variations from real depth scans and ensures geometric consistency for effective synthetic-to-real transfer. Unlike general image synthesis pipelines, where geometry is mostly ignored, we treat the geometry carried by the depth in its own right. We propose differential contrastive learning, which explicitly enforces the underlying geometric properties to be invariant with respect to the real variations being learned. The resulting depth synthesis method is task-agnostic and can be used for training any task-specific network with synthetic labels. We demonstrate the effectiveness of the proposed method through extensive evaluations on downstream real-world geometric reasoning tasks. We show that our method achieves better synthetic-to-real transfer performance than other state-of-the-art methods. When fine-tuned on a small number of real-world annotations, our method can even surpass fully supervised baselines.
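
The paper's differential contrastive loss operates on feature differences; as a simplified stand-in, a generic InfoNCE-style objective that enforces invariance of geometric features under the learned real-scan variations looks like this:

```python
import torch
import torch.nn.functional as F

def contrastive_invariance_loss(f_clean, f_noisy, tau=0.07):
    """f_clean, f_noisy: (B, D) features of the same depth patches before and
    after applying the learned realistic variations."""
    z1 = F.normalize(f_clean, dim=1)
    z2 = F.normalize(f_noisy, dim=1)
    logits = z1 @ z2.t() / tau              # (B, B) cosine-similarity logits
    targets = torch.arange(len(z1))         # matching pairs lie on the diagonal
    return F.cross_entropy(logits, targets)
```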
