Di Qiu

Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos

Apr 04, 2023
Ziqian Bai, Feitong Tan, Zeng Huang, Kripasindhu Sarkar, Danhang Tang, Di Qiu, Abhimitra Meka, Ruofei Du, Mingsong Dou, Sergio Orts-Escolano, Rohit Pandey, Ping Tan, Thabo Beeler, Sean Fanello, Yinda Zhang

We propose a method to learn a high-quality implicit 3D head avatar from a monocular RGB video captured in the wild. The learnt avatar is driven by a parametric face model to achieve user-controlled facial expressions and head poses. Our hybrid pipeline combines the geometry prior and dynamic tracking of a 3DMM with a neural radiance field to achieve fine-grained control and photorealism. To reduce over-smoothing and improve out-of-model expression synthesis, we propose to predict local features anchored on the 3DMM geometry. These learnt features are driven by the 3DMM deformation and interpolated in 3D space to yield the volumetric radiance at a designated query point. We further show that using a Convolutional Neural Network in the UV space is critical for incorporating spatial context and producing representative local features. Extensive experiments show that we are able to reconstruct high-quality avatars, with more accurate expression-dependent details, good generalization to out-of-training expressions, and quantitatively superior renderings compared to other state-of-the-art approaches.

* In CVPR2023. Project page: https://augmentedperception.github.io/monoavatar/ 
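
A minimal sketch of the local-feature idea, under my own assumptions (layer sizes, the inverse-distance k-nearest-vertex interpolation, and all names are illustrative, not the authors' code): a small CNN predicts features in 3DMM UV space, the features are anchored on the deformed mesh vertices, and each 3D query point aggregates nearby features before a small MLP decodes density and color.

```python
# Illustrative sketch only: a CNN predicts local features in 3DMM UV space,
# the features are attached to the deformed mesh vertices, and a query point
# gathers nearby features to decode radiance. Sizes/names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class UVFeatureField(nn.Module):
    def __init__(self, feat_dim=32, k=8):
        super().__init__()
        self.k = k
        # CNN over the UV-space expression map adds spatial context.
        self.uv_cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, padding=1),
        )
        # MLP decodes interpolated local features into density and RGB.
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim + 3, 128), nn.ReLU(),
            nn.Linear(128, 4),  # (sigma, r, g, b)
        )

    def forward(self, uv_map, vert_uv, verts, query):
        # uv_map:  (1, 3, H, W) expression-conditioned map in UV space
        # vert_uv: (V, 2) per-vertex UV coordinates in [-1, 1]
        # verts:   (V, 3) 3DMM vertices after expression/pose deformation
        # query:   (Q, 3) 3D sample points along camera rays
        feat_map = self.uv_cnn(uv_map)                                 # (1, C, H, W)
        grid = vert_uv.view(1, -1, 1, 2)                               # (1, V, 1, 2)
        vert_feat = F.grid_sample(feat_map, grid, align_corners=True)  # (1, C, V, 1)
        vert_feat = vert_feat[0, :, :, 0].t()                          # (V, C)

        # Inverse-distance interpolation of the k nearest anchored features.
        dist = torch.cdist(query, verts)                               # (Q, V)
        d, idx = dist.topk(self.k, dim=1, largest=False)               # (Q, k)
        w = 1.0 / (d + 1e-6)
        w = w / w.sum(dim=1, keepdim=True)
        local = (vert_feat[idx] * w.unsqueeze(-1)).sum(dim=1)          # (Q, C)

        out = self.decoder(torch.cat([local, query], dim=-1))          # (Q, 4)
        sigma, rgb = F.relu(out[:, :1]), torch.sigmoid(out[:, 1:])
        return sigma, rgb
```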

Learn to Cluster Faces via Pairwise Classification

May 26, 2022
Junfu Liu, Di Qiu, Pengfei Yan, Xiaolin Wei

Face clustering plays an essential role in exploiting massive unlabeled face data. Recently, graph-based face clustering methods have become popular owing to their strong performance. However, they usually suffer from excessive memory consumption, especially on large-scale graphs, and rely on empirical thresholds to determine the connectivity between samples at inference, which restricts their application in many real-world scenarios. To address these problems, in this paper we explore face clustering from a pairwise perspective. Specifically, we formulate the face clustering task as a pairwise relationship classification task, avoiding memory-consuming learning on large-scale graphs. The classifier directly determines the relationship between samples and is enhanced by exploiting contextual information. Moreover, to further improve the efficiency of our method, we propose a rank-weighted density to guide the selection of pairs sent to the classifier. Experimental results demonstrate that our method achieves state-of-the-art performance on several public clustering benchmarks at the fastest speed, and has a substantial advantage over graph-based clustering methods in memory consumption.

* Accepted by ICCV2021 
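
A minimal sketch of clustering by pairwise classification, with heavy assumptions: the classifier is an externally supplied callable, candidate pairs come from plain kNN rather than the paper's rank-weighted density, and clusters are read off as connected components via union-find.

```python
# Sketch only: a pairwise classifier scores candidate pairs, confident
# "same identity" pairs become edges, and clusters are connected components.
import numpy as np

def cluster_by_pairs(features, pair_classifier, knn=20, thr=0.5):
    """features: (N, D) L2-normalized embeddings; pair_classifier: callable
    mapping two (M, D) batches to M same-identity probabilities (assumed)."""
    n = features.shape[0]
    sims = features @ features.T
    nbrs = np.argsort(-sims, axis=1)[:, 1:knn + 1]        # candidate pairs per sample

    parent = list(range(n))                               # union-find for components
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        probs = pair_classifier(features[i][None].repeat(knn, 0), features[nbrs[i]])
        for j, p in zip(nbrs[i], probs):
            if p > thr:                                   # link confident positives
                ri, rj = find(i), find(int(j))
                if ri != rj:
                    parent[ri] = rj
    return np.array([find(i) for i in range(n)])          # cluster label per sample
```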

CelebA-Spoof Challenge 2020 on Face Anti-Spoofing: Methods and Results

Feb 26, 2021
Yuanhan Zhang, Zhenfei Yin, Jing Shao, Ziwei Liu, Shuo Yang, Yuanjun Xiong, Wei Xia, Yan Xu, Man Luo, Jian Liu, Jianshu Li, Zhijun Chen, Mingyu Guo, Hui Li, Junfu Liu, Pengfei Gao, Tianqi Hong, Hao Han, Shijie Liu, Xinhua Chen, Di Qiu, Cheng Zhen, Dashuang Liang, Yufeng Jin, Zhanlong Hao

As facial interaction systems are widely deployed, their security and reliability have become critical issues, attracting substantial research effort. Among these topics, face anti-spoofing is an important area whose objective is to identify whether a presented face is live or spoofed. Recently, CelebA-Spoof, a large-scale face anti-spoofing dataset comprising 625,537 pictures of 10,177 subjects, was released; it is the largest face anti-spoofing dataset in terms of both the number of images and the number of subjects. This paper reports the methods and results of the CelebA-Spoof Challenge 2020 on Face Anti-Spoofing, which employs the CelebA-Spoof dataset. Model evaluation is conducted online on a hidden test set. A total of 134 participants registered for the competition, and 19 teams made valid submissions. We analyze the top-ranked solutions and discuss directions for future work.

* Technical report. Challenge website: https://competitions.codalab.org/competitions/26210 

Guided Collaborative Training for Pixel-wise Semi-Supervised Learning

Aug 12, 2020
Zhanghan Ke, Di Qiu, Kaican Li, Qiong Yan, Rynson W. H. Lau

We investigate the generalization of semi-supervised learning (SSL) to diverse pixel-wise tasks. Although SSL methods have achieved impressive results in image classification, their performance on pixel-wise tasks is unsatisfactory due to the need for dense outputs. In addition, existing pixel-wise SSL approaches are only suitable for certain tasks, as they usually rely on task-specific properties. In this paper, we present a new SSL framework, named Guided Collaborative Training (GCT), for pixel-wise tasks, with two main technical contributions. First, GCT addresses the issues caused by dense outputs through a novel flaw detector. Second, the modules in GCT learn from unlabeled data collaboratively through two newly proposed constraints that are independent of task-specific properties. As a result, GCT can be applied to a wide range of pixel-wise tasks without structural adaptation. Our extensive experiments on four challenging vision tasks, including semantic segmentation, real image denoising, portrait image matting, and night image enhancement, show that GCT outperforms state-of-the-art SSL methods by a large margin. Our code is available at: https://github.com/ZHKKKe/PixelSSL.

* 16th European Conference on Computer Vision (ECCV 2020) 
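
A rough, assumption-laden sketch of how a flaw detector can drive pixel-wise SSL (not GCT's exact constraints): on labeled data the flaw detector regresses the true per-pixel error of each task model, and on unlabeled data each model's consistency target is weighted by the other model's estimated reliability.

```python
# Sketch under my own assumptions, not the paper's exact loss terms.
import torch
import torch.nn.functional as F

def flaw_guided_losses(pred1, pred2, flaw1, flaw2, labeled_mask, target):
    # pred1/pred2: (B, C, H, W) outputs of two task models on the same batch
    # flaw1/flaw2: (B, 1, H, W) flaw-detector outputs in [0, 1] for each model
    # labeled_mask: (B, 1, 1, 1) 1 for labeled samples, 0 for unlabeled
    # target: (B, C, H, W) dense ground truth (ignored on unlabeled samples)

    # Supervise the flaw detector with the true per-pixel error on labeled data.
    with torch.no_grad():
        err1 = (pred1 - target).abs().mean(dim=1, keepdim=True)
        err2 = (pred2 - target).abs().mean(dim=1, keepdim=True)
    loss_flaw = (labeled_mask * (F.mse_loss(flaw1, err1, reduction='none')
                                 + F.mse_loss(flaw2, err2, reduction='none'))).mean()

    # On unlabeled data, each model learns from the other where the other's
    # estimated flaw is low (reliability weight = 1 - flaw).
    unlabeled = 1.0 - labeled_mask
    cons12 = ((1.0 - flaw2) * (pred1 - pred2.detach()).pow(2)).mean(dim=1, keepdim=True)
    cons21 = ((1.0 - flaw1) * (pred2 - pred1.detach()).pow(2)).mean(dim=1, keepdim=True)
    loss_unsup = (unlabeled * (cons12 + cons21)).mean()
    return loss_flaw, loss_unsup
```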

Towards Geometry Guided Neural Relighting with Flash Photography

Aug 12, 2020
Di Qiu, Jin Zeng, Zhanghan Ke, Wenxiu Sun, Chengxi Yang

Previous image-based relighting methods require capturing multiple images to acquire high-frequency lighting effects under different lighting conditions, which takes nontrivial effort and may be impractical in certain use scenarios. While such approaches rely entirely on cleverly sampling color images under different lighting conditions, little has been done to exploit geometric information, which crucially influences high-frequency features in the images such as glossy highlights and cast shadows. We therefore propose a deep-learning framework for relighting an image from a single flash photograph and its corresponding depth map. By incorporating the depth map, our approach is able to extrapolate realistic high-frequency effects under novel lighting via geometry-guided image decomposition of the flash image, and to predict the cast-shadow map from the shadow-encoding transformed depth map. Moreover, the single-image setup greatly simplifies data capture. We experimentally validate the advantage of our geometry-guided approach over state-of-the-art image-based approaches in intrinsic image decomposition and image relighting, and also demonstrate our performance on real mobile-phone photos.
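
A minimal illustration of a geometry-guided pipeline under Lambertian assumptions (the decomposition network is stubbed, shadow prediction is omitted, and this is not the paper's architecture): normals are derived from the depth map, a learned network supplies albedo, and the image is re-shaded under a novel light direction.

```python
# Sketch only; `albedo_net` is a hypothetical learned decomposition module.
import torch
import torch.nn.functional as F

def depth_to_normals(depth):
    # depth: (B, 1, H, W). Approximate normals from finite differences of depth.
    dzdx = F.pad(depth[..., :, 2:] - depth[..., :, :-2], (1, 1, 0, 0)) / 2.0
    dzdy = F.pad(depth[..., 2:, :] - depth[..., :-2, :], (0, 0, 1, 1)) / 2.0
    n = torch.cat([-dzdx, -dzdy, torch.ones_like(depth)], dim=1)
    return F.normalize(n, dim=1)

def relight(flash_img, depth, albedo_net, light_dir):
    # Lambertian re-shading purely for illustration.
    albedo = albedo_net(torch.cat([flash_img, depth], dim=1))      # (B, 3, H, W)
    normals = depth_to_normals(depth)
    l = F.normalize(light_dir.view(1, 3, 1, 1), dim=1)
    shading = (normals * l).sum(dim=1, keepdim=True).clamp(min=0)  # (B, 1, H, W)
    return albedo * shading
```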

Gradient Regularized Contrastive Learning for Continual Domain Adaptation

Jul 25, 2020
Peng Su, Shixiang Tang, Peng Gao, Di Qiu, Ni Zhao, Xiaogang Wang

Human beings can quickly adapt to environmental changes by leveraging learning experience. However, the poor ability to adapt to dynamic environments remains a major challenge for AI models. To better understand this issue, we study the problem of continual domain adaptation, where the model is presented with a labeled source domain and a sequence of unlabeled target domains. There are two major obstacles in this problem: domain shift and catastrophic forgetting. In this work, we propose Gradient Regularized Contrastive Learning to overcome both obstacles. At the core of our method, gradient regularization plays two key roles: (1) it constrains the gradient of the contrastive loss so that it does not increase the supervised training loss on the source domain, which maintains the discriminative power of the learned features; (2) it regularizes the gradient update on the new domain so that it does not increase the classification loss on old target domains, which enables the model to adapt to an incoming target domain while preserving performance on previously observed domains. Hence our method can jointly learn semantically discriminative and domain-invariant features from the labeled source domain and unlabeled target domains. Experiments on the Digits, DomainNet and Office-Caltech benchmarks demonstrate the strong performance of our approach compared to the state of the art.
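
A minimal sketch of the first gradient constraint using an A-GEM-style projection (my substitution, not necessarily the authors' exact formulation): if the contrastive-loss gradient conflicts with the source-domain supervised gradient, the conflicting component is projected out before the parameter update.

```python
# Sketch only: project the contrastive gradient so it cannot increase the
# supervised source-domain loss, then take a plain SGD step.
import torch

def project_conflicting(grad_con, grad_sup):
    # grad_con, grad_sup: flattened gradients of the contrastive and supervised
    # losses with respect to the shared parameters.
    dot = torch.dot(grad_con, grad_sup)
    if dot < 0:  # conflict: remove the component that raises the supervised loss
        grad_con = grad_con - (dot / grad_sup.pow(2).sum().clamp(min=1e-12)) * grad_sup
    return grad_con

def regularized_step(model, loss_con, loss_sup, lr=1e-3):
    params = [p for p in model.parameters() if p.requires_grad]
    g_con = torch.autograd.grad(loss_con, params, retain_graph=True)
    g_sup = torch.autograd.grad(loss_sup, params, retain_graph=True)
    flat = lambda gs: torch.cat([g.reshape(-1) for g in gs])
    g = project_conflicting(flat(g_con), flat(g_sup)) + flat(g_sup)
    offset = 0
    with torch.no_grad():
        for p in params:
            n = p.numel()
            p -= lr * g[offset:offset + n].view_as(p)
            offset += n
```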

Modal Uncertainty Estimation via Discrete Latent Representation

Jul 25, 2020
Di Qiu, Lok Ming Lui

Many important problems in the real world do not have unique solutions. It is thus important for machine learning models to be capable of proposing different plausible solutions with meaningful probability measures. In this work we introduce a deep learning framework that learns one-to-many mappings between inputs and outputs, together with faithful uncertainty measures. We call our framework modal uncertainty estimation, since we model the one-to-many mappings as being generated through a set of discrete latent variables, each representing a latent mode hypothesis that explains the corresponding type of input-output relationship. The discrete nature of the latent representation thus allows us to estimate, for any input, the conditional probability distribution over the outputs very effectively. Both the discrete latent space and its uncertainty estimation are jointly learned during training. We motivate our use of a discrete latent space through the multi-modal posterior collapse problem in current conditional generative models, then develop the theoretical background, and extensively validate our method on both synthetic and realistic tasks. Our framework demonstrates significantly more accurate uncertainty estimation than current state-of-the-art methods, and is informative and convenient for practical use.
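
A minimal sketch of the discrete-latent idea with an assumed architecture and assumed names: each codebook entry stands for one mode of the one-to-many mapping, a prior network predicts p(code | x), and enumerating codes at test time yields candidate outputs with explicit probabilities.

```python
# Sketch only; layer sizes and module names are assumptions.
import torch
import torch.nn as nn

class ModalUncertaintyModel(nn.Module):
    def __init__(self, x_dim, y_dim, n_codes=32, d=64):
        super().__init__()
        self.codebook = nn.Embedding(n_codes, d)
        self.posterior = nn.Sequential(nn.Linear(x_dim + y_dim, d), nn.ReLU(),
                                       nn.Linear(d, d))
        self.prior = nn.Sequential(nn.Linear(x_dim, d), nn.ReLU(),
                                   nn.Linear(d, n_codes))
        self.decoder = nn.Sequential(nn.Linear(x_dim + d, d), nn.ReLU(),
                                     nn.Linear(d, y_dim))

    def quantize(self, z):
        # Nearest codebook entry with a straight-through estimator for training.
        dist = torch.cdist(z, self.codebook.weight)           # (B, n_codes)
        idx = dist.argmin(dim=1)
        zq = self.codebook(idx)
        return z + (zq - z).detach(), idx

    def forward(self, x, y):
        zq, idx = self.quantize(self.posterior(torch.cat([x, y], dim=-1)))
        recon = self.decoder(torch.cat([x, zq], dim=-1))
        prior_logits = self.prior(x)          # trained to predict idx, i.e. p(code|x)
        return recon, idx, prior_logits

    @torch.no_grad()
    def hypotheses(self, x):
        # Enumerate all modes for a single input x: outputs and their probabilities.
        probs = self.prior(x).softmax(dim=-1)                 # (1, n_codes)
        codes = self.codebook.weight                          # (n_codes, d)
        xs = x.expand(codes.shape[0], -1)
        return self.decoder(torch.cat([xs, codes], dim=-1)), probs.squeeze(0)
```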

Adapting Object Detectors with Conditional Domain Normalization

Mar 16, 2020
Peng Su, Kun Wang, Xingyu Zeng, Shixiang Tang, Dapeng Chen, Di Qiu, Xiaogang Wang

Real-world object detectors are often challenged by domain gaps between different datasets. In this work, we present Conditional Domain Normalization (CDN) to bridge the domain gap. CDN is designed to encode inputs from different domains into a shared latent space, where features from different domains carry the same domain attribute. To achieve this, we first disentangle the domain-specific attribute from the semantic features of one domain via a domain embedding module, which learns a domain vector to characterize the corresponding domain attribute information. This domain vector is then used to encode the features of another domain through a conditional normalization, so that the features of different domains carry the same domain attribute. We incorporate CDN into various convolution stages of an object detector to adaptively address domain shifts at different levels of representation. In contrast to existing adaptation works that conduct domain confusion learning on semantic features to remove domain-specific factors, CDN aligns different domain distributions by modulating the semantic features of one domain conditioned on the learned domain vector of the other. Extensive experiments show that CDN outperforms existing methods remarkably on both real-to-real and synthetic-to-real adaptation benchmarks, including 2D image detection and 3D point cloud detection.
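
A minimal sketch of a CDN-style layer (channel counts and pooling choices are assumptions): a domain vector is pooled from one domain's features, and the other domain's features are instance-normalized and then re-scaled and shifted conditioned on that vector.

```python
# Sketch only, not the paper's exact layer.
import torch
import torch.nn as nn

class ConditionalDomainNorm(nn.Module):
    def __init__(self, channels, embed_dim=64):
        super().__init__()
        # Domain embedding: pool one domain's features into a domain vector.
        self.embed = nn.Sequential(nn.Linear(channels, embed_dim), nn.ReLU())
        # Generate per-channel affine parameters from the domain vector.
        self.to_gamma = nn.Linear(embed_dim, channels)
        self.to_beta = nn.Linear(embed_dim, channels)
        self.norm = nn.InstanceNorm2d(channels, affine=False)

    def forward(self, feat_other, feat_domain):
        # feat_other:  (B, C, H, W) features to be re-encoded
        # feat_domain: (B, C, H, W) features of the domain whose attribute we inject
        d = self.embed(feat_domain.mean(dim=(2, 3)))          # (B, E) domain vector
        gamma = self.to_gamma(d).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        beta = self.to_beta(d).unsqueeze(-1).unsqueeze(-1)
        return self.norm(feat_other) * (1 + gamma) + beta
```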

Shape analysis via inconsistent surface registration

Mar 03, 2020
Gary P. T. Choi, Di Qiu, Lok Ming Lui

In this work, we develop a framework for shape analysis using inconsistent surface mapping. Traditional landmark-based geometric morphometrics methods suffer from limited degrees of freedom, while most of the more advanced non-rigid surface mapping methods rely on a strong assumption of global consistency between the two surfaces. From a practical point of view, given two anatomical surfaces with prominent feature landmarks, it is more desirable to have a method that automatically detects the most relevant parts of the two surfaces and finds the optimal landmark-matching alignment between those parts, without assuming any global one-to-one correspondence between the two surfaces. Our method solves this problem using inconsistent surface registration based on quasi-conformal theory. It further enables us to quantify the dissimilarity of two shapes using the quasi-conformal distortion and differences in mean and Gaussian curvatures, thereby providing a natural way of shape classification. Experiments on platyrrhine molars demonstrate the effectiveness of our method and shed light on the interplay between function and shape in nature.
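
For concreteness, the quasi-conformal distortion of the registration map f is measured by its Beltrami coefficient, and a dissimilarity of the illustrative form below combines it with curvature differences; the particular weights and norms are my assumptions, not the paper's exact energy.

```latex
% Beltrami coefficient of the registration map f (standard definition), and an
% illustrative dissimilarity combining distortion with mean- (H) and Gaussian- (K)
% curvature differences; the weights alpha, beta, gamma are assumptions.
\[
  \mu_f = \frac{\partial f / \partial \bar{z}}{\partial f / \partial z}, \qquad
  d(S_1, S_2) = \alpha \,\|\mu_f\|_{L^2}
              + \beta  \,\|H_1 - H_2 \circ f\|_{L^2}
              + \gamma \,\|K_1 - K_2 \circ f\|_{L^2}.
\]
```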

Deep End-to-End Alignment and Refinement for Time-of-Flight RGB-D Module

Sep 17, 2019
Di Qiu, Jiahao Pang, Wenxiu Sun, Chengxi Yang

Recently, it has become increasingly popular to equip mobile RGB cameras with Time-of-Flight (ToF) sensors for active depth sensing. However, for off-the-shelf ToF sensors, one must tackle two problems to obtain high-quality depth with respect to the RGB camera: 1) online calibration and alignment; and 2) complicated error correction for ToF depth sensing. In this work, we propose a framework for joint alignment and refinement via deep learning. First, a cross-modal optical flow between the RGB image and the ToF amplitude image is estimated for alignment. The aligned depth is then refined via an improved kernel-predicting network that performs kernel normalization and applies a bias before the dynamic convolution. To enrich our data for end-to-end training, we have also synthesized a dataset using computer graphics tools. Experimental results demonstrate the effectiveness of our approach, achieving state-of-the-art results for ToF depth refinement.

* ICCV2019 
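
A minimal sketch of the refinement step as described above (shapes and the softmax normalization are my assumptions): a network predicts a per-pixel kernel and a bias, the kernel is normalized, the bias is applied before filtering, and the dynamic convolution then filters the aligned ToF depth.

```python
# Sketch only: normalized per-pixel dynamic convolution with a bias applied first.
import torch
import torch.nn.functional as F

def dynamic_refine(depth, kernels, bias, k=3):
    # depth:   (B, 1, H, W) aligned ToF depth
    # kernels: (B, k*k, H, W) predicted per-pixel filter weights
    # bias:    (B, 1, H, W) predicted residual, applied before filtering
    weights = F.softmax(kernels, dim=1)                           # kernel normalization
    corrected = depth + bias                                      # bias before the conv
    patches = F.unfold(corrected, kernel_size=k, padding=k // 2)  # (B, k*k, H*W)
    patches = patches.view(depth.shape[0], k * k, depth.shape[2], depth.shape[3])
    return (patches * weights).sum(dim=1, keepdim=True)           # refined depth
```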