Haoyu Wu

S-VolSDF: Sparse Multi-View Stereo Regularization of Neural Implicit Surfaces

Mar 30, 2023
Haoyu Wu, Alexandros Graikos, Dimitris Samaras

Figures 1-4 for S-VolSDF: Sparse Multi-View Stereo Regularization of Neural Implicit Surfaces

Neural rendering of implicit surfaces performs well in 3D vision applications. However, it requires dense input views as supervision. When only sparse input images are available, output quality drops significantly due to the shape-radiance ambiguity problem. We note that this ambiguity can be constrained when a 3D point is visible in multiple views, as is the case in multi-view stereo (MVS). We thus propose to regularize neural rendering optimization with an MVS solution. The use of an MVS probability volume and a generalized cross entropy loss leads to a noise-tolerant optimization process. In addition, neural rendering provides global consistency constraints that guide the MVS depth hypothesis sampling and thus improve MVS performance. Given only three sparse input views, experiments show that our method not only outperforms generic neural rendering models by a large margin but also significantly increases the reconstruction quality of MVS models. Project webpage: https://hao-yu-wu.github.io/s-volsdf/.
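The noise-tolerant loss mentioned above can be sketched with the common generalized cross entropy form L_q(p) = (1 - p^q) / q; this is a minimal illustration under that assumption, and the function name and default q are hypothetical, not taken from the paper:

```python
import numpy as np

def generalized_cross_entropy(p, q=0.7, eps=1e-8):
    """Generalized cross entropy L_q(p) = (1 - p^q) / q.

    Interpolates between standard cross entropy (q -> 0) and
    mean absolute error (q = 1); intermediate q values penalize
    low-probability targets less harshly than cross entropy,
    which makes the loss tolerant of noise in an MVS
    probability volume.
    """
    p = np.clip(p, eps, 1.0)
    return (1.0 - p ** q) / q
```

For a confidently wrong target (p = 0.1), this gives roughly 1.14 versus about 2.30 for standard cross entropy, so noisy MVS supervision is down-weighted rather than dominating the optimization.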

EmoTalk: Speech-driven emotional disentanglement for 3D face animation

Mar 20, 2023
Ziqiao Peng, Haoyu Wu, Zhenbo Song, Hao Xu, Xiangyu Zhu, Hongyan Liu, Jun He, Zhaoxin Fan

Figures 1-4 for EmoTalk: Speech-driven emotional disentanglement for 3D face animation

Speech-driven 3D face animation aims to generate realistic facial expressions that match the speech content and emotion. However, existing methods often neglect emotional facial expressions or fail to disentangle them from speech content. To address this issue, this paper proposes an end-to-end neural network to disentangle different emotions in speech so as to generate rich 3D facial expressions. Specifically, we introduce the emotion disentangling encoder (EDE) to disentangle the emotion and content in the speech by cross-reconstructing speech signals with different emotion labels. Then an emotion-guided feature fusion decoder is employed to generate a 3D talking face with enhanced emotion. The decoder is driven by the disentangled identity, emotional, and content embeddings so as to generate controllable personal and emotional styles. Finally, considering the scarcity of 3D emotional talking face data, we resort to the supervision of facial blendshapes, which enables the reconstruction of plausible 3D faces from 2D emotional data, and contribute a large-scale 3D emotional talking face dataset (3D-ETF) to train the network. Our experiments and user studies demonstrate that our approach outperforms state-of-the-art methods and exhibits more diverse facial movements. We recommend watching the supplementary video: https://ziqiaopeng.github.io/emotalk
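The cross-reconstruction idea behind the EDE can be sketched with a toy additive decoder; everything here (the function names, the additive decoder, and the loss shape) is an illustrative assumption, not the paper's architecture:

```python
import numpy as np

def decode(content, emotion):
    # Toy additive decoder standing in for the emotion-guided
    # feature fusion decoder (purely illustrative).
    return content + emotion

def cross_reconstruction_loss(c1, e1, c2, e2, x12, x21):
    # Swap emotion embeddings across two clips with different
    # emotion labels: decoding (content 1, emotion 2) should
    # reconstruct the target x12, and (content 2, emotion 1)
    # should reconstruct x21. Minimizing this forces the
    # encoder to route content and emotion into separate
    # embeddings, since a swapped pair can only reconstruct
    # correctly if the two factors are disentangled.
    r12 = decode(c1, e2)
    r21 = decode(c2, e1)
    return np.mean((r12 - x12) ** 2) + np.mean((r21 - x21) ** 2)
```

If the embeddings are correctly disentangled, both swapped reconstructions match their targets and the loss goes to zero.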

Disentangle Perceptual Learning through Online Contrastive Learning

Jun 24, 2020
Kangfu Mei, Yao Lu, Qiaosi Yi, Haoyu Wu, Juncheng Li, Rui Huang

Figure 1 for Disentangle Perceptual Learning through Online Contrastive Learning
Figure 2 for Disentangle Perceptual Learning through Online Contrastive Learning
Figure 3 for Disentangle Perceptual Learning through Online Contrastive Learning
Figure 4 for Disentangle Perceptual Learning through Online Contrastive Learning

Pursuing realistic results according to human visual perception is the central concern in image transformation tasks. Perceptual learning approaches like perceptual loss are empirically powerful for such tasks, but they usually rely on a pre-trained classification network to provide features, which are not necessarily optimal for the visual perception of image transformation. In this paper, we argue that, among the feature representations from the pre-trained classification network, only limited dimensions are related to human visual perception, while others are irrelevant, although both will affect the final image transformation results. Under this assumption, we disentangle the perception-relevant dimensions from the representation through our proposed online contrastive learning. The resulting network includes the pre-trained part and a feature selection layer, followed by the contrastive learning module, which utilizes the transformed results, target images, and task-oriented distorted images as the positive, negative, and anchor samples, respectively. The contrastive learning aims at activating the perception-relevant dimensions and suppressing the irrelevant ones by using the triplet loss, so that the original representation can be disentangled for better perceptual quality. Experiments on various image transformation tasks demonstrate the superiority of our framework, in terms of human visual perception, over existing approaches that use pre-trained networks and empirically designed losses.
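The triplet loss driving the contrastive module can be sketched in its standard form; this is a generic sketch (function name and margin are hypothetical), with the positive/negative/anchor roles filled by the samples named in the abstract:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Standard triplet loss on selected feature dimensions:
    # pull the anchor toward the positive sample and push it
    # away from the negative sample by at least `margin`.
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)
```

Applied per feature dimension, the loss is small on dimensions where the three samples are already well separated and large on the rest, which is what lets it act as a soft selector of perception-relevant dimensions.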

* 12 pages, 8 figures 

HighEr-Resolution Network for Image Demosaicing and Enhancing

Nov 19, 2019
Kangfu Mei, Juncheng Li, Jiajie Zhang, Haoyu Wu, Jie Li, Rui Huang

Figures 1-4 for HighEr-Resolution Network for Image Demosaicing and Enhancing

Neural-network-based image restoration methods tend to use low-resolution image patches for training. Although higher-resolution image patches can provide more global information, state-of-the-art methods cannot utilize them due to their huge GPU memory usage, as well as the unstable training process. However, plenty of studies have shown that global information is crucial for image restoration tasks like image demosaicing and enhancing. In this work, we propose a HighEr-Resolution Network (HERN) to fully learn global information from high-resolution image patches. To achieve this, the HERN employs two parallel paths to learn image features at two different resolutions, respectively. By combining global-aware features and multi-scale features, our HERN is able to learn global information with feasible GPU memory usage. In addition, we introduce a progressive training method to solve the instability issue and accelerate model convergence. On the task of image demosaicing and enhancing, our HERN achieves state-of-the-art performance on the AIM2019 RAW to RGB mapping challenge. The source code of our implementation is available at https://github.com/MKFMIKU/RAW2RGBNet.
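The two-path idea can be sketched in a toy form: one path keeps the full-resolution patch while the other works on a pooled, memory-cheap global view, and the two are fused. This is an illustrative sketch only, not the HERN architecture; the average pooling, nearest-neighbor upsampling, and additive fusion are all assumptions:

```python
import numpy as np

def two_path_features(patch, pool=4):
    # Local path: the full-resolution patch, preserving detail.
    # Global path: an average-pooled view whose memory cost
    # shrinks by pool**2, standing in for the coarse branch
    # that captures global context.
    h, w = patch.shape
    assert h % pool == 0 and w % pool == 0
    local = patch
    coarse = patch.reshape(h // pool, pool, w // pool, pool).mean(axis=(1, 3))
    # Upsample the coarse (global) features back to full
    # resolution and fuse the two paths by addition.
    global_up = np.kron(coarse, np.ones((pool, pool)))
    return local + global_up
```

Because the expensive computation happens on the pooled view, the global branch can see a much larger receptive field than the full-resolution branch for the same memory budget, which is the trade-off the abstract describes.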

* Accepted in ICCV 2019 Workshop (AIM2019 Raw to RGB Challenge Winner) 