Haoqiang Fan

MegActor: Harness the Power of Raw Video for Vivid Portrait Animation

May 31, 2024

Shurong Yang, Huadong Li, Juhao Wu, Minhao Jing, Linze Li, Renhe Ji, Jiajun Liang, Haoqiang Fan

Abstract: Although raw driving videos contain richer information on facial expressions than intermediate representations such as landmarks, they are seldom the subject of research in portrait animation. This is due to two challenges inherent in portrait animation driven by raw videos: 1) significant identity leakage; and 2) irrelevant background and facial details, such as wrinkles, that degrade performance. To harness the power of raw videos for vivid portrait animation, we proposed a pioneering conditional diffusion model named MegActor. First, we introduced a synthetic data generation framework for creating videos with consistent motion and expressions but inconsistent IDs to mitigate the issue of ID leakage. Second, we segmented the foreground and background of the reference image and employed CLIP to encode the background details. This encoded information is then integrated into the network via a text embedding module, thereby ensuring the stability of the background. Finally, we applied style transfer to carry the appearance of the reference image over to the driving video, eliminating the influence of facial details in the driving videos. Our final model was trained solely on public datasets and achieves results comparable to commercial models. We hope this will help the open-source community. The code is available at https://github.com/megvii-research/MegFaceAnimate.

Towards RGB-NIR Cross-modality Image Registration and Beyond

May 30, 2024

Huadong Li, Shichao Dong, Jin Wang, Rong Fu, Minhao Jing, Jiajun Liang, Haoqiang Fan, Renhe Ji

Abstract: This paper focuses on RGB (visible)-NIR (near-infrared) cross-modality image registration, which is crucial for many downstream vision tasks to fully leverage the complementary information present in visible and infrared images. In this field, researchers face two primary challenges: the absence of a correctly annotated benchmark with viewpoint variations for evaluating RGB-NIR cross-modality registration methods, and the problem of inconsistent local features caused by the appearance discrepancy between RGB-NIR cross-modality images. To address these challenges, we first present the RGB-NIR Image Registration (RGB-NIR-IRegis) benchmark, which, for the first time, enables fair and comprehensive evaluations for the task of RGB-NIR cross-modality image registration. Evaluations of previous methods highlight the significant challenges posed by our RGB-NIR-IRegis benchmark, especially on RGB-NIR image pairs with viewpoint variations. To analyze the causes of the unsatisfactory performance, we then design several metrics to reveal the harmful impact of inconsistent local features between visible and infrared images on model performance. This further motivates us to develop a baseline method named Semantic Guidance Transformer (SGFormer), which utilizes high-level semantic guidance to mitigate the negative impact of inconsistent local features. Despite the simplicity of our motivation, extensive experimental results show the effectiveness of our method.

* 18 pages, 7 figures

Sparse Beats Dense: Rethinking Supervision in Radar-Camera Depth Completion

Dec 08, 2023

Huadong Li, Minhao Jing, Jiajun Liang, Haoqiang Fan, Renhe Ji

Abstract: It is widely believed that dense supervision is better than sparse supervision in the field of depth completion, but the underlying reasons for this are rarely discussed. In this paper, we find that the challenge of using sparse supervision for training Radar-Camera depth prediction models is the Projection Transformation Collapse (PTC). The PTC implies that sparse supervision leads the model to learn unexpected collapsed projection transformations between Image/Radar/LiDAR spaces. Building on this insight, we propose a novel "Disruption-Compensation" framework to handle the PTC, thereby reviving the use of sparse supervision in depth completion tasks. The disruption part deliberately discards position correspondences among Image/Radar/LiDAR, while the compensation part leverages 3D spatial and 2D semantic information to compensate for the discarded beneficial position correspondence. Extensive experimental results demonstrate that our framework (sparse supervision) outperforms the state-of-the-art (dense supervision) with an 11.6% improvement in mean absolute error and a 1.6× speedup. The code is available at ...
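
As a rough illustration of what sparse supervision means when the metric is mean absolute error, the sketch below averages the error only over pixels that carry a depth measurement. It is not taken from the paper: the function name `sparse_mae`, the convention that 0.0 marks a pixel without a LiDAR return, and the toy values are all assumptions made for this example.

```python
def sparse_mae(pred, target, invalid=0.0):
    """Mean absolute error over pixels that have a ground-truth depth.

    Assumption for this sketch: `invalid` (0.0) marks pixels with no
    LiDAR return, so those pixels contribute nothing to the error.
    """
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t != invalid:
                total += abs(p - t)
                count += 1
    # Return 0.0 for an empty mask instead of dividing by zero.
    return total / count if count else 0.0


# Toy 2x2 depth maps in meters; only two pixels are supervised.
pred = [[10.0, 12.0], [30.0, 8.0]]
sparse_lidar = [[0.0, 11.0], [33.0, 0.0]]
error = sparse_mae(pred, sparse_lidar)
print(error)
```

Only the two supervised pixels enter the average, so predictions at unsupervised locations are unconstrained by this loss.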

GAFlow: Incorporating Gaussian Attention into Optical Flow

Sep 28, 2023

Ao Luo, Fan Yang, Xin Li, Lang Nie, Chunyu Lin, Haoqiang Fan, Shuaicheng Liu

Abstract: Optical flow, or the estimation of motion fields from image sequences, is one of the fundamental problems in computer vision. Unlike most pixel-wise tasks that aim at achieving consistent representations of the same category, optical flow raises extra demands for local discrimination and smoothness, which have not yet been fully explored by existing approaches. In this paper, we push Gaussian Attention (GA) into optical flow models to accentuate local properties during representation learning and enforce motion affinity during matching. Specifically, we introduce a novel Gaussian-Constrained Layer (GCL) which can be easily plugged into existing Transformer blocks to highlight the local neighborhood that contains fine-grained structural information. Moreover, for reliable motion analysis, we provide a new Gaussian-Guided Attention Module (GGAM) which not only inherits properties of the Gaussian distribution to naturally center on the neighboring fields of each point but also is able to emphasize contextually related regions during matching. Our fully equipped model, namely the Gaussian Attention Flow network (GAFlow), naturally incorporates a series of novel Gaussian-based modules into the conventional optical flow framework for reliable motion analysis. Extensive experiments on standard optical flow datasets consistently demonstrate the exceptional performance of the proposed approach in terms of both generalization ability evaluation and online benchmark testing. Code is available at https://github.com/LA30/GAFlow.

* To appear in ICCV-2023

MEFLUT: Unsupervised 1D Lookup Tables for Multi-exposure Image Fusion

Sep 21, 2023

Ting Jiang, Chuan Wang, Xinpeng Li, Ru Li, Haoqiang Fan, Shuaicheng Liu

Abstract: In this paper, we introduce a new approach for high-quality multi-exposure image fusion (MEF). We show that the fusion weights of an exposure can be encoded into a 1D lookup table (LUT), which takes a pixel intensity value as input and produces a fusion weight as output. We learn one 1D LUT for each exposure, so that every pixel from a given exposure can query the 1D LUT of that exposure independently for high-quality and efficient fusion. Specifically, to learn these 1D LUTs, we incorporate attention mechanisms along the frame, channel, and spatial dimensions into the MEF task, which brings significant quality improvement over the state of the art (SOTA). In addition, we collect a new MEF dataset consisting of 960 samples, 155 of which are manually tuned by professionals as ground truth for evaluation. Our network is trained on this dataset in an unsupervised manner. Extensive experiments are conducted to demonstrate the effectiveness of all the newly proposed components, and results show that our approach outperforms the SOTA on our dataset and on another representative dataset, SICE, both qualitatively and quantitatively. Moreover, our 1D LUT approach takes less than 4 ms to process a 4K image on a PC GPU. Given its high quality, efficiency, and robustness, our method has been shipped into millions of Android mobile devices across multiple brands worldwide. Code is available at: https://github.com/Hedlen/MEFLUT.
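
The per-exposure lookup described above can be sketched as follows. This is a minimal illustration rather than the released implementation: the Gaussian-shaped tables from `make_placeholder_lut` are hypothetical stand-ins for the learned LUTs, and normalizing the weights so they sum to one at each pixel is an assumption of this sketch.

```python
import math


def make_placeholder_lut(center, sigma=0.2, size=256):
    """Hypothetical stand-in for a learned 1D LUT.

    Maps an 8-bit intensity to a weight that peaks at `center`
    (given on a 0..1 scale). A real LUT would be learned.
    """
    return [
        math.exp(-((i / (size - 1) - center) ** 2) / (2 * sigma ** 2))
        for i in range(size)
    ]


def fuse(exposures, luts, eps=1e-8):
    """Fuse K grayscale exposures (rows of ints in 0..255) with K LUTs."""
    height, width = len(exposures[0]), len(exposures[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Each pixel queries the LUT of its own exposure independently.
            weights = [lut[img[y][x]] for img, lut in zip(exposures, luts)]
            total = sum(weights) + eps
            # Assumed normalization: weighted average of the K intensities.
            fused[y][x] = sum(
                w * img[y][x] for w, img in zip(weights, exposures)
            ) / total
    return fused


under = [[10, 40], [90, 128]]     # toy under-exposed frame
over = [[120, 200], [250, 255]]   # toy over-exposed frame
luts = [make_placeholder_lut(0.5), make_placeholder_lut(0.5)]
result = fuse([under, over], luts)
```

Because the weight depends only on a single intensity value, the per-pixel cost is one table read per exposure, which is consistent with the efficiency the abstract emphasizes.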

Supervised Homography Learning with Realistic Dataset Generation

Aug 15, 2023

Hai Jiang, Haipeng Li, Songchen Han, Haoqiang Fan, Bing Zeng, Shuaicheng Liu

Abstract: In this paper, we propose an iterative framework consisting of two phases, a generation phase and a training phase, to generate realistic training data and yield a supervised homography network. In the generation phase, given an unlabeled image pair, we utilize the pre-estimated dominant plane masks and homography of the pair, along with another sampled homography that serves as ground truth, to generate a new labeled training pair with realistic motion. In the training phase, the generated data is used to train the supervised homography network, in which the training data is refined via a content consistency module and a quality assessment module. Once an iteration is finished, the trained network is used in the next data generation phase to update the pre-estimated homography. Through such an iterative strategy, the quality of the dataset and the performance of the network can be gradually and simultaneously improved. Experimental results show that our method achieves state-of-the-art performance and that existing supervised methods can also be improved based on the generated dataset. Code and dataset are available at https://github.com/JianghaiSCU/RealSH.

* Accepted by ICCV 2023

SAM-IQA: Can Segment Anything Boost Image Quality Assessment?

Jul 10, 2023

Xinpeng Li, Ting Jiang, Haoqiang Fan, Shuaicheng Liu

Abstract: Image Quality Assessment (IQA) is a challenging task that requires training on massive datasets to achieve accurate predictions. However, due to the lack of IQA data, deep learning-based IQA methods typically rely on networks pre-trained on massive datasets, such as ResNet trained on ImageNet, as feature extractors to enhance their generalization ability. In this paper, we utilize the encoder of Segment Anything, a recently proposed segmentation model trained on a massive dataset, for high-level semantic feature extraction. Most IQA methods are limited to extracting spatial-domain features, while frequency-domain features have been shown to better represent noise and blur. Therefore, we leverage both spatial-domain and frequency-domain features by applying Fourier and standard convolutions on the extracted features, respectively. Extensive experiments are conducted to demonstrate the effectiveness of all the proposed components, and results show that our approach outperforms the state-of-the-art (SOTA) on four representative datasets, both qualitatively and quantitatively. Our experiments confirm the powerful feature extraction capabilities of Segment Anything and highlight the value of combining spatial-domain and frequency-domain features in IQA tasks. Code: https://github.com/Hedlen/SAM-IQA

Towards Robust SDRTV-to-HDRTV via Dual Inverse Degradation Network

Jul 07, 2023

Kepeng Xu, Gang He, Li Xu, Xingchao Yang, Ming Sun, Yuzhi Wang, Zijia Ma, Haoqiang Fan, Xing Wen

Abstract: Recently, the transformation of standard dynamic range TV (SDRTV) to high dynamic range TV (HDRTV) has been in high demand due to the scarcity of HDRTV content. However, the conversion of SDRTV to HDRTV often amplifies the existing coding artifacts in SDRTV, which deteriorates the visual quality of the output. In this study, we propose a dual inverse degradation SDRTV-to-HDRTV network, DIDNet, to address the issue of coding artifact restoration in converted HDRTV, which has not been previously studied. Specifically, we propose a temporal-spatial feature alignment module and dual modulation convolution to remove coding artifacts and enhance color restoration ability. Furthermore, a wavelet attention module is proposed to improve SDRTV features in the frequency domain. An auxiliary loss is introduced to decouple the learning process for effectively restoring from dual degradation. The proposed method outperforms the current state-of-the-art method in terms of quantitative results, visual quality, and inference time, thus enhancing the performance of the SDRTV-to-HDRTV method in real-world scenarios.

* 10 pages

Low-Light Image Enhancement with Wavelet-based Diffusion Models

Jun 01, 2023

Hai Jiang, Ao Luo, Songchen Han, Haoqiang Fan, Shuaicheng Liu

Abstract: Diffusion models have achieved promising results in image restoration tasks, yet suffer from long inference times, excessive computational resource consumption, and unstable restoration. To address these issues, we propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL. Specifically, we present a wavelet-based conditional diffusion model (WCDM) that leverages the generative power of diffusion models to produce results with satisfactory perceptual fidelity. It also takes advantage of the strengths of wavelet transformation to greatly accelerate inference and reduce computational resource usage without sacrificing information. To avoid chaotic content and diversity, we perform both forward diffusion and reverse denoising in the training phase of WCDM, enabling the model to achieve stable denoising and reduce randomness during inference. Moreover, we further design a high-frequency restoration module (HFRM) that utilizes the vertical and horizontal details of the image to complement the diagonal information for better fine-grained restoration. Extensive experiments on publicly available real-world benchmarks demonstrate that our method outperforms the existing state-of-the-art methods both quantitatively and visually, and it achieves remarkable improvements in efficiency compared to previous diffusion-based methods. In addition, we empirically show, through an application to low-light face detection, the latent practical value of our method.
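
To make the "without sacrificing information" point concrete, the sketch below applies a single-level 2D wavelet transform that splits an image into four half-resolution sub-bands and then reconstructs the input exactly. The abstract does not say which wavelet is used; the Haar wavelet, the function names, and the sub-band names `low`, `diff_x`, `diff_y`, and `diff_xy` are assumptions chosen for this illustration.

```python
def haar_dwt2(img):
    """Single-level 2D Haar transform; height and width must be even."""
    half_h, half_w = len(img) // 2, len(img[0]) // 2
    low, diff_x, diff_y, diff_xy = (
        [[0.0] * half_w for _ in range(half_h)] for _ in range(4)
    )
    for i in range(half_h):
        for j in range(half_w):
            # 2x2 block: a b on the top row, c d on the bottom row.
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            low[i][j] = (a + b + c + d) / 2      # local average band
            diff_x[i][j] = (a - b + c - d) / 2   # difference across columns
            diff_y[i][j] = (a + b - c - d) / 2   # difference across rows
            diff_xy[i][j] = (a - b - c + d) / 2  # diagonal difference
    return low, diff_x, diff_y, diff_xy


def haar_idwt2(low, diff_x, diff_y, diff_xy):
    """Exact inverse of haar_dwt2."""
    half_h, half_w = len(low), len(low[0])
    img = [[0.0] * (2 * half_w) for _ in range(2 * half_h)]
    for i in range(half_h):
        for j in range(half_w):
            s, x = low[i][j], diff_x[i][j]
            y, z = diff_y[i][j], diff_xy[i][j]
            img[2 * i][2 * j] = (s + x + y + z) / 2
            img[2 * i][2 * j + 1] = (s - x + y - z) / 2
            img[2 * i + 1][2 * j] = (s + x - y - z) / 2
            img[2 * i + 1][2 * j + 1] = (s - x - y + z) / 2
    return img


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
bands = haar_dwt2(image)
restored = haar_idwt2(*bands)
```

Each sub-band has a quarter of the original pixel count, so a model that operates on the low-frequency band works at reduced spatial resolution, while the three difference bands retain the detail needed for exact reconstruction.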

Realistic Noise Synthesis with Diffusion Models

May 23, 2023

Qi Wu, Mingyan Han, Ting Jiang, Haoqiang Fan, Bing Zeng, Shuaicheng Liu

Abstract: Deep learning-based approaches have achieved remarkable performance in single-image denoising. However, training denoising models typically requires a large amount of data, which can be difficult to obtain in real-world scenarios. Furthermore, synthetic noise used in the past has often differed significantly from real-world noise, due to the complexity of the latter and the limited ability of Generative Adversarial Network (GAN) models to model noise distributions, resulting in residual noise and artifacts in denoised outputs. To address these challenges, we propose a novel method for synthesizing realistic noise using diffusion models. This approach enables us to generate large amounts of high-quality data for training denoising models by controlling camera settings to simulate different environmental conditions and employing guided multi-scale content information to ensure that our method is more capable of generating real noise with multi-frequency spatial correlations. In particular, we design an inversion mechanism for the camera settings, which extends our method to more public datasets that lack setting information. Based on the noise dataset we synthesized, we have conducted extensive experiments, and the results demonstrate that our method outperforms state-of-the-art methods on multiple benchmarks and metrics, confirming its effectiveness in synthesizing realistic noise for training denoising models.
