Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jingyu Yang

Zero-shot Video Restoration and Enhancement Using Pre-Trained Image Diffusion Model

Jul 02, 2024

Cong Cao, Huanjing Yue, Xin Liu, Jingyu Yang

Figure 1 for Zero-shot Video Restoration and Enhancement Using Pre-Trained Image Diffusion Model

Figure 2 for Zero-shot Video Restoration and Enhancement Using Pre-Trained Image Diffusion Model

Figure 3 for Zero-shot Video Restoration and Enhancement Using Pre-Trained Image Diffusion Model

Figure 4 for Zero-shot Video Restoration and Enhancement Using Pre-Trained Image Diffusion Model

Abstract:Diffusion-based zero-shot image restoration and enhancement models have achieved great success in various image restoration and enhancement tasks without training. However, directly applying them to video restoration and enhancement results in severe temporal flickering artifacts. In this paper, we propose the first framework for zero-shot video restoration and enhancement based on a pre-trained image diffusion model. By replacing the self-attention layer with the proposed cross-previous-frame attention layer, the pre-trained image diffusion model can take advantage of the temporal correlation between neighboring frames. We further propose temporal consistency guidance, spatial-temporal noise sharing, and an early stopping sampling strategy for better temporally consistent sampling. Our method is a plug-and-play module that can be inserted into any diffusion-based zero-shot image restoration or enhancement methods to further improve their performance. Experimental results demonstrate the superiority of our proposed method in producing temporally consistent videos with better fidelity.

* 19 pages

Via

Access Paper or Ask Questions

NTIRE 2024 Challenge on Night Photography Rendering

Jun 18, 2024

Egor Ershov, Artyom Panshin, Oleg Karasev, Sergey Korchagin, Shepelev Lev, Alexandr Startsev, Daniil Vladimirov, Ekaterina Zaychenkova, Nikola Banić, Dmitrii Iarchuk(+40 more)

Figure 1 for NTIRE 2024 Challenge on Night Photography Rendering

Figure 2 for NTIRE 2024 Challenge on Night Photography Rendering

Figure 3 for NTIRE 2024 Challenge on Night Photography Rendering

Figure 4 for NTIRE 2024 Challenge on Night Photography Rendering

Abstract:This paper presents a review of the NTIRE 2024 challenge on night photography rendering. The goal of the challenge was to find solutions that process raw camera images taken in nighttime conditions, and thereby produce a photo-quality output images in the standard RGB (sRGB) space. Unlike the previous year's competition, the challenge images were collected with a mobile phone and the speed of algorithms was also measured alongside the quality of their output. To evaluate the results, a sufficient number of viewers were asked to assess the visual quality of the proposed solutions, considering the subjective nature of the task. There were 2 nominations: quality and efficiency. Top 5 solutions in terms of output quality were sorted by evaluation time (see Fig. 1). The top ranking participants' solutions effectively represent the state-of-the-art in nighttime photography rendering. More results can be found at https://nightimaging.org.

* 10 pages, 10 figures

Via

Access Paper or Ask Questions

MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

Apr 30, 2024

Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu(+53 more)

Figure 1 for MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

Figure 2 for MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

Figure 3 for MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

Figure 4 for MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

Abstract:The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/.

* CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

Via

Access Paper or Ask Questions

LPSNet: End-to-End Human Pose and Shape Estimation with Lensless Imaging

Apr 08, 2024

Haoyang Ge, Qiao Feng, Hailong Jia, Xiongzheng Li, Xiangjun Yin, You Zhou, Jingyu Yang, Kun Li

Figure 1 for LPSNet: End-to-End Human Pose and Shape Estimation with Lensless Imaging

Figure 2 for LPSNet: End-to-End Human Pose and Shape Estimation with Lensless Imaging

Figure 3 for LPSNet: End-to-End Human Pose and Shape Estimation with Lensless Imaging

Figure 4 for LPSNet: End-to-End Human Pose and Shape Estimation with Lensless Imaging

Abstract:Human pose and shape (HPS) estimation with lensless imaging is not only beneficial to privacy protection but also can be used in covert surveillance scenarios due to the small size and simple structure of this device. However, this task presents significant challenges due to the inherent ambiguity of the captured measurements and lacks effective methods for directly estimating human pose and shape from lensless data. In this paper, we propose the first end-to-end framework to recover 3D human poses and shapes from lensless measurements to our knowledge. We specifically design a multi-scale lensless feature decoder to decode the lensless measurements through the optically encoded mask for efficient feature extraction. We also propose a double-head auxiliary supervision mechanism to improve the estimation accuracy of human limb ends. Besides, we establish a lensless imaging system and verify the effectiveness of our method on various datasets acquired by our lensless imaging system.

* Accepted to CVPR 2024. More results available at https://cic.tju.edu.cn/faculty/likun/projects/LPSNet

Via

Access Paper or Ask Questions

DeeDSR: Towards Real-World Image Super-Resolution via Degradation-Aware Stable Diffusion

Mar 31, 2024

Chunyang Bi, Xin Luo, Sheng Shen, Mengxi Zhang, Huanjing Yue, Jingyu Yang

Figure 1 for DeeDSR: Towards Real-World Image Super-Resolution via Degradation-Aware Stable Diffusion

Figure 2 for DeeDSR: Towards Real-World Image Super-Resolution via Degradation-Aware Stable Diffusion

Figure 3 for DeeDSR: Towards Real-World Image Super-Resolution via Degradation-Aware Stable Diffusion

Figure 4 for DeeDSR: Towards Real-World Image Super-Resolution via Degradation-Aware Stable Diffusion

Abstract:Diffusion models, known for their powerful generative capabilities, play a crucial role in addressing real-world super-resolution challenges. However, these models often focus on improving local textures while neglecting the impacts of global degradation, which can significantly reduce semantic fidelity and lead to inaccurate reconstructions and suboptimal super-resolution performance. To address this issue, we introduce a novel two-stage, degradation-aware framework that enhances the diffusion model's ability to recognize content and degradation in low-resolution images. In the first stage, we employ unsupervised contrastive learning to obtain representations of image degradations. In the second stage, we integrate a degradation-aware module into a simplified ControlNet, enabling flexible adaptation to various degradations based on the learned representations. Furthermore, we decompose the degradation-aware features into global semantics and local details branches, which are then injected into the diffusion denoising module to modulate the target generation. Our method effectively recovers semantically precise and photorealistic details, particularly under significant degradation conditions, demonstrating state-of-the-art performance across various benchmarks. Codes will be released at https://github.com/bichunyang419/DeeDSR.

Via

Access Paper or Ask Questions

AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors

Mar 07, 2024

Kaishen Yuan, Zitong Yu, Xin Liu, Weicheng Xie, Huanjing Yue, Jingyu Yang

Abstract:Facial Action Units (AU) is a vital concept in the realm of affective computing, and AU detection has always been a hot research topic. Existing methods suffer from overfitting issues due to the utilization of a large number of learnable parameters on scarce AU-annotated datasets or heavy reliance on substantial additional relevant data. Parameter-Efficient Transfer Learning (PETL) provides a promising paradigm to address these challenges, whereas its existing methods lack design for AU characteristics. Therefore, we innovatively investigate PETL paradigm to AU detection, introducing AUFormer and proposing a novel Mixture-of-Knowledge Expert (MoKE) collaboration mechanism. An individual MoKE specific to a certain AU with minimal learnable parameters first integrates personalized multi-scale and correlation knowledge. Then the MoKE collaborates with other MoKEs in the expert group to obtain aggregated information and inject it into the frozen Vision Transformer (ViT) to achieve parameter-efficient AU detection. Additionally, we design a Margin-truncated Difficulty-aware Weighted Asymmetric Loss (MDWA-Loss), which can encourage the model to focus more on activated AUs, differentiate the difficulty of unactivated AUs, and discard potential mislabeled samples. Extensive experiments from various perspectives, including within-domain, cross-domain, data efficiency, and micro-expression domain, demonstrate AUFormer's state-of-the-art performance and robust generalization abilities without relying on additional relevant data. The code for AUFormer is available at https://github.com/yuankaishen2001/AUFormer.

* 19 pages, 6 figures

Via

Access Paper or Ask Questions

A Simple yet Effective Network based on Vision Transformer for Camouflaged Object and Salient Object Detection

Feb 29, 2024

Chao Hao, Zitong Yu, Xin Liu, Jun Xu, Huanjing Yue, Jingyu Yang

Figure 1 for A Simple yet Effective Network based on Vision Transformer for Camouflaged Object and Salient Object Detection

Figure 2 for A Simple yet Effective Network based on Vision Transformer for Camouflaged Object and Salient Object Detection

Figure 3 for A Simple yet Effective Network based on Vision Transformer for Camouflaged Object and Salient Object Detection

Figure 4 for A Simple yet Effective Network based on Vision Transformer for Camouflaged Object and Salient Object Detection

Abstract:Camouflaged object detection (COD) and salient object detection (SOD) are two distinct yet closely-related computer vision tasks widely studied during the past decades. Though sharing the same purpose of segmenting an image into binary foreground and background regions, their distinction lies in the fact that COD focuses on concealed objects hidden in the image, while SOD concentrates on the most prominent objects in the image. Previous works achieved good performance by stacking various hand-designed modules and multi-scale features. However, these carefully-designed complex networks often performed well on one task but not on another. In this work, we propose a simple yet effective network (SENet) based on vision Transformer (ViT), by employing a simple design of an asymmetric ViT-based encoder-decoder structure, we yield competitive results on both tasks, exhibiting greater versatility than meticulously crafted ones. Furthermore, to enhance the Transformer's ability to model local information, which is important for pixel-level binary segmentation tasks, we propose a local information capture module (LICM). We also propose a dynamic weighted loss (DW loss) based on Binary Cross-Entropy (BCE) and Intersection over Union (IoU) loss, which guides the network to pay more attention to those smaller and more difficult-to-find target objects according to their size. Moreover, we explore the issue of joint training of SOD and COD, and propose a preliminary solution to the conflict in joint training, further improving the performance of SOD. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness of our method. The code is available at https://github.com/linuxsino/SENet.

* submitted to IEEE TIP

Via

Access Paper or Ask Questions

KeDuSR: Real-World Dual-Lens Super-Resolution via Kernel-Free Matching

Jan 02, 2024

Huanjing Yue, Zifan Cui, Kun Li, Jingyu Yang

Abstract:Dual-lens super-resolution (SR) is a practical scenario for reference (Ref) based SR by utilizing the telephoto image (Ref) to assist the super-resolution of the low-resolution wide-angle image (LR input). Different from general RefSR, the Ref in dual-lens SR only covers the overlapped field of view (FoV) area. However, current dual-lens SR methods rarely utilize these specific characteristics and directly perform dense matching between the LR input and Ref. Due to the resolution gap between LR and Ref, the matching may miss the best-matched candidate and destroy the consistent structures in the overlapped FoV area. Different from them, we propose to first align the Ref with the center region (namely the overlapped FoV area) of the LR input by combining global warping and local warping to make the aligned Ref be sharp and consistent. Then, we formulate the aligned Ref and LR center as value-key pairs, and the corner region of the LR is formulated as queries. In this way, we propose a kernel-free matching strategy by matching between the LR-corner (query) and LR-center (key) regions, and the corresponding aligned Ref (value) can be warped to the corner region of the target. Our kernel-free matching strategy avoids the resolution gap between LR and Ref, which makes our network have better generalization ability. In addition, we construct a DuSR-Real dataset with (LR, Ref, HR) triples, where the LR and HR are well aligned. Experiments on three datasets demonstrate that our method outperforms the second-best method by a large margin. Our code and dataset are available at https://github.com/ZifanCui/KeDuSR.

* 14 pages, 10 figures. Accepted by AAAI-2024

Via

Access Paper or Ask Questions

Learning to See Low-Light Images via Feature Domain Adaptation

Dec 20, 2023

Qirui Yang, Qihua Cheng, Huanjing Yue, Le Zhang, Yihao Liu, Jingyu Yang

Abstract:Raw low light image enhancement (LLIE) has achieved much better performance than the sRGB domain enhancement methods due to the merits of raw data. However, the ambiguity between noisy to clean and raw to sRGB mappings may mislead the single-stage enhancement networks. The two-stage networks avoid ambiguity by decoupling the two mappings but usually have large computing complexity. To solve this problem, we propose a single-stage network empowered by Feature Domain Adaptation (FDA) to decouple the denoising and color mapping tasks in raw LLIE. The denoising encoder is supervised by the clean raw image, and then the denoised features are adapted for the color mapping task by an FDA module. We propose a Lineformer to serve as the FDA, which can well explore the global and local correlations with fewer line buffers (friendly to the line-based imaging process). During inference, the raw supervision branch is removed. In this way, our network combines the advantage of a two-stage enhancement process with the efficiency of single-stage inference. Experiments on four benchmark datasets demonstrate that our method achieves state-of-the-art performance with fewer computing costs (60% FLOPs of the two-stage method DNF). Our codes will be released after the acceptance of this work.

Via

Access Paper or Ask Questions

MARIS: Referring Image Segmentation via Mutual-Aware Attention Features

Nov 27, 2023

Mengxi Zhang, Yiming Liu, Xiangjun Yin, Huanjing Yue, Jingyu Yang

Figure 1 for MARIS: Referring Image Segmentation via Mutual-Aware Attention Features

Figure 2 for MARIS: Referring Image Segmentation via Mutual-Aware Attention Features

Figure 3 for MARIS: Referring Image Segmentation via Mutual-Aware Attention Features

Figure 4 for MARIS: Referring Image Segmentation via Mutual-Aware Attention Features

Abstract:Referring image segmentation (RIS) aims to segment a particular region based on a language expression prompt. Existing methods incorporate linguistic features into visual features and obtain multi-modal features for mask decoding. However, these methods may segment the visually salient entity instead of the correct referring region, as the multi-modal features are dominated by the abundant visual context. In this paper, we propose MARIS, a referring image segmentation method that leverages the Segment Anything Model (SAM) and introduces a mutual-aware attention mechanism to enhance the cross-modal fusion via two parallel branches. Specifically, our mutual-aware attention mechanism consists of Vision-Guided Attention and Language-Guided Attention, which bidirectionally model the relationship between visual and linguistic features. Correspondingly, we design a Mask Decoder to enable explicit linguistic guidance for more consistent segmentation with the language expression. To this end, a multi-modal query token is proposed to integrate linguistic information and interact with visual information simultaneously. Extensive experiments on three benchmark datasets show that our method outperforms the state-of-the-art RIS methods. Our code will be publicly available.

Via

Access Paper or Ask Questions