Shangchen Zhou

Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution

Dec 11, 2023

Shangchen Zhou, Peiqing Yang, Jianyi Wang, Yihang Luo, Chen Change Loy

Abstract: Text-based diffusion models have exhibited remarkable success in generation and editing, showing great promise for enhancing visual content with their generative prior. However, applying these models to video super-resolution remains challenging due to the high demands for output fidelity and temporal consistency, which are complicated by the inherent randomness in diffusion models. Our study introduces Upscale-A-Video, a text-guided latent diffusion framework for video upscaling. This framework ensures temporal coherence through two key mechanisms: locally, it integrates temporal layers into the U-Net and VAE-Decoder, maintaining consistency within short sequences; globally, without training, a flow-guided recurrent latent propagation module is introduced to enhance overall video stability by propagating and fusing latents across the entire sequence. Thanks to the diffusion paradigm, our model also offers greater flexibility by allowing text prompts to guide texture creation and adjustable noise levels to balance restoration and generation, enabling a trade-off between fidelity and quality. Extensive experiments show that Upscale-A-Video surpasses existing methods in both synthetic and real-world benchmarks, as well as in AI-generated videos, showcasing impressive visual realism and temporal consistency.

* Equal contributions from first two authors. Project page: https://shangchenzhou.com/projects/upscale-a-video/

Iterative Token Evaluation and Refinement for Real-World Super-Resolution

Dec 09, 2023

Chaofeng Chen, Shangchen Zhou, Liang Liao, Haoning Wu, Wenxiu Sun, Qiong Yan, Weisi Lin

Abstract: Real-world image super-resolution (RWSR) is a long-standing problem as low-quality (LQ) images often have complex and unidentified degradations. Existing methods such as Generative Adversarial Networks (GANs) and continuous diffusion models have their own drawbacks: GANs are difficult to train, while continuous diffusion models require numerous inference steps. In this paper, we propose an Iterative Token Evaluation and Refinement (ITER) framework for RWSR, which utilizes a discrete diffusion model operating in the discrete token representation space, i.e., indexes of features extracted from a VQGAN codebook pre-trained with high-quality (HQ) images. We show that ITER is easier to train than GANs and more efficient than continuous diffusion models. Specifically, we divide RWSR into two sub-tasks, i.e., distortion removal and texture generation. Distortion removal involves simple HQ token prediction with LQ images, while texture generation uses a discrete diffusion model to iteratively refine the distortion removal output with a token refinement network. In particular, we propose to include a token evaluation network in the discrete diffusion process. It learns to evaluate which tokens are good restorations and helps to improve the iterative refinement results. Moreover, the evaluation network can first check the status of the distortion removal output and then adaptively select the total number of refinement steps needed, thereby maintaining a good balance between distortion removal and texture generation. Extensive experimental results show that ITER is easy to train and performs well within just 8 iterative steps. Our code will be made publicly available.

* To appear in AAAI 2024, https://github.com/chaofengc/ITER
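
The evaluate-then-refine loop described in the ITER abstract can be pictured with a toy sketch. This is not the authors' implementation: `refine_tokens`, `toy_evaluate`, `toy_refine`, the 0.5 threshold, and the integer "tokens" are all hypothetical stand-ins chosen for illustration. Only the overall idea (an evaluator scores tokens, poorly scored tokens are re-predicted, and the loop stops early once the evaluator is satisfied, within a budget of 8 steps) comes from the abstract.

```python
def refine_tokens(tokens, evaluate, refine, max_steps=8, threshold=0.5):
    """Iteratively re-predict the tokens that the evaluator scores as poor.

    Returns the final tokens and the number of refinement steps actually used.
    """
    tokens = list(tokens)
    for step in range(max_steps):
        scores = evaluate(tokens)
        poor = [i for i, s in enumerate(scores) if s < threshold]
        if not poor:
            return tokens, step  # evaluator accepts every token: stop early
        proposals = refine(tokens, poor)
        for i in poor:
            tokens[i] = proposals[i]
    return tokens, max_steps


# Toy stand-ins for the two networks (hypothetical, for illustration only).
TARGET = [3, 1, 4, 1, 5, 9, 2, 6]


def toy_evaluate(tokens):
    # Score 1.0 for a token that matches the reference, 0.0 otherwise.
    return [1.0 if t == ref else 0.0 for t, ref in zip(tokens, TARGET)]


def toy_refine(tokens, poor):
    # Deliberately weak refiner: it repairs only the first poor token per call.
    proposals = {i: tokens[i] for i in poor}
    first = poor[0]
    proposals[first] = TARGET[first]
    return proposals


initial = [3, 0, 4, 0, 5, 9, 0, 6]
restored, steps_used = refine_tokens(initial, toy_evaluate, toy_refine)
print(restored, steps_used)
```

In this toy run the number of steps depends on how many tokens the evaluator rejects, which mirrors the adaptive step selection mentioned in the abstract; a real system would use learned networks over VQGAN codebook indexes instead of these stand-ins.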

LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models

Sep 27, 2023

Yaohui Wang, Xinyuan Chen, Xin Ma, Shangchen Zhou, Ziqi Huang, Yi Wang, Ceyuan Yang, Yinan He, Jiashuo Yu, Peiqing Yang (+10 more)

Abstract: This work aims to learn a high-quality text-to-video (T2V) generative model by leveraging a pre-trained text-to-image (T2I) model as a basis. It is a highly desirable yet challenging task to simultaneously a) accomplish the synthesis of visually realistic and temporally coherent videos and b) preserve the strong creative generation nature of the pre-trained T2I model. To this end, we propose LaVie, an integrated video generation framework that operates on cascaded video latent diffusion models, comprising a base T2V model, a temporal interpolation model, and a video super-resolution model. Our key insights are two-fold: 1) We reveal that the incorporation of simple temporal self-attentions, coupled with rotary positional encoding, adequately captures the temporal correlations inherent in video data. 2) Additionally, we validate that the process of joint image-video fine-tuning plays a pivotal role in producing high-quality and creative outcomes. To enhance the performance of LaVie, we contribute a comprehensive and diverse video dataset named Vimeo25M, consisting of 25 million text-video pairs that prioritize quality, diversity, and aesthetic appeal. Extensive experiments demonstrate that LaVie achieves state-of-the-art performance both quantitatively and qualitatively. Furthermore, we showcase the versatility of pre-trained LaVie models in various long video generation and personalized video synthesis applications.

* Project webpage: https://vchitect.github.io/LaVie-project/

PGDiff: Guiding Diffusion Models for Versatile Face Restoration via Partial Guidance

Sep 19, 2023

Peiqing Yang, Shangchen Zhou, Qingyi Tao, Chen Change Loy

Abstract: Exploiting pre-trained diffusion models for restoration has recently become a favored alternative to the traditional task-specific training approach. Previous works have achieved noteworthy success by limiting the solution space using explicit degradation models. However, these methods often fall short when faced with complex degradations because such degradations generally cannot be modeled precisely. In this paper, we propose PGDiff by introducing partial guidance, a fresh perspective that is more adaptable to real-world degradations than existing works. Rather than specifically defining the degradation process, our approach models the desired properties, such as image structure and color statistics of high-quality images, and applies this guidance during the reverse diffusion process. These properties are readily available and make no assumptions about the degradation process. When combined with a diffusion prior, this partial guidance can deliver appealing results across a range of restoration tasks. Additionally, PGDiff can be extended to handle composite tasks by consolidating multiple high-quality image properties, achieved by integrating the guidance from respective tasks. Experimental results demonstrate that our method not only outperforms existing diffusion-prior-based approaches but also competes favorably with task-specific models.

* GitHub: https://github.com/pq-yang/PGDiff

ProPainter: Improving Propagation and Transformer for Video Inpainting

Sep 07, 2023

Shangchen Zhou, Chongyi Li, Kelvin C. K. Chan, Chen Change Loy

Abstract: Flow-based propagation and spatiotemporal Transformers are two mainstream mechanisms in video inpainting (VI). Despite the effectiveness of these components, they still suffer from some limitations that affect their performance. Previous propagation-based approaches are performed separately either in the image or feature domain. Global image propagation isolated from learning may cause spatial misalignment due to inaccurate optical flow. Moreover, memory or computational constraints limit the temporal range of feature propagation and video Transformers, preventing exploration of correspondence information from distant frames. To address these issues, we propose an improved framework, called ProPainter, which involves enhanced ProPagation and an efficient Transformer. Specifically, we introduce dual-domain propagation that combines the advantages of image and feature warping, exploiting global correspondences reliably. We also propose a mask-guided sparse video Transformer, which achieves high efficiency by discarding unnecessary and redundant tokens. With these components, ProPainter outperforms prior art by a large margin of 1.46 dB in PSNR while maintaining appealing efficiency.

* Accepted by ICCV 2023. Code: https://github.com/sczhou/ProPainter
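
To put the reported 1.46 dB PSNR margin in perspective, the short sketch below uses the standard definition PSNR = 10 * log10(MAX^2 / MSE) to convert a PSNR gain into the corresponding reduction in mean squared error. The helper names `psnr` and `mse_ratio_for_gain` and the 8-bit peak value of 255 are assumptions made for this illustration; the abstract does not state the evaluation protocol.

```python
import math


def psnr(mse: float, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which MSE shrinks when PSNR rises by gain_db decibels."""
    return 10.0 ** (-gain_db / 10.0)


ratio = mse_ratio_for_gain(1.46)
print(f"MSE after / MSE before: {ratio:.3f}")

# Sanity check with an arbitrary baseline error (assumed value, not from the paper).
baseline_mse = 40.0
gain = psnr(baseline_mse * ratio) - psnr(baseline_mse)
print(f"PSNR gain recovered: {gain:.2f} dB")
```

Under this definition, a 1.46 dB gain corresponds to an MSE ratio of roughly 0.71, that is, a bit under 30% less squared error, independent of the peak value chosen.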

Adaptive Window Pruning for Efficient Local Motion Deblurring

Jun 25, 2023

Haoying Li, Jixin Zhao, Shangchen Zhou, Huajun Feng, Chongyi Li, Chen Change Loy

Abstract: Local motion blur commonly occurs in real-world photography due to the mixing between moving objects and stationary backgrounds during exposure. Existing image deblurring methods predominantly focus on global deblurring, inadvertently affecting the sharpness of backgrounds in locally blurred images and wasting computation on sharp pixels, especially for high-resolution images. This paper aims to adaptively and efficiently restore high-resolution locally blurred images. We propose a local motion deblurring vision Transformer (LMD-ViT) built on adaptive window pruning Transformer blocks (AdaWPT). To focus deblurring on local regions and reduce computation, AdaWPT prunes unnecessary windows, only allowing the active windows to be involved in the deblurring processes. The pruning operation relies on the blurriness confidence predicted by a confidence predictor that is trained end-to-end using a reconstruction loss with Gumbel-Softmax re-parameterization and a pruning loss guided by annotated blur masks. Our method removes local motion blur effectively without distorting sharp regions, demonstrated by its exceptional perceptual and quantitative improvements (+0.24 dB) compared to state-of-the-art methods. In addition, our approach substantially reduces FLOPs by 66% and achieves more than a twofold increase in inference speed compared to Transformer-based deblurring methods. We will make our code and annotated blur masks publicly available.

* 18 pages
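
The Gumbel-Softmax re-parameterization mentioned in the abstract is a general technique for drawing (relaxed) discrete decisions, such as keep-or-prune, while keeping the sampling step differentiable. Below is a minimal standard-library sketch of the sampling step only. It is not the LMD-ViT code: the function name, the two-way `[prune, keep]` logit layout, the example logits, and the temperature `tau=0.5` are assumptions for illustration, and no gradients are computed here.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw one relaxed one-hot sample from unnormalized categorical logits."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1);
        # U is clamped away from 0 and 1 to keep the logs finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


random.seed(0)
# Hypothetical per-window logits in the order [prune, keep].
window_logits = [[0.0, 2.0], [1.5, -0.5], [0.2, 0.1]]
for index, logits in enumerate(window_logits):
    probs = gumbel_softmax(logits, tau=0.5)
    keep = probs[1] > probs[0]  # hard decision taken from the relaxed sample
    print(f"window {index}: relaxed sample = {probs}, keep = {keep}")
```

Lower temperatures push each sample closer to a hard one-hot vector, while the index of the largest entry follows the softmax distribution of the original logits regardless of the temperature.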

Flare7K++: Mixing Synthetic and Real Datasets for Nighttime Flare Removal and Beyond

Jun 08, 2023

Yuekun Dai, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yihang Luo, Chen Change Loy

Abstract: Artificial lights commonly leave strong lens flare artifacts on images captured at night, degrading both the visual quality and performance of vision algorithms. Existing flare removal approaches mainly focus on removing daytime flares and fail in nighttime cases. Nighttime flare removal is challenging due to the unique luminance and spectrum of artificial lights, as well as the diverse patterns and image degradation of the flares. The scarcity of nighttime flare removal datasets constrains research on this crucial task. In this paper, we introduce Flare7K++, the first comprehensive nighttime flare removal dataset, consisting of 962 real-captured flare images (Flare-R) and 7,000 synthetic flares (Flare7K). Compared to Flare7K, Flare7K++ is particularly effective in eliminating complicated degradation around the light source, which is intractable when using synthetic flares alone. Besides, the previous flare removal pipeline relies on the manual threshold and blur kernel settings to extract light sources, which may fail when the light sources are tiny or not overexposed. To address this issue, we additionally provide the annotations of light sources in Flare7K++ and propose a new end-to-end pipeline to preserve the light source while removing lens flares. Our dataset and pipeline offer a valuable foundation and benchmark for future investigations into nighttime flare removal studies. Extensive experiments demonstrate that Flare7K++ supplements the diversity of existing flare datasets and pushes the frontier of nighttime flare removal towards real-world scenarios.

* Extension of arXiv:2210.06570; Project page at https://ykdai.github.io/projects/Flare7K

MIPI 2023 Challenge on Nighttime Flare Removal: Methods and Results

May 23, 2023

Yuekun Dai, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Qingpeng Zhu, Qianhui Sun, Wenxiu Sun, Chen Change Loy, Jinwei Gu

Abstract: Developing and integrating advanced image sensors with novel algorithms in camera systems are prevalent with the increasing demand for computational photography and imaging on mobile platforms. However, the lack of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). With the success of the 1st MIPI Workshop@ECCV 2022, we introduce the second MIPI challenge including four tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2023. In total, 120 participants were successfully registered, and 11 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. A detailed description of all models developed in this challenge is provided in this paper. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2023/ .

* CVPR 2023 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2023/

Exploiting Diffusion Prior for Real-World Image Super-Resolution

May 11, 2023

Jianyi Wang, Zongsheng Yue, Shangchen Zhou, Kelvin C. K. Chan, Chen Change Loy

Abstract: We present a novel approach to leverage prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution (SR). Specifically, by employing our time-aware encoder, we can achieve promising restoration results without altering the pre-trained synthesis model, thereby preserving the generative prior and minimizing training cost. To remedy the loss of fidelity caused by the inherent stochasticity of diffusion models, we introduce a controllable feature wrapping module that allows users to balance quality and fidelity by simply adjusting a scalar value during the inference process. Moreover, we develop a progressive aggregation sampling strategy to overcome the fixed-size constraints of pre-trained diffusion models, enabling adaptation to resolutions of any size. A comprehensive evaluation of our method using both synthetic and real-world benchmarks demonstrates its superiority over current state-of-the-art approaches.

* Project page: https://iceclear.github.io/projects/stablesr/

MIPI 2023 Challenge on RGB+ToF Depth Completion: Methods and Results

Apr 27, 2023

Qingpeng Zhu, Wenxiu Sun, Yuekun Dai, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Qianhui Sun, Chen Change Loy, Jinwei Gu, Yi Yu (+13 more)

Abstract: Depth completion from RGB images and sparse Time-of-Flight (ToF) measurements is an important problem in computer vision and robotics. While traditional methods for depth completion have relied on stereo vision or structured light techniques, recent advances in deep learning have enabled more accurate and efficient completion of depth maps from RGB images and sparse ToF measurements. To evaluate the performance of different depth completion methods, we organized an RGB+sparse ToF depth completion competition. The competition aimed to encourage research in this area by providing a standardized dataset and evaluation metrics to compare the accuracy of different approaches. In this report, we present the results of the competition and analyze the strengths and weaknesses of the top-performing methods. We also discuss the implications of our findings for future research in RGB+sparse ToF depth completion. We hope that this competition and report will help to advance the state-of-the-art in this important area of research. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2023.

* arXiv admin note: substantial text overlap with arXiv:2209.07057
