Radu Timofte

Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction

Nov 15, 2021

Yuanhao Cai, Jing Lin, Xiaowan Hu, Haoqian Wang, Xin Yuan, Yulun Zhang, Radu Timofte, Luc Van Gool

Abstract: Hyperspectral image (HSI) reconstruction aims to recover the 3D spatial-spectral signal from a 2D measurement in the coded aperture snapshot spectral imaging (CASSI) system. HSI representations are highly similar and correlated across the spectral dimension, so modeling inter-spectral interactions is beneficial for HSI reconstruction. However, existing CNN-based methods are limited in capturing spectral-wise similarity and long-range dependencies. Moreover, in CASSI the HSI information is modulated by a coded aperture (physical mask), yet current algorithms have not fully explored the guidance effect of the mask for HSI restoration. In this paper, we propose a novel framework, the Mask-guided Spectral-wise Transformer (MST), for HSI reconstruction. Specifically, we present a Spectral-wise Multi-head Self-Attention (S-MSA) that treats each spectral feature as a token and calculates self-attention along the spectral dimension. In addition, we customize a Mask-guided Mechanism (MM) that directs S-MSA to attend to spatial regions with high-fidelity spectral representations. Extensive experiments show that MST significantly outperforms state-of-the-art (SOTA) methods on simulated and real HSI datasets while requiring dramatically lower computational and memory costs.

* Transformer, Snapshot Compressive Imaging, Hyperspectral Image Reconstruction
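The spectral-wise attention idea can be illustrated with a minimal pure-Python sketch. It assumes a single head, identity query/key/value projections, and a simplified stand-in for mask guidance (the hypothetical `mask` argument just reweights spatial positions of the values); none of these names come from the MST code. What it shows is that each spectral channel is one token, so the attention map is C × C regardless of how many pixels there are.

```python
import math


def softmax(row):
    # Numerically stable softmax over one row of scores
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def spectral_self_attention(x, mask=None):
    """Single-head spectral-wise self-attention (simplified sketch).

    x: N pixels x C channels (list of lists).
    mask: optional list of N non-negative weights; a hypothetical stand-in
          for mask guidance that reweights spatial positions of the values.
    Returns (output as N x C, attention map as C x C).
    """
    n, c = len(x), len(x[0])
    # Each spectral channel becomes one token of length N
    tokens = [[x[p][ch] for p in range(n)] for ch in range(c)]
    values = tokens
    if mask is not None:
        values = [[tok[p] * mask[p] for p in range(n)] for tok in tokens]
    scale = math.sqrt(n)
    # C x C attention: similarity between spectral channels, not between pixels
    attn = [
        softmax([sum(tokens[i][p] * tokens[j][p] for p in range(n)) / scale
                 for j in range(c)])
        for i in range(c)
    ]
    # Each output channel is a convex combination of the (optionally masked) channels
    out_tokens = [
        [sum(attn[i][j] * values[j][p] for j in range(c)) for p in range(n)]
        for i in range(c)
    ]
    out = [[out_tokens[ch][p] for ch in range(c)] for p in range(n)]
    return out, attn


pixels = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [1.0, 0.9, 0.8]]
out, attn = spectral_self_attention(pixels, mask=[1.0, 0.5, 1.0, 0.0])
print(len(attn), len(attn[0]))  # 3 3: one row and one column per spectral channel
```

Forming this attention map costs on the order of C²·N operations, versus N²·C for attention over spatial tokens, which is why attending along the spectral dimension stays cheap when the number of pixels N is large.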

Normalizing Flow as a Flexible Fidelity Objective for Photo-Realistic Super-resolution

Nov 05, 2021

Andreas Lugmayr, Martin Danelljan, Fisher Yu, Luc Van Gool, Radu Timofte

Abstract: Super-resolution is an ill-posed problem, where a ground-truth high-resolution image represents only one possibility in the space of plausible solutions. Yet the dominant paradigm is to employ pixel-wise losses, such as L1, which drive the prediction towards a blurry average. Combined with adversarial losses, this leads to fundamentally conflicting objectives that degrade the final quality. We address this issue by revisiting the L1 loss and showing that it corresponds to a one-layer conditional flow. Inspired by this relation, we explore general flows as a fidelity-based alternative to the L1 objective. We demonstrate that the flexibility of deeper flows leads to better visual quality and consistency when combined with adversarial losses. We conduct extensive user studies on three datasets and scale factors, where our approach is shown to outperform state-of-the-art methods for photo-realistic super-resolution. Code and trained models will be available at: git.io/AdFlow

* WACV 2022
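The link between L1 and a one-layer conditional flow can be checked numerically under one stated assumption: the flow maps the target y to z = (y - f(x)) / b, where f(x) is the network prediction, and uses a unit Laplace base density. By the change-of-variables formula, the negative log-likelihood is then the L1 distance scaled by 1/b plus a constant. This is a sketch of that reasoning, not the paper's code; the function names are hypothetical.

```python
import math


def l1_loss(y, pred):
    # Sum of absolute errors over all pixels
    return sum(abs(a - b) for a, b in zip(y, pred))


def one_layer_flow_nll(y, pred, scale=1.0):
    """NLL of y under a hypothetical one-layer conditional flow.

    The flow is z = (y - pred) / scale with a Laplace(0, 1) base density.
    Change of variables: -log p(y) = -log p_z(z) - log|det dz/dy|,
    where |det dz/dy| = scale ** (-n) for n pixels.
    """
    n = len(y)
    z = [(a - b) / scale for a, b in zip(y, pred)]
    # -log of the Laplace(0, 1) density per element is |z| + log 2
    nll_base = sum(abs(v) + math.log(2.0) for v in z)
    log_det = -n * math.log(scale)
    return nll_base - log_det


y, pred = [0.2, 0.5, 0.9], [0.1, 0.7, 0.9]
gap = one_layer_flow_nll(y, pred) - l1_loss(y, pred)
print(abs(gap - 3 * math.log(2.0)) < 1e-9)  # True: only a constant separates the two
```

Because the gap is a constant, minimizing this NLL and minimizing L1 yield the same gradients with respect to the prediction; deeper flows relax the single fixed Laplace shape this one-layer case implies.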

Towards Flexible Blind JPEG Artifacts Removal

Sep 29, 2021

Jiaxi Jiang, Kai Zhang, Radu Timofte

Abstract: Training a single deep blind model to handle different quality factors for JPEG image artifacts removal has attracted considerable attention due to its convenience in practical use. However, existing deep blind methods usually reconstruct the image directly without predicting the quality factor, and thus lack the flexibility to control the output that non-blind methods offer. To remedy this problem, we propose a flexible blind convolutional neural network, namely FBCNN, that predicts an adjustable quality factor to control the trade-off between artifacts removal and details preservation. Specifically, FBCNN decouples the quality factor from the JPEG image via a decoupler module and then embeds the predicted quality factor into the subsequent reconstructor module through a quality factor attention block for flexible control. Besides, we find that existing methods are prone to fail on non-aligned double JPEG images, even with only a one-pixel shift, and we thus propose a double JPEG degradation model to augment the training data. Extensive experiments on single JPEG images, more general double JPEG images, and real-world JPEG images demonstrate that FBCNN achieves favorable performance against state-of-the-art methods in terms of both quantitative metrics and visual quality.

* Accepted by ICCV 2021, Code: https://github.com/jiaxi-jiang/FBCNN
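For readers less familiar with what a JPEG quality factor controls: in the widely used IJG libjpeg encoder, the quality factor (1 to 100) rescales a base quantization table, and the coarser quantization at low quality is what produces blocking and ringing artifacts. The sketch below reproduces that standard scaling rule to illustrate the quantity FBCNN predicts; it is not part of FBCNN, and only the first row of the standard luminance table is included for brevity.

```python
# First row of the standard JPEG luminance quantization table (ITU-T T.81, Annex K)
BASE_LUMA_ROW = [16, 11, 10, 16, 24, 40, 51, 61]


def quality_scaling(quality):
    """IJG libjpeg rule mapping a quality factor (1-100) to a percentage scale."""
    quality = max(1, min(100, quality))
    if quality < 50:
        return 5000 // quality
    return 200 - 2 * quality


def scaled_table(base, quality):
    """Scale a base quantization table, clamped to the baseline-JPEG range [1, 255]."""
    scale = quality_scaling(quality)
    return [max(1, min(255, (q * scale + 50) // 100)) for q in base]


print(scaled_table(BASE_LUMA_ROW, 10))  # [80, 55, 50, 80, 120, 200, 255, 255]
print(scaled_table(BASE_LUMA_ROW, 90))  # [3, 2, 2, 3, 5, 8, 10, 12]
```

Quantization is applied per 8 × 8 block, so a non-aligned double JPEG image arises when the image is shifted (for example, cropped by one pixel) between two such compressions and the block grids of the two passes no longer coincide.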

PDC-Net+: Enhanced Probabilistic Dense Correspondence Network

Sep 29, 2021

Prune Truong, Martin Danelljan, Radu Timofte, Luc Van Gool

Abstract: Establishing robust and accurate correspondences between a pair of images is a long-standing computer vision problem with numerous applications. While the field was classically dominated by sparse methods, emerging dense approaches offer a compelling alternative paradigm that avoids the keypoint detection step. However, dense flow estimation is often inaccurate in the case of large displacements, occlusions, or homogeneous regions. To apply dense methods to real-world applications, such as pose estimation, image manipulation, or 3D reconstruction, it is therefore crucial to estimate the confidence of the predicted matches. We propose the Enhanced Probabilistic Dense Correspondence Network, PDC-Net+, capable of estimating accurate dense correspondences along with a reliable confidence map. We develop a flexible probabilistic approach that jointly learns the flow prediction and its uncertainty. In particular, we parametrize the predictive distribution as a constrained mixture model, ensuring better modelling of both accurate flow predictions and outliers. Moreover, we develop an architecture and an enhanced training strategy tailored for robust and generalizable uncertainty prediction in the context of self-supervised training. Our approach obtains state-of-the-art results on multiple challenging geometric matching and optical flow datasets. We further validate the usefulness of our probabilistic confidence estimation for the tasks of pose estimation, 3D reconstruction, image-based localization, and image retrieval. Code and models are available at https://github.com/PruneTruong/DenseMatching.

* Code: https://github.com/PruneTruong/DenseMatching. Paper extension of PDC-Net. arXiv admin note: substantial text overlap with arXiv:2101.01710
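A constrained mixture can be sketched in one dimension. The assumptions here are a two-component Laplace mixture in which a sigmoid squeezes each component's scale into its own interval (a narrow range for accurate matches, a wider range for outliers), and the closed-form 1D probability that the error falls within a radius R used as a confidence value. The intervals, function names and 1D simplification are illustrative only, not values or code from PDC-Net+.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def constrained_scale(raw, low, high):
    # Squeeze an unconstrained output into the open interval (low, high)
    return low + (high - low) * sigmoid(raw)


def mixture_params(raw_scales, raw_weights, bounds):
    """Convert raw outputs into mixture weights (softmax) and constrained scales."""
    scales = [constrained_scale(r, lo, hi) for r, (lo, hi) in zip(raw_scales, bounds)]
    m = max(raw_weights)
    exps = [math.exp(w - m) for w in raw_weights]
    total = sum(exps)
    return [e / total for e in exps], scales


def mixture_nll(error, weights, scales):
    # -log of sum_m alpha_m * Laplace(error; 0, b_m)
    density = sum(a / (2.0 * b) * math.exp(-abs(error) / b) for a, b in zip(weights, scales))
    return -math.log(density)


def confidence_within(radius, weights, scales):
    # 1D P(|error| < radius) = sum_m alpha_m * (1 - exp(-radius / b_m))
    return sum(a * (1.0 - math.exp(-radius / b)) for a, b in zip(weights, scales))


# Hypothetical bounds: a tight component for inliers, a broad one for outliers
BOUNDS = [(0.1, 1.0), (1.0, 50.0)]
weights, scales = mixture_params([0.0, 0.0], [2.0, 0.0], BOUNDS)
print(confidence_within(1.0, weights, scales), mixture_nll(0.5, weights, scales))
```

Keeping the two scale intervals separate forces one component to explain precise matches and the other to absorb outliers, so the weights themselves become an interpretable signal of match reliability.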

Perceptual Learned Video Compression with Recurrent Conditional GAN

Sep 13, 2021

Ren Yang, Luc Van Gool, Radu Timofte

Abstract: This paper proposes a Perceptual Learned Video Compression (PLVC) approach with a recurrent conditional generative adversarial network. In our approach, the recurrent auto-encoder-based generator learns to fully explore the temporal correlation for compressing video. More importantly, we propose a recurrent conditional discriminator, which judges raw and compressed video conditioned on both spatial and temporal information, including the latent representation, temporal motion and hidden states in recurrent cells. In adversarial training, this pushes the generated video to be not only spatially photo-realistic but also temporally consistent with the ground truth and coherent among video frames. The experimental results show that the proposed PLVC model learns to compress video towards good perceptual quality at low bit-rate, and outperforms previous traditional and learned approaches on several perceptual quality metrics. The user study further validates the outstanding perceptual performance of PLVC in comparison with the latest learned video compression approaches and the official HEVC test model (HM 16.20). The code will be released at https://github.com/RenYang-home/PLVC.

Generalized Real-World Super-Resolution through Adversarial Robustness

Aug 25, 2021

Angela Castillo, María Escobar, Juan C. Pérez, Andrés Romero, Radu Timofte, Luc Van Gool, Pablo Arbeláez

Abstract: Real-world Super-Resolution (SR) has traditionally been tackled by first learning a specific degradation model that resembles the noise and corruption artifacts in low-resolution imagery. As a result, current methods lack generalization and lose accuracy when tested on unseen types of corruption. In contrast to this traditional approach, we present Robust Super-Resolution (RSR), a method that leverages the generalization capability of adversarial attacks to tackle real-world SR. Our novel framework represents a paradigm shift in the development of real-world SR methods. Instead of learning a dataset-specific degradation, we employ adversarial attacks to create difficult examples that target the model's weaknesses. Afterward, we use these adversarial examples during training to improve our model's capacity to process noisy inputs. We perform extensive experiments on synthetic and real-world images and empirically demonstrate that our RSR method generalizes well across datasets without re-training for specific noise priors. Using a single robust model, we outperform state-of-the-art specialized methods on real-world benchmarks.

* ICCV Workshops, 2021
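The idea of generating adversarial examples that target a model's weaknesses can be sketched with the fast gradient sign method (FGSM), used here only as a familiar stand-in; the abstract does not say which attack RSR uses. The model is a hypothetical linear map from a flattened low-resolution patch to one output pixel with a squared-error loss, chosen so that the input gradient has a closed form.

```python
def predict(weights, x):
    # Hypothetical linear "SR model": one output pixel from a flattened LR patch
    return sum(w * v for w, v in zip(weights, x))


def loss(weights, x, target):
    return (predict(weights, x) - target) ** 2


def input_gradient(weights, x, target):
    # d/dx (w . x - t)^2 = 2 (w . x - t) w
    residual = predict(weights, x) - target
    return [2.0 * residual * w for w in weights]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(weights, x, target, eps):
    """Move every input value by eps in the direction that increases the loss."""
    grad = input_gradient(weights, x, target)
    return [v + eps * sign(g) for v, g in zip(x, grad)]


w = [0.5, -0.25, 0.75]
x = [0.2, 0.4, 0.6]
t = 0.3
x_adv = fgsm(w, x, t, eps=0.05)
print(loss(w, x_adv, t) > loss(w, x, t))  # True: the perturbed input is harder
```

In an adversarial-training loop, such perturbed inputs would then be paired with their clean high-resolution targets and mixed into the training batches, which is the general mechanism the abstract describes for improving robustness to noisy inputs.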

SwinIR: Image Restoration Using Swin Transformer

Aug 23, 2021

Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, Radu Timofte

Abstract: Image restoration is a long-standing low-level vision problem that aims to restore high-quality images from low-quality images (e.g., downscaled, noisy and compressed images). While state-of-the-art image restoration methods are based on convolutional neural networks, few attempts have been made with Transformers, which show impressive performance on high-level vision tasks. In this paper, we propose a strong baseline model, SwinIR, for image restoration based on the Swin Transformer. SwinIR consists of three parts: shallow feature extraction, deep feature extraction and high-quality image reconstruction. In particular, the deep feature extraction module is composed of several residual Swin Transformer blocks (RSTB), each of which has several Swin Transformer layers together with a residual connection. We conduct experiments on three representative tasks: image super-resolution (including classical, lightweight and real-world image super-resolution), image denoising (including grayscale and color image denoising) and JPEG compression artifact reduction. Experimental results demonstrate that SwinIR outperforms state-of-the-art methods on different tasks by up to 0.14~0.45 dB, while the total number of parameters can be reduced by up to 67%.

* SOTA results on classical/lightweight/real-world image SR, image denoising and JPEG compression artifact reduction. Code: https://github.com/JingyunLiang/SwinIR
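Swin Transformer layers compute self-attention inside non-overlapping M × M windows and, in alternating layers, cyclically shift the feature map by M // 2 before partitioning so that information crosses window borders. The pure-Python sketch below shows only that partitioning step on a 2D grid of token indices; the function names are hypothetical, and the attention masking that the real implementation applies after the shift is omitted.

```python
def roll2d(grid, shift):
    """Cyclically shift a 2D grid up and left by `shift` positions."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + shift) % h][(c + shift) % w] for c in range(w)] for r in range(h)]


def window_partition(grid, m):
    """Split an H x W grid (H and W divisible by m) into flat lists of m * m tokens."""
    h, w = len(grid), len(grid[0])
    assert h % m == 0 and w % m == 0, "grid size must be divisible by the window size"
    windows = []
    for r0 in range(0, h, m):
        for c0 in range(0, w, m):
            windows.append([grid[r][c] for r in range(r0, r0 + m) for c in range(c0, c0 + m)])
    return windows


grid = [[r * 4 + c for c in range(4)] for r in range(4)]
regular = window_partition(grid, 2)
shifted = window_partition(roll2d(grid, 1), 2)
print(regular[0])  # [0, 1, 4, 5]
print(shifted[0])  # [5, 6, 9, 10]
```

After the shift, the first window contains one token from each of the four regular windows, which is how alternating regular and shifted windows let attention propagate across the whole image while each attention call stays local.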

Deep Reparametrization of Multi-Frame Super-Resolution and Denoising

Aug 18, 2021

Goutam Bhat, Martin Danelljan, Fisher Yu, Luc Van Gool, Radu Timofte

Abstract: We propose a deep reparametrization of the maximum a posteriori (MAP) formulation commonly employed in multi-frame image restoration tasks. Our approach is derived by introducing a learned error metric and a latent representation of the target image, which transforms the MAP objective to a deep feature space. The deep reparametrization allows us to directly model the image formation process in the latent space, and to integrate learned image priors into the prediction. Our approach thereby leverages the advantages of deep learning, while also benefiting from the principled multi-frame fusion provided by the classical MAP formulation. We validate our approach through comprehensive experiments on burst denoising and burst super-resolution datasets. Our approach sets a new state-of-the-art for both tasks, demonstrating the generality and effectiveness of the proposed formulation.

* ICCV 2021 Oral
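The classical MAP multi-frame fusion that this method builds on can be sketched in its simplest form. Assuming already-aligned frames, an identity image formation model, no prior term, and a fixed per-frame weight in place of the learned error metric (all simplifications, not the paper's formulation), minimizing E(z) = Σᵢ wᵢ (z - xᵢ)² by gradient descent recovers the weighted mean of the burst. The paper instead carries out this minimization in a learned deep feature space; the names below are hypothetical.

```python
def map_fuse(frames, weights, steps=200, lr=0.1):
    """Gradient descent on E(z) = sum_i w_i * (z - x_i)^2 for one pixel value.

    Converges when lr * sum(weights) < 1 (each step contracts the error
    by a factor of 1 - 2 * lr * sum(weights)).
    """
    z = 0.0
    for _ in range(steps):
        grad = sum(2.0 * w * (z - x) for x, w in zip(frames, weights))
        z -= lr * grad
    return z


def closed_form(frames, weights):
    # Setting dE/dz = 0 gives the weighted mean of the frames
    return sum(w * x for x, w in zip(frames, weights)) / sum(weights)


frames = [1.0, 2.0, 4.0]   # the same pixel observed in three burst frames
ws = [1.0, 0.5, 0.25]      # hypothetical per-frame reliabilities
print(round(map_fuse(frames, ws), 4), round(closed_form(frames, ws), 4))  # 1.7143 1.7143
```

Replacing the fixed weights and squared error with a learned metric, and the identity model with a learned latent image formation process, is what lets the deep reparametrization keep this principled fusion while benefiting from learned priors.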

Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution

Aug 11, 2021

Jingyun Liang, Guolei Sun, Kai Zhang, Luc Van Gool, Radu Timofte

Abstract: Existing blind image super-resolution (SR) methods mostly assume that blur kernels are spatially invariant across the whole image. However, this assumption rarely holds for real images, whose blur kernels are usually spatially variant due to factors such as object motion and defocus. Hence, existing blind SR methods inevitably perform poorly in real applications. To address this issue, this paper proposes a mutual affine network (MANet) for spatially variant kernel estimation. Specifically, MANet has two distinctive features. First, it has a moderate receptive field so as to keep the locality of degradation. Second, it involves a new mutual affine convolution (MAConv) layer that enhances feature expressiveness without increasing the receptive field, model size or computation burden. This is made possible by exploiting channel interdependence: each channel split is transformed by an affine transformation module whose input is the remaining channel splits. Extensive experiments on synthetic and real images show that the proposed MANet not only performs favorably for both spatially variant and invariant kernel estimation, but also leads to state-of-the-art blind SR performance when combined with non-blind SR methods.

* Accepted by ICCV 2021. Code: https://github.com/JingyunLiang/MANet
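The mutual affine idea can be sketched on the feature vector at a single pixel. Assumptions: two channel splits, hand-picked linear maps (the hypothetical `scale_w` and `shift_w` matrices) standing in for the learned fully connected layers, and a sigmoid on the scale; the convolution that follows in an actual MAConv layer is omitted. The sketch shows that each split is modulated only by the other splits, which adds channel interaction without enlarging the receptive field, since everything happens at one spatial position.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def linear(weights, vec):
    # weights: out_dim x in_dim matrix as nested lists
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def mutual_affine(features, num_splits, scale_w, shift_w):
    """Affine-transform each channel split with parameters computed from the other splits.

    features: list of C channel values at one pixel (C divisible by num_splits).
    scale_w[i], shift_w[i]: hypothetical (C / num_splits) x (C - C / num_splits) matrices.
    """
    size = len(features) // num_splits
    splits = [features[i * size:(i + 1) * size] for i in range(num_splits)]
    out = []
    for i, split in enumerate(splits):
        # Concatenate every split except the current one
        rest = [v for j, s in enumerate(splits) if j != i for v in s]
        scale = [sigmoid(v) for v in linear(scale_w[i], rest)]
        shift = linear(shift_w[i], rest)
        out.extend(x * a + b for x, a, b in zip(split, scale, shift))
    return out


zero = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
print(mutual_affine([1.0, 2.0, 3.0, 4.0], 2, zero, zero))  # [0.5, 1.0, 1.5, 2.0]
```

With all-zero weights every scale is sigmoid(0) = 0.5 and every shift is 0, so the output simply halves the input; nonzero weights make each split's output depend on the values in the other split.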

Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling

Aug 11, 2021

Jingyun Liang, Andreas Lugmayr, Kai Zhang, Martin Danelljan, Luc Van Gool, Radu Timofte

Abstract: Normalizing flows have recently demonstrated promising results for low-level vision tasks. For image super-resolution (SR), they learn to predict diverse photo-realistic high-resolution (HR) images from the low-resolution (LR) image rather than learning a deterministic mapping. For image rescaling, they achieve high accuracy by jointly modelling the downscaling and upscaling processes. While existing approaches employ specialized techniques for these two tasks, we set out to unify them in a single formulation. In this paper, we propose the hierarchical conditional flow (HCFlow) as a unified framework for image SR and image rescaling. More specifically, HCFlow learns a bijective mapping between HR and LR image pairs by simultaneously modelling the distribution of the LR image and the remaining high-frequency component. In particular, the high-frequency component is conditioned on the LR image in a hierarchical manner. To further enhance performance, other losses such as perceptual loss and GAN loss are combined with the commonly used negative log-likelihood loss during training. Extensive experiments on general image SR, face image SR and image rescaling demonstrate that the proposed HCFlow achieves state-of-the-art performance in terms of both quantitative metrics and visual quality.

* Accepted by ICCV 2021. Code: https://github.com/JingyunLiang/HCFlow
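A bijective split of an HR signal into an LR part and a high-frequency remainder can be illustrated with a fixed 1D Haar-style transform, followed by a negative log-likelihood built from the change-of-variables formula. This is a hand-built stand-in: HCFlow learns its invertible layers and hierarchical conditioning, and the `cond_mean` placeholder below is hypothetical. The sketch shows the two ingredients of the NLL: the base densities of the LR and high-frequency variables, and the log-determinant of the transform.

```python
import math


def haar_forward(hr):
    """Split an even-length 1D HR signal into LR averages and HF details (bijective)."""
    lr = [(hr[i] + hr[i + 1]) / 2.0 for i in range(0, len(hr), 2)]
    hf = [(hr[i] - hr[i + 1]) / 2.0 for i in range(0, len(hr), 2)]
    # Each pair uses the matrix [[0.5, 0.5], [0.5, -0.5]], whose |det| is 0.5
    log_det = len(lr) * math.log(0.5)
    return lr, hf, log_det


def haar_inverse(lr, hf):
    # Exact inverse of haar_forward: a = lr + hf, b = lr - hf
    hr = []
    for low, high in zip(lr, hf):
        hr.extend([low + high, low - high])
    return hr


def gaussian_nll(values, means, std):
    # Sum of -log N(v; m, std^2)
    return sum(0.5 * ((v - m) / std) ** 2 + math.log(std) + 0.5 * math.log(2.0 * math.pi)
               for v, m in zip(values, means))


def cond_mean(lr):
    # Hypothetical placeholder: a learned network would predict HF means from lr
    return [0.0 for _ in lr]


def flow_nll(hr, lr_ref, lr_std=0.1, hf_std=0.1):
    """Change of variables: -log p(hr) = -log p(lr) - log p(hf | lr) - log|det J|."""
    lr, hf, log_det = haar_forward(hr)
    return gaussian_nll(lr, lr_ref, lr_std) + gaussian_nll(hf, cond_mean(lr), hf_std) - log_det


hr = [1.0, 3.0, 2.0, 2.0]
lr, hf, log_det = haar_forward(hr)
print(lr, hf)  # [2.0, 2.0] [-1.0, 0.0]
print(haar_inverse(lr, hf) == hr)  # True
```

Because the mapping is exactly invertible, sampling different high-frequency values for the same LR input and running the inverse yields different HR outputs, which is the mechanism behind diverse SR predictions in flow-based models.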
