Radu Timofte

AIM 2020 Challenge on Video Extreme Super-Resolution: Methods and Results

Sep 14, 2020
Dario Fuoli, Zhiwu Huang, Shuhang Gu, Radu Timofte, Arnau Raventos, Aryan Esfandiari, Salah Karout, Xuan Xu, Xin Li, Xin Xiong, Jinge Wang, Pablo Navarrete Michelini, Wenhao Zhang, Dongyang Zhang, Hanwei Zhu, Dan Xia, Haoyu Chen, Jinjin Gu, Zhi Zhang, Tongtong Zhao, Shanshan Zhao, Kazutoshi Akita, Norimichi Ukita, Hrishikesh P S, Densen Puthussery, Jiji C V

This paper reviews the video extreme super-resolution challenge associated with the AIM 2020 workshop at ECCV 2020. Common scaling factors for learned video super-resolution (VSR) do not go beyond a factor of 4. Missing information can be restored well in this regime, especially in HR videos, where the high-frequency content mostly consists of texture details. The task in this challenge is to upscale videos by an extreme factor of 16, which results in more serious degradations that also affect the structural integrity of the videos. A single pixel in the low-resolution (LR) domain corresponds to 256 pixels in the high-resolution (HR) domain. Due to this massive information loss, it is hard to accurately restore the missing information. Track 1 is set up to gauge the state of the art for such a demanding task, where fidelity to the ground truth is measured by PSNR and SSIM. Perceptually higher quality can be achieved at the cost of fidelity by generating plausible high-frequency content. Track 2 therefore aims at generating visually pleasing results, which are ranked according to human perception, as evaluated by a user study. In contrast to single image super-resolution (SISR), VSR can benefit from additional information in the temporal domain. However, this also imposes an additional requirement, as the generated frames need to be consistent over time.
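
The arithmetic behind the factor-16 claim, and the fidelity metric named for Track 1, can be sketched in a few lines. This is only an illustration using the standard PSNR definition. The function names are hypothetical, and the challenge's actual evaluation protocol (color space, border handling, averaging over frames) is not given in the abstract and is not reproduced here.

```python
import math


def pixels_per_lr_pixel(scale):
    """HR pixels covered by one LR pixel when both axes are upscaled by `scale`."""
    return scale * scale


def psnr(reference, estimate, peak=255.0):
    """Standard PSNR in dB between two equally sized images (nested lists of numbers)."""
    flat_ref = [v for row in reference for v in row]
    flat_est = [v for row in estimate for v in row]
    if len(flat_ref) != len(flat_est) or not flat_ref:
        raise ValueError("images must be non-empty and have the same size")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(flat_ref, flat_est)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    print(pixels_per_lr_pixel(4))   # common VSR factor
    print(pixels_per_lr_pixel(16))  # factor used in this challenge
    print(psnr([[10, 20], [30, 40]], [[12, 18], [33, 41]]))
```

A factor of 16 along each axis gives 16 × 16 = 256 HR pixels per LR pixel, compared with 16 for the more common factor of 4.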

Plug-and-Play Image Restoration with Deep Denoiser Prior

Aug 31, 2020
Kai Zhang, Yawei Li, Wangmeng Zuo, Lei Zhang, Luc Van Gool, Radu Timofte

Recent works on plug-and-play image restoration have shown that a denoiser can implicitly serve as the image prior for model-based methods to solve many inverse problems. Such a property brings considerable advantages for plug-and-play image restoration (e.g., integrating the flexibility of model-based methods and the effectiveness of learning-based methods) when the denoiser is discriminatively learned via a deep convolutional neural network (CNN) with large modeling capacity. However, while deeper and larger CNN models are rapidly gaining popularity, the performance of existing plug-and-play image restoration is hindered by the lack of a suitable denoiser prior. In order to push the limits of plug-and-play image restoration, we set up a benchmark deep denoiser prior by training a highly flexible and effective CNN denoiser. We then plug the deep denoiser prior as a modular part into a half-quadratic-splitting-based iterative algorithm to solve various image restoration problems. We also provide a thorough analysis of parameter settings, intermediate results and empirical convergence to better understand the working mechanism. Experimental results on three representative image restoration tasks, including deblurring, super-resolution and demosaicing, demonstrate that the proposed plug-and-play image restoration with deep denoiser prior not only significantly outperforms other state-of-the-art model-based methods but also achieves competitive or even superior performance against state-of-the-art learning-based methods. The source code is available at https://github.com/cszn/DPIR.

* An extended version of IRCNN (CVPR17). Project page: https://github.com/cszn/DPIR
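
To make the half quadratic splitting (HQS) idea in the abstract above concrete, here is a toy sketch on a 1D inpainting problem. It is not the authors' implementation. The denoiser is a simple moving-average stand-in rather than a trained CNN, the weight `alpha` is held fixed instead of following a schedule, and all function names are hypothetical. It shows only the alternation between a closed-form data step and a "plugged-in" denoising step.

```python
def box_denoiser(signal, radius=1):
    """Stand-in denoiser: moving average with shrinking windows at the edges.

    Assumption: this replaces the deep CNN denoiser prior for illustration only.
    """
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - radius):min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def hqs_inpaint(y, mask, alpha=0.5, iterations=50, denoiser=box_denoiser):
    """Plug-and-play HQS for 1D inpainting; y[i] is observed where mask[i] == 1."""
    observed = [v for v, m in zip(y, mask) if m]
    if not observed:
        raise ValueError("at least one sample must be observed")
    fill = sum(observed) / len(observed)
    # Initialize the auxiliary variable z: observed values, mean elsewhere.
    z = [v if m else fill for v, m in zip(y, mask)]
    x = list(z)
    for _ in range(iterations):
        # Data step: per-sample minimizer of m*(x - y)^2 + alpha*(x - z)^2.
        x = [(m * yi + alpha * zi) / (m + alpha) for yi, zi, m in zip(y, z, mask)]
        # Prior step: the denoiser acts as the implicit image prior.
        z = denoiser(x)
    return x


if __name__ == "__main__":
    y = [0.0, 1.0, 0.0, 3.0, 4.0]   # the value at index 2 is missing
    mask = [1, 1, 0, 1, 1]
    print(hqs_inpaint(y, mask))
```

Swapping `box_denoiser` for a stronger denoiser changes the prior without touching the data step, which is the modularity the abstract refers to.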

DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation

Jul 30, 2020
Alexandre Carlier, Martin Danelljan, Alexandre Alahi, Radu Timofte

Scalable Vector Graphics (SVG) are ubiquitous in modern 2D interfaces due to their ability to scale to different resolutions. However, despite the success of deep learning-based models applied to rasterized images, the problem of vector graphics representation learning and generation remains largely unexplored. In this work, we propose a novel hierarchical generative network, called DeepSVG, for complex SVG icon generation and interpolation. Our architecture effectively disentangles high-level shapes from the low-level commands that encode the shape itself. The network directly predicts a set of shapes in a non-autoregressive fashion. We introduce the task of complex SVG icon generation by releasing a new large-scale dataset along with an open-source library for SVG manipulation. We demonstrate that our network learns to accurately reconstruct diverse vector graphics, and can serve as a powerful animation tool by performing interpolations and other latent space operations. Our code is available at https://github.com/alexandre01/deepsvg.

* 19 pages; updated references

The Heterogeneity Hypothesis: Finding Layer-Wise Dissimilated Network Architecture

Jun 29, 2020
Yawei Li, Wen Li, Martin Danelljan, Kai Zhang, Shuhang Gu, Luc Van Gool, Radu Timofte

In this paper, we tackle the problem of convolutional neural network design. Instead of focusing on the overall architecture design, we investigate a design space that is usually overlooked, i.e., adjusting the channel configurations of predefined networks. We find that this adjustment can be achieved by pruning widened baseline networks and leads to superior performance. Based on that, we articulate the "heterogeneity hypothesis": with the same training protocol, there exists a layer-wise dissimilated network architecture (LW-DNA) that can outperform the original network with regular channel configurations at a lower level of model complexity. The LW-DNA models are identified without added computational cost and training time compared with the original network. This constraint leads to a controlled experiment, which directs the focus to the importance of layer-wise specific channel configurations. Multiple sources of hints relate the benefits of LW-DNA models to overfitting, i.e., the relative relationship between model complexity and dataset size. Experiments are conducted on various networks and datasets for image classification, visual tracking and image restoration. The resultant LW-DNA models consistently outperform the compared baseline models.

* Code will be available at https://github.com/ofsoundof/Heterogeneity_Hypothesis

Learning for Video Compression with Recurrent Auto-Encoder and Recurrent Probability Model

Jun 29, 2020
Ren Yang, Fabian Mentzer, Luc Van Gool, Radu Timofte

The past few years have witnessed increasing interest in applying deep learning to video compression. However, the existing approaches compress a video frame with only a small number of reference frames, which limits their ability to fully exploit the temporal correlation among video frames. To overcome this shortcoming, this paper proposes a Recurrent Learned Video Compression (RLVC) approach with the Recurrent Auto-Encoder (RAE) and Recurrent Probability Model (RPM). Specifically, the RAE employs recurrent cells in both the encoder and decoder. As such, the temporal information in a large range of frames can be used for generating latent representations and reconstructing compressed outputs. Furthermore, the proposed RPM network recurrently estimates the Probability Mass Function (PMF) of the latent representation, conditioned on the distribution of previous latent representations. Due to the correlation among consecutive frames, the conditional cross entropy can be lower than the independent cross entropy, thus reducing the bit-rate. The experiments show that our approach achieves state-of-the-art learned video compression performance in terms of both PSNR and MS-SSIM. Moreover, our approach outperforms the default Low-Delay P (LDP) setting of x265 on PSNR, and also has better performance on MS-SSIM than the SSIM-tuned x265 and the slowest setting of x265.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
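
The claim that conditioning on previous latents can lower the bit-rate rests on a standard information-theoretic fact: H(X_t | X_{t-1}) ≤ H(X_t). The sketch below checks this on a made-up two-symbol joint distribution. It assumes an ideal probability model whose estimate matches the true distribution, so that cross entropy equals entropy. The numbers and function names are illustrative and not taken from the paper.

```python
import math


def entropy_bits(pmf):
    """Entropy in bits of a probability mass function given as a list."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


def marginal_current(joint):
    """Marginal P(cur) from joint[prev][cur] = P(prev, cur)."""
    return [sum(row[j] for row in joint) for j in range(len(joint[0]))]


def conditional_entropy_bits(joint):
    """H(cur | prev) in bits for joint[prev][cur] = P(prev, cur)."""
    total = 0.0
    for row in joint:
        p_prev = sum(row)
        for p in row:
            if p > 0:
                total -= p * math.log2(p / p_prev)
    return total


if __name__ == "__main__":
    # Toy joint distribution: consecutive symbols agree 90% of the time.
    joint = [[0.45, 0.05],
             [0.05, 0.45]]
    print(entropy_bits(marginal_current(joint)))  # cost without conditioning
    print(conditional_entropy_bits(joint))        # cost with conditioning
```

In this toy case the independent cost is 1 bit per symbol, while the conditional cost is about 0.47 bits per symbol. When consecutive symbols are independent, the two quantities coincide and conditioning brings no saving.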

OpenDVC: An Open Source Implementation of the DVC Video Compression Method

Jun 29, 2020
Ren Yang, Luc Van Gool, Radu Timofte

We introduce an open-source TensorFlow implementation of the Deep Video Compression (DVC) method in this technical report. DVC is the first end-to-end optimized learned video compression method, achieving better MS-SSIM performance than the Low-Delay P (LDP) very fast setting of x265 and comparable PSNR performance with x265 (LDP very fast). At the time of writing this report, several learned video compression methods are superior to DVC, but currently none of them provides open-source code. We hope that our OpenDVC code can provide a useful model for further development and facilitate future research on learned video compression. Unlike the original DVC, which is only optimized for PSNR, we release not only the PSNR-optimized re-implementation, denoted by OpenDVC (PSNR), but also the MS-SSIM-optimized model OpenDVC (MS-SSIM). Our OpenDVC (MS-SSIM) model provides a more convincing baseline for MS-SSIM-optimized methods, which previously could only be compared against the PSNR-optimized DVC. The OpenDVC source code and pre-trained models are publicly released at https://github.com/RenYang-home/OpenDVC.

* Technical report of OpenDVC; the project page is at https://github.com/RenYang-home/OpenDVC

SRFlow: Learning the Super-Resolution Space with Normalizing Flow

Jun 25, 2020
Andreas Lugmayr, Martin Danelljan, Luc Van Gool, Radu Timofte

Super-resolution is an ill-posed problem, since it allows for multiple predictions for a given low-resolution image. This fundamental fact is largely ignored by state-of-the-art deep learning based approaches. These methods instead train a deterministic mapping using combinations of reconstruction and adversarial losses. In this work, we therefore propose SRFlow: a normalizing flow based super-resolution method capable of learning the conditional distribution of the output given the low-resolution input. Our model is trained in a principled manner using a single loss, namely the negative log-likelihood. SRFlow therefore directly accounts for the ill-posed nature of the problem, and learns to predict diverse photo-realistic high-resolution images. Moreover, we utilize the strong image posterior learned by SRFlow to design flexible image manipulation techniques, capable of enhancing super-resolved images by, e.g., transferring content from other images. We perform extensive experiments on faces, as well as on super-resolution in general. SRFlow outperforms state-of-the-art GAN-based approaches in terms of both PSNR and perceptual quality metrics, while allowing for diversity through the exploration of the space of super-resolved solutions.
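
The negative log-likelihood objective mentioned above follows from the change-of-variables formula, log p(y | x) = log p_z(f(y; x)) + log |det ∂f/∂y|. As a minimal sketch, the code below uses a single conditional affine layer with a standard normal base density. SRFlow itself uses a deep invertible network. Here `shift` and `log_scale` stand in for the outputs of a hypothetical conditioning network on the low-resolution input, and the function names are illustrative.

```python
import math


def affine_flow_forward(y, shift, log_scale):
    """Map outputs y to latents z with z = (y - shift) / exp(log_scale)."""
    return [(yi - shift) * math.exp(-log_scale) for yi in y]


def affine_flow_inverse(z, shift, log_scale):
    """Map latents z back to outputs: y = shift + exp(log_scale) * z."""
    return [shift + math.exp(log_scale) * zi for zi in z]


def affine_flow_nll(y, shift, log_scale):
    """Negative log-likelihood of y under the flow with a standard normal base."""
    z = affine_flow_forward(y, shift, log_scale)
    # Log-density of the latents under N(0, 1).
    log_pz = sum(-0.5 * zi * zi - 0.5 * math.log(2.0 * math.pi) for zi in z)
    # Log-determinant of dz/dy: each dimension contributes -log_scale.
    log_det = -log_scale * len(y)
    return -(log_pz + log_det)


if __name__ == "__main__":
    # Hypothetical conditioner outputs for one low-resolution input.
    shift, log_scale = 1.0, math.log(2.0)
    print(affine_flow_nll([1.0, 3.0], shift, log_scale))
    # Different latents give different outputs for the same conditioning.
    print(affine_flow_inverse([-1.0, 0.0, 1.0], shift, log_scale))
```

Because the mapping is invertible, drawing different latents for the same conditioning input yields different outputs, which is the mechanism behind the diversity the abstract describes.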

Rendering Natural Camera Bokeh Effect with Deep Learning

Jun 10, 2020
Andrey Ignatov, Jagruti Patel, Radu Timofte

Bokeh is an important artistic effect used to highlight the main object of interest in a photo by blurring all out-of-focus areas. While DSLR and system camera lenses can render this effect naturally, mobile cameras are unable to produce shallow depth-of-field photos due to the very small aperture diameter of their optics. Unlike current solutions that simulate bokeh by applying Gaussian blur to the image background, in this paper we propose to learn a realistic shallow focus technique directly from the photos produced by DSLR cameras. For this, we present a large-scale bokeh dataset consisting of 5K shallow / wide depth-of-field image pairs captured using the Canon 7D DSLR with 50mm f/1.8 lenses. We use these images to train a deep learning model to reproduce a natural bokeh effect based on a single narrow-aperture image. The experimental results show that the proposed approach is able to render a plausible non-uniform bokeh even in the case of complex input data with multiple objects. The dataset, pre-trained models and code used in this paper are available on the project website.
