Martin Danelljan

GOCor: Bringing Globally Optimized Correspondence Volumes into Your Neural Network

Sep 16, 2020

Prune Truong, Martin Danelljan, Luc Van Gool, Radu Timofte

Abstract:The feature correlation layer serves as a key neural network module in numerous computer vision problems that involve dense correspondences between image pairs. It predicts a correspondence volume by evaluating dense scalar products between feature vectors extracted from pairs of locations in two images. However, this point-to-point feature comparison is insufficient when disambiguating multiple similar regions in an image, severely affecting the performance of the end task. We propose GOCor, a fully differentiable dense matching module, acting as a direct replacement for the feature correlation layer. The correspondence volume generated by our module is the result of an internal optimization procedure that explicitly accounts for similar regions in the scene. Moreover, our approach is capable of effectively learning spatial matching priors to resolve further matching ambiguities. We analyze our GOCor module in extensive ablative experiments. When integrated into state-of-the-art networks, our approach significantly outperforms the feature correlation layer for the tasks of geometric matching, optical flow, and dense semantic matching. The code and trained models will be made available at github.com/PruneTruong/GOCor.

* code: https://github.com/PruneTruong/GOCor
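The baseline feature correlation layer that GOCor replaces can be sketched in a few lines. This is a minimal illustration, assuming each image is given as a flat list of per-location feature vectors; the function name `correlation_volume` is hypothetical, and the sketch shows only the point-to-point baseline, not GOCor's internal optimization procedure.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Scalar product between two feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def correlation_volume(feats_a: List[Vector], feats_b: List[Vector]) -> List[List[float]]:
    """Dense point-to-point comparison (hypothetical baseline layer):
    entry [i][j] scores location i of image A against location j of image B."""
    return [[dot(fa, fb) for fb in feats_b] for fa in feats_a]


if __name__ == "__main__":
    # Image B contains two locations with identical features (a repeated
    # structure), so point-to-point scores cannot tell them apart.
    feats_a = [[1.0, 0.0], [0.0, 1.0]]
    feats_b = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    print(correlation_volume(feats_a, feats_b))  # [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```

Row 0 has two equal maxima: this is the ambiguity between similar regions that, according to the abstract, a purely point-to-point comparison cannot resolve.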

AIM 2020 Challenge on Efficient Super-Resolution: Methods and Results

Sep 15, 2020

Kai Zhang, Martin Danelljan, Yawei Li, Radu Timofte, Jie Liu, Jie Tang, Gangshan Wu, Yu Zhu, Xiangyu He, Wenjie Xu(+68 more)

Abstract:This paper reviews the AIM 2020 challenge on efficient single image super-resolution, with a focus on the proposed solutions and results. The challenge task was to super-resolve an input image with a magnification factor of ×4, based on a set of prior examples of low-resolution and corresponding high-resolution images. The goal was to devise a network that reduces one or several aspects, such as runtime, parameter count, FLOPs, activations, and memory consumption, while maintaining at least the PSNR of MSRResNet. The track had 150 registered participants, and 25 teams submitted final results. Together, these solutions gauge the state of the art in efficient single image super-resolution.
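Entries are ranked by efficiency subject to a PSNR floor set by MSRResNet. For reference, PSNR can be computed as below. This is a generic sketch, assuming 8-bit intensities (peak value 255) stored as flat lists; it omits details that benchmark protocols often add, such as border cropping or evaluating only the luminance channel, which are not specified here.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images,
    given as flat sequences of pixel intensities."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

For instance, an estimate that is off by one intensity level at every pixel has an MSE of 1 and a PSNR of about 48.13 dB.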

DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation

Jul 30, 2020

Alexandre Carlier, Martin Danelljan, Alexandre Alahi, Radu Timofte

Abstract:Scalable Vector Graphics (SVG) are ubiquitous in modern 2D interfaces due to their ability to scale to different resolutions. However, despite the success of deep learning-based models applied to rasterized images, the problem of vector graphics representation learning and generation remains largely unexplored. In this work, we propose a novel hierarchical generative network, called DeepSVG, for complex SVG icon generation and interpolation. Our architecture effectively disentangles high-level shapes from the low-level commands that encode the shape itself. The network directly predicts a set of shapes in a non-autoregressive fashion. We introduce the task of complex SVG icon generation by releasing a new large-scale dataset along with an open-source library for SVG manipulation. We demonstrate that our network learns to accurately reconstruct diverse vector graphics, and can serve as a powerful animation tool by performing interpolations and other latent space operations. Our code is available at https://github.com/alexandre01/deepsvg.

* 19 pages; updated references
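Animation by latent interpolation, as mentioned in the abstract, amounts to walking between two latent codes and decoding each intermediate code into a frame. The sketch below assumes plain linear interpolation on latent vectors represented as lists; the decoder that would turn each code into an SVG is hypothetical and not part of the DeepSVG library's API.

```python
from typing import List

Latent = List[float]


def lerp(z_a: Latent, z_b: Latent, t: float) -> Latent:
    """Linear interpolation between two latent vectors, with 0 <= t <= 1."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def interpolation_path(z_a: Latent, z_b: Latent, steps: int) -> List[Latent]:
    """Evenly spaced latent codes from z_a to z_b (both included). Each code
    would be passed to a (hypothetical) decoder to render one animation frame."""
    if steps < 2:
        raise ValueError("need at least two steps")
    return [lerp(z_a, z_b, i / (steps - 1)) for i in range(steps)]
```

For example, `interpolation_path([0.0, 2.0], [4.0, 6.0], 3)` returns `[[0.0, 2.0], [2.0, 4.0], [4.0, 6.0]]`.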

Video Object Segmentation with Episodic Graph Memory Networks

Jul 19, 2020

Xinkai Lu, Wenguan Wang, Martin Danelljan, Tianfei Zhou, Jianbing Shen, Luc Van Gool

Abstract:Making a segmentation model adapt efficiently to a specific video and to online target appearance variations is a fundamental issue in video object segmentation. In this work, a graph memory network is developed to address the novel idea of "learning to update the segmentation model". Specifically, we exploit an episodic memory network, organized as a fully connected graph, to store frames as nodes and capture cross-frame correlations by edges. Further, learnable controllers are embedded to ease memory reading and writing, as well as to maintain a fixed memory scale. The structured, external memory design enables our model to comprehensively mine and quickly store new knowledge, even with limited visual information, and the differentiable memory controllers slowly learn, via gradient descent, an abstract method for storing useful representations in the memory and for later using these representations for prediction. In addition, the proposed graph memory network yields a neat yet principled framework that generalizes well to both one-shot and zero-shot video object segmentation tasks. Extensive experiments on four challenging benchmark datasets verify that our graph memory network is able to facilitate the adaptation of the segmentation network for case-by-case video object segmentation.

* ECCV2020 Spotlight; website: https://github.com/carrierlxk/GraphMemVOS
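The memory described above stores frames as graph nodes and reads from them through learned controllers under a fixed memory size. The sketch below is a hand-written stand-in, not the paper's method: it assumes first-in-first-out replacement to keep the size fixed and a dot-product softmax attention as the read operation, where the learned controllers and graph updates of the actual network would go. The class name `EpisodicMemory` is hypothetical.

```python
import math
from collections import deque
from typing import Deque, List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class EpisodicMemory:
    """Fixed-capacity memory: each stored frame embedding is a node, and a
    read computes attention weights (edges) from a query to every node."""

    def __init__(self, capacity: int) -> None:
        # deque with maxlen drops the oldest node once capacity is reached
        self.nodes: Deque[Vector] = deque(maxlen=capacity)

    def write(self, frame_embedding: Vector) -> None:
        self.nodes.append(frame_embedding)

    def read(self, query: Vector) -> Vector:
        if not self.nodes:
            raise ValueError("memory is empty")
        scores = [sum(q * n for q, n in zip(query, node)) for node in self.nodes]
        weights = softmax(scores)
        dim = len(self.nodes[0])
        # attention-weighted sum of the stored node embeddings
        return [sum(w * node[d] for w, node in zip(weights, self.nodes)) for d in range(dim)]
```

With capacity 2, writing three frames keeps only the last two, and a zero query weights the remaining nodes equally.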

The Heterogeneity Hypothesis: Finding Layer-Wise Dissimilated Network Architecture

Jun 29, 2020

Yawei Li, Wen Li, Martin Danelljan, Kai Zhang, Shuhang Gu, Luc Van Gool, Radu Timofte

Abstract:In this paper, we tackle the problem of convolutional neural network design. Instead of focusing on the overall architecture design, we investigate a design space that is usually overlooked, i.e., adjusting the channel configurations of predefined networks. We find that this adjustment can be achieved by pruning widened baseline networks and leads to superior performance. Based on that, we articulate the "heterogeneity hypothesis": with the same training protocol, there exists a layer-wise dissimilated network architecture (LW-DNA) that can outperform the original network with regular channel configurations at a lower level of model complexity. The LW-DNA models are identified without added computational cost and training time compared with the original network. This constraint leads to a controlled experiment that directs the focus to the importance of layer-wise specific channel configurations. Multiple sources of hints relate the benefits of LW-DNA models to overfitting, i.e., the relative relationship between model complexity and dataset size. Experiments are conducted on various networks and datasets for image classification, visual tracking and image restoration. The resultant LW-DNA models consistently outperform the compared baseline models.

* Code will be available at https://github.com/ofsoundof/Heterogeneity_Hypothesis
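Comparing a layer-wise channel configuration against a regular one "under a lower level of model complexity" requires budget accounting. The helpers below are hypothetical and cover only that bookkeeping for a plain stack of k×k convolutions (biases ignored); they do not implement the pruning method the paper uses to choose the per-layer widths.

```python
from typing import List, Sequence


def conv_stack_params(channels: Sequence[int], kernel_size: int = 3, in_channels: int = 3) -> int:
    """Weight count of a plain stack of k x k convolutions (biases ignored),
    where channels[i] is the output width of layer i."""
    total = 0
    c_in = in_channels
    for c_out in channels:
        total += kernel_size * kernel_size * c_in * c_out
        c_in = c_out
    return total


def widen(channels: Sequence[int], factor: float) -> List[int]:
    """Uniformly widen every layer: the starting point before pruning."""
    return [max(1, round(c * factor)) for c in channels]


def fits_budget(candidate: Sequence[int], baseline: Sequence[int],
                kernel_size: int = 3, in_channels: int = 3) -> bool:
    """True if a (pruned) layer-wise configuration is no larger than the baseline."""
    return (conv_stack_params(candidate, kernel_size, in_channels)
            <= conv_stack_params(baseline, kernel_size, in_channels))
```

For example, `conv_stack_params([64, 64, 64])` gives 75,456 weights, so the layer-wise configuration `[48, 64, 48]` (56,592 weights) fits within that budget while `[32, 96, 64]` (83,808 weights) does not.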

SRFlow: Learning the Super-Resolution Space with Normalizing Flow

Jun 25, 2020

Andreas Lugmayr, Martin Danelljan, Luc Van Gool, Radu Timofte

Abstract:Super-resolution is an ill-posed problem, since it allows for multiple predictions for a given low-resolution image. This fundamental fact is largely ignored by state-of-the-art deep learning based approaches. These methods instead train a deterministic mapping using combinations of reconstruction and adversarial losses. In this work, we therefore propose SRFlow: a normalizing flow based super-resolution method capable of learning the conditional distribution of the output given the low-resolution input. Our model is trained in a principled manner using a single loss, namely the negative log-likelihood. SRFlow therefore directly accounts for the ill-posed nature of the problem, and learns to predict diverse photo-realistic high-resolution images. Moreover, we utilize the strong image posterior learned by SRFlow to design flexible image manipulation techniques, capable of enhancing super-resolved images by, e.g., transferring content from other images. We perform extensive experiments on faces, as well as on super-resolution in general. SRFlow outperforms state-of-the-art GAN-based approaches in terms of both PSNR and perceptual quality metrics, while allowing for diversity through the exploration of the space of super-resolved solutions.
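Training with a single negative log-likelihood loss relies on the change-of-variables formula. The toy below is a one-dimensional, single-layer conditional affine flow, assuming that `mu` and `log_sigma` stand in for values a network would predict from the low-resolution input; a real normalizing flow stacks many invertible layers over whole images, so this only illustrates the principle.

```python
import math


def standard_normal_logpdf(z: float) -> float:
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def affine_flow_nll(x: float, mu: float, log_sigma: float) -> float:
    """Negative log-likelihood of x under a one-layer conditional affine flow.

    The flow maps x to z = (x - mu) * exp(-log_sigma). By change of variables,
        log p(x | lr) = log N(z; 0, 1) + log|dz/dx| = log N(z; 0, 1) - log_sigma.
    """
    z = (x - mu) * math.exp(-log_sigma)
    return -(standard_normal_logpdf(z) - log_sigma)


def sample(mu: float, log_sigma: float, z: float) -> float:
    """Invert the flow: different draws of z give different (diverse) outputs."""
    return mu + math.exp(log_sigma) * z
```

Because the map is invertible, the same model both scores a given high-resolution value (the loss) and generates diverse outputs by sampling `z`.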

NTIRE 2020 Challenge on Video Quality Mapping: Methods and Results

May 06, 2020

Dario Fuoli, Zhiwu Huang, Martin Danelljan, Radu Timofte, Hua Wang, Longcun Jin, Dewei Su, Jing Liu, Jaehoon Lee, Michal Kudelski(+11 more)

Abstract:This paper reviews the NTIRE 2020 challenge on video quality mapping (VQM), which addresses the problem of quality mapping from a source video domain to a target video domain. The challenge includes both a supervised track (track 1) and a weakly-supervised track (track 2) for two benchmark datasets. In particular, track 1 offers a new Internet video benchmark, requiring algorithms to learn the mapping from more compressed videos to less compressed videos in a supervised training manner. In track 2, algorithms are required to learn the quality mapping from one device to another when their quality varies substantially and only weakly aligned video pairs are available. In track 1, a total of 7 teams competed in the final test phase, demonstrating novel and effective solutions to the problem. In track 2, some existing methods are evaluated, showing promising solutions to the weakly-supervised video quality mapping problem.

* The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops

NTIRE 2020 Challenge on Real-World Image Super-Resolution: Methods and Results

May 05, 2020

Andreas Lugmayr, Martin Danelljan, Radu Timofte, Namhyuk Ahn, Dongwoon Bai, Jie Cai, Yun Cao, Junyang Chen, Kaihua Cheng, SeYoung Chun(+36 more)

Abstract:This paper reviews the NTIRE 2020 challenge on real-world super-resolution. It focuses on the participating methods and final results. The challenge addresses the real-world setting, where paired true high- and low-resolution images are unavailable. For training, only one set of source input images is therefore provided, along with a set of unpaired high-quality target images. In Track 1: Image Processing artifacts, the aim is to super-resolve images with synthetically generated image processing artifacts. This allows for quantitative benchmarking of the approaches with respect to a ground-truth image. In Track 2: Smartphone Images, real low-quality smartphone images have to be super-resolved. In both tracks, the ultimate goal is to achieve the best perceptual quality, evaluated using a human study. This is the second challenge on the subject, following AIM 2019, and aims to advance the state of the art in super-resolution. To measure performance, we use the benchmark protocol from AIM 2019. In total, 22 teams competed in the final testing phase, demonstrating new and innovative solutions to the problem.

How to Train Your Energy-Based Model for Regression

May 04, 2020

Fredrik K. Gustafsson, Martin Danelljan, Radu Timofte, Thomas B. Schön

Abstract:Energy-based models (EBMs) have become increasingly popular within computer vision in recent years. While they are commonly employed for generative image modeling, recent work has also applied EBMs to regression tasks, achieving state-of-the-art performance on object detection and visual tracking. Training EBMs is, however, known to be challenging. While a variety of different techniques have been explored for generative modeling, the application of EBMs to regression is not a well-studied problem. How EBMs should be trained for the best possible regression performance is thus currently unclear. We therefore take on the task of providing the first detailed study of this problem. To that end, we propose a simple yet highly effective extension of noise contrastive estimation, and carefully compare its performance to six popular methods from the literature on the tasks of 1D regression and object detection. The results of this comparison suggest that our training method should be considered the go-to approach. We also apply our method to the visual tracking task, setting a new state-of-the-art on five datasets. Notably, our tracker achieves 63.7% AUC on LaSOT and 78.7% Success on TrackingNet. Code is available at https://github.com/fregu856/ebms_regression.

* Code is available at https://github.com/fregu856/ebms_regression
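An EBM for regression scores input-target pairs with a network f(x, y) and predicts the y with the highest score. The sketch below shows a standard softmax-style noise contrastive estimation objective of the kind the proposed method extends; the extension itself is not reproduced here. It assumes scalar inputs and targets, a Gaussian noise distribution centered on the true target, and a user-supplied scoring function; all names are hypothetical.

```python
import math
import random
from typing import Callable, List

Score = Callable[[float, float], float]  # f(x, y): higher means more compatible


def gaussian_logpdf(y: float, mean: float, std: float) -> float:
    return -0.5 * ((y - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def nce_loss(f: Score, x: float, y_true: float, noise: List[float], std: float) -> float:
    """Classify the true target among noise samples drawn from
    q(y) = N(y; y_true, std^2), correcting each score by log q(y)."""
    candidates = [y_true] + noise
    logits = [f(x, y) - gaussian_logpdf(y, y_true, std) for y in candidates]
    m = max(logits)  # log-sum-exp with max subtraction for stability
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
    return -(logits[0] - log_norm)


def draw_noise(y_true: float, std: float, count: int, rng: random.Random) -> List[float]:
    return [rng.gauss(y_true, std) for _ in range(count)]
```

A scoring function that peaks at the true target yields a lower loss than a flat one, which is the signal that drives training.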

Know Your Surroundings: Exploiting Scene Information for Object Tracking

May 01, 2020

Goutam Bhat, Martin Danelljan, Luc Van Gool, Radu Timofte

Abstract:Current state-of-the-art trackers rely only on a target appearance model in order to localize the object in each frame. Such approaches are, however, prone to fail in cases of, e.g., fast appearance changes or the presence of distractor objects, where a target appearance model alone is insufficient for robust tracking. Knowledge about the presence and locations of other objects in the surrounding scene can be highly beneficial in such cases. This scene information can be propagated through the sequence and used to, for instance, explicitly avoid distractor objects and eliminate target candidate regions. In this work, we propose a novel tracking architecture that can utilize scene information for tracking. Our tracker represents such information as dense localized state vectors, which can encode, for example, whether the local region is target, background, or distractor. These state vectors are propagated through the sequence and combined with the appearance model output to localize the target. Our network is trained to effectively utilize the scene information by directly maximizing tracking performance on video segments. The proposed approach sets a new state-of-the-art on 3 tracking benchmarks, achieving an AO score of 63.6% on the recent GOT-10k dataset.
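The benefit of combining scene state with the appearance model can be illustrated on a small score map. The sketch below is a hand-crafted stand-in for the learned fusion: it assumes the propagated state has been reduced to a per-cell distractor probability and simply down-weights appearance scores there. The propagation step itself is not modeled, and `fuse` and `argmax2d` are hypothetical names.

```python
from typing import List, Tuple

Grid = List[List[float]]


def fuse(appearance: Grid, distractor_prob: Grid) -> Grid:
    """Down-weight appearance scores at cells whose propagated state marks
    them as likely distractors (hand-crafted stand-in for a learned fusion)."""
    return [[a * (1.0 - d) for a, d in zip(row_a, row_d)]
            for row_a, row_d in zip(appearance, distractor_prob)]


def argmax2d(grid: Grid) -> Tuple[int, int]:
    """(row, col) of the highest score, i.e. the predicted target location."""
    _, r, c = max((v, r, c) for r, row in enumerate(grid) for c, v in enumerate(row))
    return r, c
```

With appearance scores `[[0.9, 0.2], [0.1, 0.85]]`, the appearance model alone picks cell (0, 0); if the propagated state marks that cell as a likely distractor (probability 0.8), the fused map picks (1, 1) instead.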
