Martin Danelljan

TADA: Taxonomy Adaptive Domain Adaptation

Sep 10, 2021
Rui Gong, Martin Danelljan, Dengxin Dai, Wenguan Wang, Danda Pani Paudel, Ajad Chhatkuli, Fisher Yu, Luc Van Gool

Traditional domain adaptation addresses the task of adapting a model to a novel target domain under limited or no additional supervision. While tackling the input domain gap, the standard domain adaptation settings assume no domain change in the output space. In semantic prediction tasks, different datasets are often labeled according to different semantic taxonomies. In many real-world settings, the target domain task requires a different taxonomy than the one imposed by the source domain. We therefore introduce the more general taxonomy adaptive domain adaptation (TADA) problem, allowing for inconsistent taxonomies between the two domains. We further propose an approach that jointly addresses the image-level and label-level domain adaptation. At the label level, we employ a bilateral mixed sampling strategy to augment the target domain, and a relabelling method to unify and align the label spaces. We address the image-level domain gap by proposing an uncertainty-rectified contrastive learning method, leading to more domain-invariant and class-discriminative features. We extensively evaluate the effectiveness of our framework under different TADA settings: open taxonomy, coarse-to-fine taxonomy, and partially-overlapping taxonomy. Our framework outperforms the previous state of the art by a large margin, while also being capable of adapting to new target domain taxonomies.

* 15 pages, 5 figures, 6 tables

Deep Reparametrization of Multi-Frame Super-Resolution and Denoising

Aug 18, 2021
Goutam Bhat, Martin Danelljan, Fisher Yu, Luc Van Gool, Radu Timofte

We propose a deep reparametrization of the maximum a posteriori formulation commonly employed in multi-frame image restoration tasks. Our approach is derived by introducing a learned error metric and a latent representation of the target image, which transforms the MAP objective to a deep feature space. The deep reparametrization allows us to directly model the image formation process in the latent space, and to integrate learned image priors into the prediction. Our approach thereby leverages the advantages of deep learning, while also benefiting from the principled multi-frame fusion provided by the classical MAP formulation. We validate our approach through comprehensive experiments on burst denoising and burst super-resolution datasets. Our approach sets a new state-of-the-art for both tasks, demonstrating the generality and effectiveness of the proposed formulation.
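
As a rough illustration of the reparametrization described in this abstract, the classical multi-frame MAP objective and a feature-space counterpart could be written as below. All symbols are notational assumptions made for this sketch and do not come from the abstract: x_i are the N input frames, y is the target image, \phi_{m_i} warps the target according to the motion m_i of frame i, H is the image degradation operator, and R is a hand-crafted regularizer. In the second form, E stands for the learned error metric (an encoder), z for the latent representation of the target image, G for a feature-space degradation operator, Q for a learned prior, and D for a decoder.

```latex
% Classical MAP estimate (assuming Gaussian noise, hence the squared L2 error)
\hat{y} = \arg\min_{y} \sum_{i=1}^{N}
          \left\| x_i - H\!\left(\phi_{m_i}(y)\right) \right\|_2^2 + R(y)

% Sketch of the same objective moved to a deep feature space
\hat{z} = \arg\min_{z} \sum_{i=1}^{N}
          \left\| E(x_i) - G\!\left(\phi_{m_i}(z)\right) \right\|_2^2 + Q(z),
\qquad \hat{y} = D(\hat{z})
```

The structure of the data term is unchanged, which is how such a formulation can retain the multi-frame fusion behaviour of the classical objective while the error metric, the image formation model, and the prior become learnable.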

* ICCV 2021 Oral

Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling

Aug 11, 2021
Jingyun Liang, Andreas Lugmayr, Kai Zhang, Martin Danelljan, Luc Van Gool, Radu Timofte

Normalizing flows have recently demonstrated promising results for low-level vision tasks. For image super-resolution (SR), they learn to predict diverse photo-realistic high-resolution (HR) images from the low-resolution (LR) image rather than learning a deterministic mapping. For image rescaling, they achieve high accuracy by jointly modelling the downscaling and upscaling processes. While existing approaches employ specialized techniques for these two tasks, we set out to unify them in a single formulation. In this paper, we propose the hierarchical conditional flow (HCFlow) as a unified framework for image SR and image rescaling. More specifically, HCFlow learns a bijective mapping between HR and LR image pairs by simultaneously modelling the distribution of the LR image and that of the remaining high-frequency component. In particular, the high-frequency component is conditioned on the LR image in a hierarchical manner. To further enhance performance, other losses such as perceptual loss and GAN loss are combined with the commonly used negative log-likelihood loss in training. Extensive experiments on general image SR, face image SR and image rescaling demonstrate that the proposed HCFlow achieves state-of-the-art performance in terms of both quantitative metrics and visual quality.
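
The negative log-likelihood loss mentioned in this abstract follows from the change-of-variables formula that underlies every normalizing flow. The sketch below is a toy illustration only: it uses a single 1-D affine map with a standard normal base density, not HCFlow's hierarchical conditional architecture, and the function names and parameters (`forward`, `inverse`, `negative_log_likelihood`, `shift`, `log_scale`) are hypothetical.

```python
import math


def forward(x: float, shift: float, log_scale: float) -> float:
    """Map a data point x to the latent z (hypothetical 1-D affine flow)."""
    return (x - shift) * math.exp(-log_scale)


def inverse(z: float, shift: float, log_scale: float) -> float:
    """Exact inverse of `forward`; the mapping is bijective by construction."""
    return z * math.exp(log_scale) + shift


def negative_log_likelihood(x: float, shift: float, log_scale: float) -> float:
    """NLL of x via change of variables: log p(x) = log N(z; 0, 1) + log|dz/dx|."""
    z = forward(x, shift, log_scale)
    log_base = -0.5 * (z * z + math.log(2.0 * math.pi))  # standard normal log-density
    log_det = -log_scale  # log|dz/dx| for the affine map above
    return -(log_base + log_det)


if __name__ == "__main__":
    x = 1.7
    z = forward(x, shift=0.4, log_scale=-0.3)
    print("latent:", z, "reconstructed:", inverse(z, shift=0.4, log_scale=-0.3))
    print("NLL:", negative_log_likelihood(x, shift=0.4, log_scale=-0.3))
```

Because the map is invertible, the same parameters serve both directions: `forward` scores data under the model, and `inverse` turns a latent sample back into data.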

* Accepted by ICCV2021. Code: https://github.com/JingyunLiang/HCFlow

Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation

Jun 22, 2021
Lei Ke, Xia Li, Martin Danelljan, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu

Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes. Most approaches only exploit the temporal dimension to address the association problem, while relying on single-frame predictions for the segmentation mask itself. We propose the Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information for online multiple object tracking and segmentation. PCAN first distills a space-time memory into a set of prototypes and then employs cross-attention to retrieve rich information from the past frames. To segment each object, PCAN adopts a prototypical appearance module to learn a set of contrastive foreground and background prototypes, which are then propagated over time. Extensive experiments demonstrate that PCAN outperforms current video instance tracking and segmentation competition winners on both the YouTube-VIS and BDD100K datasets, and is effective with both one-stage and two-stage segmentation frameworks. Code will be available at http://vis.xyz/pub/pcan.
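
The retrieval step in this abstract, where a current-frame feature attends to a small set of prototypes instead of to every stored memory location, can be sketched with plain scaled dot-product attention. This is a minimal illustration under stated assumptions, not PCAN's implementation: the prototypes are simply passed in (how they are distilled from the space-time memory is not shown), and `cross_attend`, `proto_keys`, and `proto_values` are hypothetical names.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(
    query: Vector, proto_keys: List[Vector], proto_values: List[Vector]
) -> Tuple[Vector, List[float]]:
    """Read from the prototypes with scaled dot-product attention.

    Returns the attended value vector and the attention weights.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in proto_keys]
    weights = softmax(scores)
    dim = len(proto_values[0])
    attended = [
        sum(w * value[d] for w, value in zip(weights, proto_values))
        for d in range(dim)
    ]
    return attended, weights


if __name__ == "__main__":
    # Two toy prototypes; the query is aligned with the first key.
    keys = [[10.0, 0.0], [-10.0, 0.0]]
    values = [[1.0, 2.0], [3.0, 4.0]]
    attended, weights = cross_attend([1.0, 0.0], keys, values)
    print("weights:", weights)
    print("attended value:", attended)
```

The cost of one read scales with the number of prototypes rather than with the number of pixels stored in the memory, which is the motivation for condensing the memory before attending to it.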

* Multiple object tracking and segmentation on large-scale datasets

NTIRE 2021 Challenge on Burst Super-Resolution: Methods and Results

Jun 07, 2021
Goutam Bhat, Martin Danelljan, Radu Timofte, Kazutoshi Akita, Wooyeong Cho, Haoqiang Fan, Lanpeng Jia, Daeshik Kim, Bruno Lecouat, Youwei Li, Shuaicheng Liu, Ziluan Liu, Ziwei Luo, Takahiro Maeda, Julien Mairal, Christian Micheloni, Xuan Mo, Takeru Oba, Pavel Ostyakov, Jean Ponce, Sanghyeok Son, Jian Sun, Norimichi Ukita, Rao Muhammad Umer, Youliang Yan, Lei Yu, Magauiya Zhussip, Xueyi Zou

This paper reviews the NTIRE 2021 challenge on burst super-resolution. Given a noisy RAW burst as input, the task in the challenge was to generate a clean RGB image with 4 times higher resolution. The challenge contained two tracks: Track 1 evaluated on synthetically generated data, and Track 2 used real-world bursts from a mobile camera. In the final testing phase, 6 teams submitted results using a diverse set of solutions. The top-performing methods set a new state of the art for the burst super-resolution task.

* NTIRE 2021 Burst Super-Resolution challenge report

Learnable Online Graph Representations for 3D Multi-Object Tracking

Apr 23, 2021
Jan-Nico Zaech, Dengxin Dai, Alexander Liniger, Martin Danelljan, Luc Van Gool

Tracking of objects in 3D is a fundamental task in computer vision that finds use in a wide range of applications such as autonomous driving, robotics, and augmented reality. Most recent approaches for 3D multi-object tracking (MOT) from LIDAR use object dynamics together with a set of handcrafted features to match detections of objects. However, manually designing such features and heuristics is cumbersome and often leads to suboptimal performance. In this work, we instead strive towards a unified and learning-based approach to the 3D MOT problem. We design a graph structure to jointly process detection and track states in an online manner. To this end, we employ a fully trainable Neural Message Passing network for data association. Our approach provides a natural way to initialize tracks and handle false positive detections, while significantly improving track stability. We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.

* 13 pages

Warp Consistency for Unsupervised Learning of Dense Correspondences

Apr 08, 2021
Prune Truong, Martin Danelljan, Fisher Yu, Luc Van Gool

The key challenge in learning dense correspondences lies in the lack of ground-truth matches for real image pairs. While photometric consistency losses provide unsupervised alternatives, they struggle with large appearance changes, which are ubiquitous in geometric and semantic matching tasks. Moreover, methods relying on synthetic training pairs often suffer from poor generalisation to real data. We propose Warp Consistency, an unsupervised learning objective for dense correspondence regression. Our objective is effective even in settings with large appearance and view-point changes. Given a pair of real images, we first construct an image triplet by applying a randomly sampled warp to one of the original images. We derive and analyze all flow-consistency constraints arising between the triplet. From our observations and empirical results, we design a general unsupervised objective employing two of the derived constraints. We validate our warp consistency loss by training three recent dense correspondence networks for the geometric and semantic matching tasks. Our approach sets a new state-of-the-art on several challenging benchmarks, including MegaDepth, RobotCar and TSS. Code and models will be released at https://github.com/PruneTruong/DenseMatching.
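
One constraint of the kind this abstract describes relates the known, randomly sampled warp to a path of two estimated flows routed through the second image. The sketch below checks that relation in 1-D with flows given as Python callables. The conventions are assumptions of this sketch and are not stated in the abstract: a flow from A to B gives the displacement d(x) such that coordinate x in A corresponds to x + d(x) in B, and the known warp is taken as the flow from the warped image I' to the original image I. The function names are hypothetical, and this is not the paper's full objective.

```python
from typing import Callable, Sequence

Flow = Callable[[float], float]  # displacement at coordinate x


def compose(first: Flow, second: Flow) -> Flow:
    """Displacement obtained by following `first` (A -> B), then `second` (B -> C)."""

    def composed(x: float) -> float:
        step = first(x)
        # `second` must be sampled at the location reached in image B.
        return step + second(x + step)

    return composed


def warp_consistency_residual(
    known_warp: Flow, flow_ip_to_j: Flow, flow_j_to_i: Flow, coords: Sequence[float]
) -> float:
    """Mean absolute gap between the known warp (I' -> I) and the path I' -> J -> I."""
    path = compose(flow_ip_to_j, flow_j_to_i)
    return sum(abs(known_warp(x) - path(x)) for x in coords) / len(coords)


if __name__ == "__main__":
    # Toy geometry: x_I = x_I' + 3 and x_J = 2 * x_I + 1.
    known_warp = lambda x: 3.0
    flow_ip_to_j = lambda x: x + 7.0            # x_J = 2 * (x + 3) + 1
    flow_j_to_i = lambda x: -(x + 1.0) / 2.0    # x_I = (x_J - 1) / 2
    coords = [0.0, 1.0, 2.5]
    print(warp_consistency_residual(known_warp, flow_ip_to_j, flow_j_to_i, coords))
```

The supervision signal comes entirely from the sampled warp, so no ground-truth match between the two real images is needed. In a training setting the two flows would be network predictions and the residual would act as the loss.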

* code: https://github.com/PruneTruong/DenseMatching

Learning Target Candidate Association to Keep Track of What Not to Track

Mar 30, 2021
Christoph Mayer, Martin Danelljan, Danda Pani Paudel, Luc Van Gool

The presence of objects that are confusingly similar to the tracked target poses a fundamental challenge in appearance-based visual tracking. Such distractor objects are easily misclassified as the target itself, leading to eventual tracking failure. While most methods strive to suppress distractors through more powerful appearance models, we take an alternative approach. We propose to keep track of distractor objects in order to continue tracking the target. To this end, we introduce a learned association network, allowing us to propagate the identities of all target candidates from frame to frame. To tackle the lack of ground-truth correspondences between distractor objects in visual tracking, we propose a training strategy that combines partial annotations with self-supervision. We conduct comprehensive experimental validation and analysis of our approach on several challenging datasets. Our tracker sets a new state of the art on six benchmarks, achieving an AUC score of 67.2% on LaSOT and a +6.1% absolute gain on the OxUvA long-term dataset.

* 17 pages

Deep Gaussian Processes for Few-Shot Segmentation

Mar 30, 2021
Joakim Johnander, Johan Edstedt, Martin Danelljan, Michael Felsberg, Fahad Shahbaz Khan

Few-shot segmentation is a challenging task, requiring the extraction of a generalizable representation from only a few annotated samples, in order to segment novel query images. A common approach is to model each class with a single prototype. While conceptually simple, these methods suffer when the target appearance distribution is multi-modal or not linearly separable in feature space. To tackle this issue, we propose a few-shot learner formulation based on Gaussian process (GP) regression. Through the expressivity of the GP, our approach is capable of modeling complex appearance distributions in the deep feature space. The GP provides a principled way of capturing uncertainty, which serves as another powerful cue for the final segmentation, obtained by a CNN decoder. We further exploit the end-to-end learning capabilities of our approach to learn the output space of the GP learner, ensuring a richer encoding of the segmentation mask. We perform comprehensive experimental analysis of our few-shot learner formulation. Our approach sets a new state of the art for 5-shot segmentation, with mIoU scores of 68.1 and 49.8 on PASCAL-5i and COCO-20i, respectively.
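
The two quantities this abstract relies on, a GP posterior mean and a GP posterior variance that serves as an uncertainty cue, come from the textbook GP regression equations. The sketch below evaluates them on toy 2-D "features". It is a simplified illustration, not the paper's model: the RBF kernel, the noise level, the scalar labels, and all function names are assumptions, and the deep feature extractor, the learned output space, and the CNN decoder are not shown.

```python
import math
from typing import List, Tuple

Vector = List[float]


def rbf(a: Vector, b: Vector, length_scale: float = 1.0) -> float:
    """Squared-exponential kernel (an assumed kernel choice)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)


def solve(matrix: List[Vector], rhs: Vector) -> Vector:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gp_predict(
    support_x: List[Vector], support_y: Vector, query: Vector, noise: float = 1e-2
) -> Tuple[float, float]:
    """Posterior mean and variance of a zero-mean GP at `query`."""
    n = len(support_x)
    gram = [
        [rbf(support_x[i], support_x[j]) + (noise if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    k_star = [rbf(query, s) for s in support_x]
    alpha = solve(gram, support_y)  # (K + noise * I)^-1 y
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve(gram, k_star)         # (K + noise * I)^-1 k_*
    variance = rbf(query, query) - sum(k * w for k, w in zip(k_star, v))
    return mean, variance


if __name__ == "__main__":
    # Toy support set: feature vectors labelled foreground (+1) or background (-1).
    support_x = [[0.0, 0.0], [3.0, 0.0]]
    support_y = [1.0, -1.0]
    for query in ([0.0, 0.0], [1.5, 0.0], [100.0, 0.0]):
        print(query, gp_predict(support_x, support_y, query, noise=1e-6))
```

Near a support sample the variance is small and the mean follows the label, while far from all support samples the variance returns to the prior value. A downstream decoder can use that difference as a confidence signal.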

* 15 pages, 6 figures

Deep Burst Super-Resolution

Jan 26, 2021
Goutam Bhat, Martin Danelljan, Luc Van Gool, Radu Timofte

While single-image super-resolution (SISR) has attracted substantial interest in recent years, the proposed approaches are limited to learning image priors in order to add high-frequency details. In contrast, multi-frame super-resolution (MFSR) offers the possibility of reconstructing rich details by combining signal information from multiple shifted images. This key advantage, along with the increasing popularity of burst photography, has made MFSR an important problem for real-world applications. We propose a novel architecture for the burst super-resolution task. Our network takes multiple noisy RAW images as input, and generates a denoised, super-resolved RGB image as output. This is achieved by explicitly aligning deep embeddings of the input frames using pixel-wise optical flow. The information from all frames is then adaptively merged using an attention-based fusion module. In order to enable training and evaluation on real-world data, we additionally introduce the BurstSR dataset, consisting of smartphone bursts and high-resolution DSLR ground-truth. We perform comprehensive experimental analysis, demonstrating the effectiveness of the proposed architecture.
