Sabine Süsstrunk

Estimating Image Depth in the Comics Domain

Oct 07, 2021

Deblina Bhattacharjee, Martin Everaert, Mathieu Salzmann, Sabine Süsstrunk

Abstract: Estimating the depth of comics images is challenging as such images a) are monocular; b) lack ground-truth depth annotations; c) vary across artistic styles; d) are sparse and noisy. We thus use an off-the-shelf unsupervised image-to-image translation method to translate the comics images to natural ones and then use an attention-guided monocular depth estimator to predict their depth. This lets us leverage the depth annotations of existing natural images to train the depth estimator. Furthermore, our model learns to distinguish between text and images in the comics panels to reduce text-based artefacts in the depth estimates. Our method consistently outperforms the existing state-of-the-art approaches across all metrics on both the DCM and eBDtheque images. Finally, we introduce a dataset to evaluate depth prediction on comics.

* WACV 2022 : Winter Conference on Applications of Computer Vision

Fidelity Estimation Improves Noisy-Image Classification with Pretrained Networks

Jun 01, 2021

Xiaoyu Lin, Deblina Bhattacharjee, Majed El Helou, Sabine Süsstrunk

Abstract: Image classification has significantly improved using deep learning. This is mainly due to convolutional neural networks (CNNs) that are capable of learning rich feature extractors from large datasets. However, most deep learning classification methods are trained on clean images and are not robust when handling noisy ones, even if a restoration preprocessing step is applied. While novel methods address this problem, they rely on modified feature extractors and thus necessitate retraining. We instead propose a method that can be applied to a pretrained classifier. Our method exploits a fidelity map estimate that is fused into the internal representations of the feature extractor, thereby guiding the attention of the network and making it more robust to noisy data. We improve the noisy-image classification (NIC) results by large margins, especially at high noise levels, and come close to the fully retrained approaches. Furthermore, as proof of concept, we show that when using our oracle fidelity map we even outperform the fully retrained methods, whether trained on noisy or restored images.

* Submitted to IEEE SPL for review

Modeling Object Dissimilarity for Deep Saliency Prediction

Apr 08, 2021

Bahar Aydemir, Deblina Bhattacharjee, Seungryong Kim, Tong Zhang, Mathieu Salzmann, Sabine Süsstrunk

Abstract: Saliency prediction has made great strides over the past two decades, with current techniques modeling low-level information, such as color, intensity and size contrasts, and high-level information, such as attention and gaze direction for entire objects. Despite this, these methods fail to account for the dissimilarity between objects, which humans naturally do. In this paper, we introduce a detection-guided saliency prediction network that explicitly models the differences between multiple objects, such as their appearance and size dissimilarities. Our approach is general, allowing us to fuse our object dissimilarities with features extracted by any deep saliency prediction network. As evidenced by our experiments, this consistently boosts the accuracy of the baseline networks, enabling us to outperform the state-of-the-art models on three saliency benchmarks, namely SALICON, MIT300 and CAT2000.

Deep Gaussian Denoiser Epistemic Uncertainty and Decoupled Dual-Attention Fusion

Jan 22, 2021

Xiaoqi Ma, Xiaoyu Lin, Majed El Helou, Sabine Süsstrunk

Abstract: Following the performance breakthrough of denoising networks, improvements have come chiefly through novel architecture designs and increased depth. While novel denoising networks were designed for real images coming from different distributions, or for specific applications, comparatively small improvement was achieved on Gaussian denoising. The denoising solutions suffer from epistemic uncertainty that can limit further advancements. This uncertainty is traditionally mitigated through different ensemble approaches. However, such ensembles are prohibitively costly with deep networks, which are already large in size. Our work focuses on pushing the performance limits of state-of-the-art methods on Gaussian denoising. We propose a model-agnostic approach for reducing epistemic uncertainty while using only a single pretrained network. We achieve this by tapping into the epistemic uncertainty through augmented and frequency-manipulated images to obtain denoised images with varying error. We propose an ensemble method with two decoupled attention paths, over the pixel domain and over that of our different manipulations, to learn the final fusion. Our results significantly improve over the state-of-the-art baselines and across varying noise levels.

* Code and models are publicly available on https://github.com/IVRL/DEU
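
The idea of obtaining several denoised estimates from a single pretrained network can be illustrated with a minimal sketch. It is not the authors' method: the learned dual-attention fusion is replaced here by a plain average, only geometric augmentations are used (no frequency manipulation), and `toy_denoiser` is a hypothetical stand-in for a pretrained network.

```python
import numpy as np

def toy_denoiser(img):
    """Hypothetical stand-in for a pretrained denoiser.

    Averages each pixel with its right and lower neighbours. The filter is
    deliberately asymmetric so that augmented inputs yield different outputs.
    """
    img = np.asarray(img, dtype=float)
    padded = np.pad(img, ((0, 1), (0, 1)), mode="edge")
    return (padded[:-1, :-1] + padded[:-1, 1:] + padded[1:, :-1]) / 3.0

# Each entry pairs an augmentation with its inverse.
AUGMENTATIONS = [
    (lambda x: x, lambda x: x),                             # identity
    (np.fliplr, np.fliplr),                                 # horizontal flip
    (np.flipud, np.flipud),                                 # vertical flip
    (lambda x: np.rot90(x, 1), lambda x: np.rot90(x, -1)),  # 90-degree rotation
]

def self_ensemble(noisy, denoiser=toy_denoiser):
    """Denoise augmented copies, undo each augmentation, then average.

    Averaging is an assumption made for illustration; the abstract describes
    a learned fusion with two decoupled attention paths instead.
    """
    outputs = [inverse(denoiser(forward(noisy)))
               for forward, inverse in AUGMENTATIONS]
    return np.mean(outputs, axis=0), outputs
```

Calling `self_ensemble(noisy)` on a 2-D array returns the averaged estimate together with the individual estimates, which differ from one another because the toy filter is not symmetric under flips or rotations.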

BIGPrior: Towards Decoupling Learned Prior Hallucination and Data Fidelity in Image Restoration

Nov 03, 2020

Majed El Helou, Sabine Süsstrunk

Abstract: Image restoration encompasses fundamental image processing tasks that have been addressed with different algorithms and deep learning methods. Classical restoration algorithms leverage a variety of priors, either implicitly or explicitly. Their priors are hand-designed and their corresponding weights are heuristically assigned. Thus, deep learning methods often produce superior restoration quality. Deep networks are, however, capable of strong and hard-to-predict hallucinations. Networks jointly and implicitly learn to be faithful to the observed data while learning an image prior, and the separation of original and hallucinated data downstream is then not possible. This limits their widespread adoption in restoration applications. Furthermore, it is often the hallucinated part that falls victim to degradation-model overfitting. We present an approach with decoupled network-prior hallucination and data fidelity. We refer to our framework as the Bayesian Integration of a Generative Prior (BIGPrior). Our BIGPrior method is rooted in a Bayesian restoration framework, and tightly connected to classical restoration methods. In fact, our approach can be viewed as a generalization of a large family of classical restoration algorithms. We leverage a recent network inversion method to extract image prior information from a generative network. We show on image colorization, inpainting, and denoising that our framework consistently improves the prior results through good integration of data fidelity. Our method, though partly reliant on the quality of the generative network inversion, is competitive with state-of-the-art supervised and task-specific restoration methods. It also provides an additional metric that sets forth the degree of prior reliance per pixel. Indeed, the per-pixel contributions of the decoupled data fidelity and prior terms are readily available in our proposed framework.

* Under submission. Code available on https://github.com/majedelhelou/BIGPrior
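
The per-pixel decoupling of data fidelity and prior can be pictured with a small sketch. The abstract does not state the fusion rule, so a per-pixel convex combination is assumed here; the function name `fuse` and the weight map `phi` are hypothetical, chosen only for illustration.

```python
import numpy as np

def fuse(data_term, prior_term, phi):
    """Blend a data-fidelity estimate with a generative-prior estimate.

    Assumed form (not confirmed by the abstract): a per-pixel convex
    combination, where phi in [0, 1] is the weight given to the data term
    and 1 - phi is the degree of prior reliance at that pixel.
    """
    phi = np.clip(np.asarray(phi, dtype=float), 0.0, 1.0)
    return phi * data_term + (1.0 - phi) * prior_term

data = np.array([[1.0, 2.0], [3.0, 4.0]])   # estimate derived from the observation
prior = np.zeros((2, 2))                    # estimate from the generative prior
phi = np.array([[1.0, 0.0], [0.5, 0.25]])   # per-pixel data-fidelity weight

fused = fuse(data, prior, phi)
prior_reliance = 1.0 - phi                  # per-pixel map of prior reliance
```

Under this assumption, a pixel with `phi` equal to 1 is taken entirely from the data term and a pixel with `phi` equal to 0 entirely from the prior, so the two contributions stay separable after fusion.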

AIM 2020: Scene Relighting and Illumination Estimation Challenge

Sep 27, 2020

Majed El Helou, Ruofan Zhou, Sabine Süsstrunk, Radu Timofte, Mahmoud Afifi, Michael S. Brown, Kele Xu, Hengxing Cai, Yuzhong Liu, Li-Wen Wang (+27 more)

Abstract: We review the AIM 2020 challenge on virtual image relighting and illumination estimation. This paper presents the novel VIDIT dataset used in the challenge and the different proposed solutions and final evaluation results over the 3 challenge tracks. The first track considered one-to-one relighting; the objective was to relight an input photo of a scene with a different color temperature and illuminant orientation (i.e., light source position). The goal of the second track was to estimate illumination settings, namely the color temperature and orientation, from a given image. Lastly, the third track dealt with any-to-any relighting, thus a generalization of the first track. The target color temperature and orientation, rather than being pre-determined, are instead given by a guide image. Participants were allowed to make use of their track 1 and 2 solutions for track 3. The tracks had 94, 52, and 56 registered participants, respectively, leading to 20 confirmed submissions in the final competition stage.

* ECCVW 2020. Data and more information on https://github.com/majedelhelou/VIDIT

Volumetric Transformer Networks

Jul 18, 2020

Seungryong Kim, Sabine Süsstrunk, Mathieu Salzmann

Abstract: Existing techniques to encode spatial invariance within deep convolutional neural networks (CNNs) apply the same warping field to all the feature channels. This does not account for the fact that the individual feature channels can represent different semantic parts, which can undergo different spatial transformations w.r.t. a canonical configuration. To overcome this limitation, we introduce a learnable module, the volumetric transformer network (VTN), that predicts channel-wise warping fields so as to reconfigure intermediate CNN features both spatially and channel-wise. We design our VTN as an encoder-decoder network, with modules dedicated to letting the information flow across the feature channels, to account for the dependencies between the semantic parts. We further propose a loss function defined between the warped features of pairs of instances, which improves the localization ability of VTN. Our experiments show that VTN consistently boosts the features' representation power and consequently the networks' accuracy on fine-grained image recognition and instance-level image retrieval.

* ECCV 2020

On the Loss Landscape of Adversarial Training: Identifying Challenges and How to Overcome Them

Jun 15, 2020

Chen Liu, Mathieu Salzmann, Tao Lin, Ryota Tomioka, Sabine Süsstrunk

Abstract: We analyze the influence of adversarial training on the loss landscape of machine learning models. To this end, we first provide analytical studies of the properties of adversarial loss functions under different adversarial budgets. We then demonstrate that the adversarial loss landscape is less favorable to optimization, due to increased curvature and more scattered gradients. Our conclusions are validated by numerical analyses, which show that training under large adversarial budgets impedes the escape from suboptimal random initialization, causes non-vanishing gradients and makes the model find sharper minima. Based on these observations, we show that a periodic adversarial scheduling (PAS) strategy can effectively overcome these challenges, yielding better results than vanilla adversarial training while being much less sensitive to the choice of learning rate.
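
A periodic schedule of this kind can be sketched as follows. The abstract does not specify what is scheduled or the shape of the schedule, so this assumes that the adversarial budget is the scheduled quantity and that it follows a cosine-shaped cycle; `periodic_budget` and its parameters are hypothetical names.

```python
import math

def periodic_budget(step, period, eps_min, eps_max):
    """Adversarial budget at a given training step.

    Assumed form: a cosine-shaped cycle that starts at eps_min, reaches
    eps_max halfway through each period, and returns to eps_min.
    """
    phase = (step % period) / period
    return eps_min + 0.5 * (eps_max - eps_min) * (1.0 - math.cos(2.0 * math.pi * phase))

# Budgets over two periods of 10 steps each, ranging between 0 and 8.
schedule = [periodic_budget(t, period=10, eps_min=0.0, eps_max=8.0) for t in range(20)]
```

In a training loop, the value returned for the current step would be passed to whatever attack generates the adversarial examples, so that the model repeatedly sees small budgets before large ones.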

Editing in Style: Uncovering the Local Semantics of GANs

May 21, 2020

Edo Collins, Raja Bala, Bob Price, Sabine Süsstrunk

Abstract: While the quality of GAN image synthesis has improved tremendously in recent years, our ability to control and condition the output is still limited. Focusing on StyleGAN, we introduce a simple and effective method for making local, semantically-aware edits to a target output image. This is accomplished by borrowing elements from a source image, also a GAN output, via a novel manipulation of style vectors. Our method neither requires supervision from an external model nor involves complex spatial morphing operations. Instead, it relies on the emergent disentanglement of semantic objects that is learned by StyleGAN during its training. Semantic editing is demonstrated on GANs producing human faces, indoor scenes, cats, and cars. We measure the locality and photorealism of the edits produced by our method, and find that it accomplishes both.

* IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020. Code: https://github.com/IVRL/GANLocalEditing

VIDIT: Virtual Image Dataset for Illumination Transfer

May 13, 2020

Majed El Helou, Ruofan Zhou, Johan Barthas, Sabine Süsstrunk

Abstract: Deep image relighting is gaining more interest lately, as it allows photo enhancement through illumination-specific retouching without human effort. Aside from aesthetic enhancement and photo montage, image relighting is valuable for domain adaptation, whether to augment datasets for training or to normalize input test data. Accurate relighting is, however, very challenging for various reasons, such as the difficulty in removing and recasting shadows and the modeling of different surfaces. We present a novel dataset, the Virtual Image Dataset for Illumination Transfer (VIDIT), in an effort to create a reference evaluation benchmark and to push forward the development of illumination manipulation methods. Virtual datasets are not only an important step towards achieving real-image performance but have also proven capable of improving training even when real datasets can be acquired and are available. VIDIT contains 300 virtual scenes used for training, where every scene is captured 40 times in total: from 8 equally-spaced azimuthal angles, each lit with 5 different illuminants.

* For further information and data, see https://github.com/majedelhelou/VIDIT
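
The capture counts in the abstract can be checked with a short enumeration. The starting angle of 0 degrees and the use of integer indices for the illuminants are assumptions made for illustration; the abstract gives only the counts.

```python
from itertools import product

NUM_SCENES = 300       # training scenes, from the abstract
NUM_AZIMUTHS = 8       # equally spaced azimuthal angles
NUM_ILLUMINANTS = 5    # illuminants per angle

# Equally spaced angles over a full turn, assuming the first is at 0 degrees.
azimuths_deg = [i * 360 / NUM_AZIMUTHS for i in range(NUM_AZIMUTHS)]
# Placeholder indices; the abstract does not list the illuminants themselves.
illuminant_ids = list(range(NUM_ILLUMINANTS))

# Every (angle, illuminant) pair is one capture of a scene.
captures_per_scene = list(product(azimuths_deg, illuminant_ids))
total_training_images = NUM_SCENES * len(captures_per_scene)
```

Here `len(captures_per_scene)` is 8 x 5 = 40, matching the abstract, and `total_training_images` is 300 x 40 = 12000.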
