Luc Van Gool

KU Leuven/ESAT-PSI, ETH Zurich/CVL, TRACE vzw

StyleGenes: Discrete and Efficient Latent Distributions for GANs

Apr 30, 2023

Evangelos Ntavelis, Mohamad Shahbazi, Iason Kastanis, Radu Timofte, Martin Danelljan, Luc Van Gool

Abstract: We propose a discrete latent distribution for Generative Adversarial Networks (GANs). Instead of drawing latent vectors from a continuous prior, we sample from a finite set of learnable latents. However, a direct parametrization of such a distribution leads to an intractable linear increase in memory in order to ensure sufficient sample diversity. We address this key issue by taking inspiration from the encoding of information in biological organisms. Instead of learning a separate latent vector for each sample, we split the latent space into a set of genes. For each gene, we train a small bank of gene variants. Thus, by independently sampling a variant for each gene and combining them into the final latent vector, our approach can represent a vast number of unique latent samples from a compact set of learnable parameters. Interestingly, our gene-inspired latent encoding allows for new and intuitive approaches to latent-space exploration, enabling conditional sampling from our unconditionally trained model. Moreover, our approach preserves state-of-the-art photo-realism while achieving better disentanglement than the widely-used StyleMapping network.
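The gene-based sampling described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy sizes, the use of concatenation to combine variants, and the random numbers standing in for learned parameters are all assumptions made for illustration.

```python
import random

def make_gene_bank(num_genes, variants_per_gene, gene_dim, rng):
    """Build bank[gene][variant] -> list of gene_dim floats.

    The values are random stand-ins; in a trained model they would be
    learnable parameters (assumption for this sketch).
    """
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(gene_dim)]
         for _ in range(variants_per_gene)]
        for _ in range(num_genes)
    ]

def sample_latent(bank, rng):
    """Pick one variant per gene independently and concatenate them."""
    choice = [rng.randrange(len(variants)) for variants in bank]
    latent = []
    for variants, idx in zip(bank, choice):
        latent.extend(variants[idx])  # concatenation is an assumed way to combine
    return choice, latent

def count_unique_latents(bank):
    """Number of distinct latents = product of variants per gene."""
    total = 1
    for variants in bank:
        total *= len(variants)
    return total

def count_parameters(bank):
    """Number of stored scalars across all gene variants."""
    return sum(len(variant) for variants in bank for variant in variants)

rng = random.Random(0)
bank = make_gene_bank(num_genes=4, variants_per_gene=8, gene_dim=16, rng=rng)
choice, latent = sample_latent(bank, rng)
print(len(latent), count_unique_latents(bank), count_parameters(bank))
# prints: 64 4096 512
```

With these toy sizes, 512 stored values can represent 4,096 distinct 64-dimensional latents, whereas storing one latent vector per sample would take 4,096 × 64 = 262,144 values.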

SAM Struggles in Concealed Scenes -- Empirical Study on "Segment Anything"

Apr 27, 2023

Ge-Peng Ji, Deng-Ping Fan, Peng Xu, Ming-Ming Cheng, Bowen Zhou, Luc Van Gool

Abstract: Segmenting anything is a ground-breaking step toward artificial general intelligence, and the Segment Anything Model (SAM) greatly fosters the foundation models for computer vision. We could not be more excited to probe the performance traits of SAM. In particular, exploring situations in which SAM does not perform well is interesting. In this report, we choose three concealed scenes, i.e., camouflaged animals, industrial defects, and medical lesions, to evaluate SAM under unprompted settings. Our main observation is that SAM looks unskilled in concealed scenes.

* Report

EDAPS: Enhanced Domain-Adaptive Panoptic Segmentation

Apr 27, 2023

Suman Saha, Lukas Hoyer, Anton Obukhov, Dengxin Dai, Luc Van Gool

Abstract: With autonomous industries on the rise, domain adaptation of the visual perception stack is an important research direction because of the cost savings it promises. Much prior art was dedicated to domain-adaptive semantic segmentation in the synthetic-to-real context. Despite being a crucial output of the perception stack, panoptic segmentation has been largely overlooked by the domain adaptation community. Therefore, we revisit well-performing domain adaptation strategies from other fields, adapt them to panoptic segmentation, and show that they can effectively enhance panoptic domain adaptation. Further, we study the panoptic network design and propose a novel architecture (EDAPS) designed explicitly for domain-adaptive panoptic segmentation. It uses a shared, domain-robust transformer encoder to facilitate the joint adaptation of semantic and instance features, combined with task-specific decoders tailored to the specific requirements of domain-adaptive semantic and instance segmentation. As a result, the performance gap seen in challenging panoptic benchmarks is substantially narrowed. EDAPS improves the state-of-the-art performance for panoptic segmentation UDA by a large margin of 25% on SYNTHIA-to-Cityscapes and even 72% on the more challenging SYNTHIA-to-Mapillary Vistas. The implementation is available at https://github.com/susaha/edaps.

Domain Adaptive and Generalizable Network Architectures and Training Strategies for Semantic Image Segmentation

Apr 26, 2023

Lukas Hoyer, Dengxin Dai, Luc Van Gool

Abstract: Unsupervised domain adaptation (UDA) and domain generalization (DG) enable machine learning models trained on a source domain to perform well on unlabeled or even unseen target domains. As previous UDA&DG semantic segmentation methods are mostly based on outdated networks, we benchmark more recent architectures, reveal the potential of Transformers, and design the DAFormer network tailored for UDA&DG. It is enabled by three training strategies to avoid overfitting to the source domain: While (1) Rare Class Sampling mitigates the bias toward common source domain classes, (2) a Thing-Class ImageNet Feature Distance and (3) a learning rate warmup promote feature transfer from ImageNet pretraining. As UDA&DG are usually GPU memory intensive, most previous methods downscale or crop images. However, low-resolution predictions often fail to preserve fine details while models trained with cropped images fall short in capturing long-range, domain-robust context information. Therefore, we propose HRDA, a multi-resolution framework for UDA&DG, that combines the strengths of small high-resolution crops to preserve fine segmentation details and large low-resolution crops to capture long-range context dependencies with a learned scale attention. DAFormer and HRDA significantly improve the state-of-the-art UDA&DG by more than 10 mIoU on 5 different benchmarks. The implementation is available at https://github.com/lhoyer/HRDA.
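The abstract does not spell out how Rare Class Sampling picks classes, so the following is only an illustrative sketch of the general idea: classes that are rare in the source data get a higher sampling probability. The temperature-scaled softmax over (1 − frequency), the function name, and the example frequencies are assumptions, not details taken from this listing.

```python
import math

def rare_class_probs(class_freqs, temperature=0.1):
    """Map per-class pixel frequencies (summing to ~1) to sampling probabilities.

    Assumed rule: softmax of (1 - frequency) / temperature, so rarer classes
    receive larger probabilities. A smaller temperature sharpens the preference.
    """
    logits = [(1.0 - f) / temperature for f in class_freqs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical frequencies for three classes: common, medium, rare.
freqs = [0.70, 0.25, 0.05]
probs = rare_class_probs(freqs)
print([round(p, 3) for p in probs])
```

A training loop could then draw a class from `probs` and pick a source image containing that class, which biases batches toward under-represented classes.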

Indiscernible Object Counting in Underwater Scenes

Apr 23, 2023

Guolei Sun, Zhaochong An, Yun Liu, Ce Liu, Christos Sakaridis, Deng-Ping Fan, Luc Van Gool

Abstract: Recently, indiscernible scene understanding has attracted a lot of attention in the vision community. We further advance the frontier of this field by systematically studying a new challenge named indiscernible object counting (IOC), the goal of which is to count objects that are blended with respect to their surroundings. Due to a lack of appropriate IOC datasets, we present a large-scale dataset IOCfish5K which contains a total of 5,637 high-resolution images and 659,024 annotated center points. Our dataset consists of a large number of indiscernible objects (mainly fish) in underwater scenes, making the annotation process all the more challenging. IOCfish5K is superior to existing datasets with indiscernible scenes because of its larger scale, higher image resolutions, more annotations, and denser scenes. All these aspects make it the most challenging dataset for IOC so far, supporting progress in this area. For benchmarking purposes, we select 14 mainstream methods for object counting and carefully evaluate them on IOCfish5K. Furthermore, we propose IOCFormer, a new strong baseline that combines density and regression branches in a unified framework and can effectively tackle object counting under concealed scenes. Experiments show that IOCFormer achieves state-of-the-art scores on IOCfish5K.

* To appear in CVPR 2023. The resources are available at https://github.com/GuoleiSun/Indiscernible-Object-Counting

Advances in Deep Concealed Scene Understanding

Apr 21, 2023

Deng-Ping Fan, Ge-Peng Ji, Peng Xu, Ming-Ming Cheng, Christos Sakaridis, Luc Van Gool

Abstract: Concealed scene understanding (CSU) is a hot computer vision topic aiming to perceive objects with camouflaged properties. The current boom in its advanced techniques and novel applications makes it timely to provide an up-to-date survey to enable researchers to understand the global picture of the CSU field, including both current achievements and major challenges. This paper makes four contributions: (1) For the first time, we present a comprehensive survey of the deep learning techniques oriented at CSU, including a background with its taxonomy, task-unique challenges, and a review of its developments in the deep learning era via surveying existing datasets and deep techniques. (2) For a quantitative comparison of the state-of-the-art, we contribute the largest and latest benchmark for Concealed Object Segmentation (COS). (3) To evaluate the transferability of deep CSU in practical scenarios, we re-organize the largest concealed defect segmentation dataset termed CDS2K with the hard cases from diversified industrial scenarios, on which we construct a comprehensive benchmark. (4) We discuss open problems and potential research directions for this community. Our code and datasets are available at https://github.com/DengPingFan/CSU, which will be updated continuously to watch and summarize the advancements in this rapidly evolving field.

* 18 pages, 6 figures, 8 tables

Quantum Annealing for Single Image Super-Resolution

Apr 18, 2023

Han Yao Choong, Suryansh Kumar, Luc Van Gool

Abstract: This paper proposes a quantum computing-based algorithm to solve the single image super-resolution (SISR) problem. One of the well-known classical approaches for SISR relies on the well-established patch-wise sparse modeling of the problem. Yet, the current state of affairs in this field is that deep neural networks (DNNs) have demonstrated results far superior to those of traditional approaches. Nevertheless, quantum computing is expected to become increasingly prominent for machine learning problems soon. As a result, in this work, we take the opportunity to perform an early exploration of applying a quantum computing algorithm to this important image enhancement problem, i.e., SISR. Among the two paradigms of quantum computing, namely universal gate quantum computing and adiabatic quantum computing (AQC), the latter has been successfully applied to practical computer vision problems, in which quantum parallelism has been exploited to solve combinatorial optimization efficiently. This work demonstrates formulating quantum SISR as a sparse coding optimization problem, which is solved using quantum annealers accessed via the D-Wave Leap platform. The proposed AQC-based algorithm is demonstrated to achieve a speed-up over a classical analog while maintaining comparable SISR accuracy.

* Accepted to IEEE/CVF CVPR 2023, NTIRE Challenge and Workshop. Draft info: 10 pages, 6 Figures, 2 Tables

Single Image Depth Prediction Made Better: A Multivariate Gaussian Take

Apr 18, 2023

Ce Liu, Suryansh Kumar, Shuhang Gu, Radu Timofte, Luc Van Gool

Abstract: Neural-network-based single image depth prediction (SIDP) is a challenging task where the goal is to predict the scene's per-pixel depth at test time. Since the problem, by definition, is ill-posed, the fundamental goal is to come up with an approach that can reliably model the scene depth from a set of training examples. In the pursuit of perfect depth estimation, most existing state-of-the-art learning techniques predict a single scalar depth value per pixel. Yet, it is well known that the trained model has accuracy limits and can predict imprecise depth. Therefore, an SIDP approach must be mindful of the expected depth variations in the model's prediction at test time. Accordingly, we introduce an approach that performs continuous modeling of per-pixel depth, where we can predict and reason about the per-pixel depth and its distribution. To this end, we model per-pixel scene depth using a multivariate Gaussian distribution. Moreover, in contrast to existing uncertainty modeling methods in the same spirit, which assume per-pixel depth to be independent, we introduce per-pixel covariance modeling that encodes its depth dependency w.r.t. all the scene points. Unfortunately, per-pixel depth covariance modeling leads to a computationally expensive continuous loss function, which we solve efficiently using the learned low-rank approximation of the overall covariance matrix. Notably, when tested on benchmark datasets such as KITTI, NYU, and SUN-RGB-D, the SIDP model obtained by optimizing our loss function shows state-of-the-art results. Our method's accuracy (named MG) is among the top on the KITTI depth-prediction benchmark leaderboard.
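To see why a low-rank covariance makes a multivariate Gaussian loss cheap, consider the simplest case as a sketch: a covariance of the form Σ = σ²I + uuᵀ (isotropic noise plus a single rank-1 factor). This specific form, the function name, and the toy numbers are assumptions for illustration; the abstract only states that a learned low-rank approximation is used. With this structure, the Sherman-Morrison identity and the matrix determinant lemma give the negative log-likelihood in O(n) time, without ever forming the n × n matrix.

```python
import math

def gaussian_nll_rank1(residual, factor, noise_var):
    """Negative log-likelihood of residual under N(0, noise_var * I + u u^T).

    residual: list of per-pixel (target - predicted mean) values
    factor:   list u of the same length (assumed rank-1 covariance factor)
    noise_var: positive scalar sigma^2
    """
    n = len(residual)
    rr = sum(r * r for r in residual)
    uu = sum(u * u for u in factor)
    ur = sum(u * r for u, r in zip(factor, residual))
    cap = noise_var + uu
    # Sherman-Morrison: r^T Sigma^{-1} r = (r.r - (u.r)^2 / (sigma^2 + u.u)) / sigma^2
    quad = (rr - ur * ur / cap) / noise_var
    # Determinant lemma: det(Sigma) = sigma^(2(n-1)) * (sigma^2 + u.u)
    log_det = (n - 1) * math.log(noise_var) + math.log(cap)
    return 0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)

# Toy check: Sigma = diag(2, 1), residual = (2, 1).
nll = gaussian_nll_rank1([2.0, 1.0], [1.0, 0.0], 1.0)
print(nll)
```

Setting the factor to zero recovers the independent per-pixel Gaussian loss, which is the case the abstract contrasts against.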

* Accepted to IEEE/CVF CVPR 2023. Draft info: 17 pages, 13 Figures, 9 Tables

SMAE: Few-shot Learning for HDR Deghosting with Saturation-Aware Masked Autoencoders

Apr 14, 2023

Qingsen Yan, Song Zhang, Weiye Chen, Hao Tang, Yu Zhu, Jinqiu Sun, Luc Van Gool, Yanning Zhang

Abstract: Generating a high-quality High Dynamic Range (HDR) image from dynamic scenes has recently been extensively studied by exploiting Deep Neural Networks (DNNs). Most DNN-based methods require a large amount of training data with ground truth, which takes tedious and time-consuming work to collect. Few-shot HDR imaging aims to generate satisfactory images with limited data. However, it is difficult for modern DNNs to avoid overfitting when trained on only a few images. In this work, we propose a novel semi-supervised approach to realize few-shot HDR imaging via two stages of training, called SSHDR. Unlike previous methods, which recover content and remove ghosts simultaneously and therefore struggle to reach an optimum, we first generate content of saturated regions with a self-supervised mechanism and then address ghosts via an iterative semi-supervised learning framework. Concretely, considering that saturated regions can be regarded as masking Low Dynamic Range (LDR) input regions, we design a Saturated Mask AutoEncoder (SMAE) to learn a robust feature representation and reconstruct a non-saturated HDR image. We also propose an adaptive pseudo-label selection strategy to pick high-quality HDR pseudo-labels in the second stage to avoid the effect of mislabeled samples. Experiments demonstrate that SSHDR outperforms state-of-the-art methods quantitatively and qualitatively within and across different datasets, achieving appealing HDR visualization with few labeled samples.

* Accepted by CVPR 2023

CamDiff: Camouflage Image Augmentation via Diffusion Model

Apr 11, 2023

Xue-Jing Luo, Shuo Wang, Zongwei Wu, Christos Sakaridis, Yun Cheng, Deng-Ping Fan, Luc Van Gool

Abstract: The burgeoning field of camouflaged object detection (COD) seeks to identify objects that blend into their surroundings. Despite the impressive performance of recent models, we have identified a limitation in their robustness, where existing methods may misclassify salient objects as camouflaged ones, despite these two characteristics being contradictory. This limitation may stem from lacking multi-pattern training images, leading to less saliency robustness. To address this issue, we introduce CamDiff, a novel approach inspired by AI-Generated Content (AIGC) that overcomes the scarcity of multi-pattern training images. Specifically, we leverage the latent diffusion model to synthesize salient objects in camouflaged scenes, while using the zero-shot image classification ability of the Contrastive Language-Image Pre-training (CLIP) model to prevent synthesis failures and ensure the synthesized object aligns with the input prompt. Consequently, the synthesized image retains its original camouflage label while incorporating salient objects, yielding camouflage samples with richer characteristics. The results of user studies show that the salient objects in the scenes synthesized by our framework attract the user's attention more; thus, such samples pose a greater challenge to the existing COD models. Our approach enables flexible editing and efficient large-scale dataset generation at a low cost. It significantly enhances COD baselines' training and testing phases, emphasizing robustness across diverse domains. Our newly-generated datasets and source code are available at https://github.com/drlxj/CamDiff.
