Sabine Süsstrunk

De-coupling and De-positioning Dense Self-supervised Learning

Mar 29, 2023
Congpei Qiu, Tong Zhang, Wei Ke, Mathieu Salzmann, Sabine Süsstrunk

Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects. Although the dense features extracted by employing segmentation maps and bounding boxes allow networks to perform SSL for each object, we show that they suffer from coupling and positional bias, which arise from zero-padding and from the receptive field increasing with layer depth. We address this by introducing three data augmentation strategies and leveraging them in (i) a decoupling module that aims to make the network robust to variations in the object's surroundings, and (ii) a de-positioning module that encourages the network to discard positional object information. We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection. Our extensive experiments show that our method generalizes better than the SOTA dense SSL methods.

Spatiotemporal Self-supervised Learning for Point Clouds in the Wild

Mar 28, 2023
Yanhao Wu, Tong Zhang, Wei Ke, Sabine Süsstrunk, Mathieu Salzmann

Self-supervised learning (SSL) has the potential to benefit many applications, particularly those where manually annotating data is cumbersome. One such situation is the semantic segmentation of point clouds. In this context, existing methods employ contrastive learning strategies and define positive pairs by performing various augmentations of point clusters in a single frame. As such, these methods do not exploit the temporal nature of LiDAR data. In this paper, we introduce an SSL strategy that leverages positive pairs in both the spatial and temporal domains. To this end, we design (i) a point-to-cluster learning strategy that aggregates spatial information to distinguish objects; and (ii) a cluster-to-cluster learning strategy based on unsupervised object tracking that exploits temporal correspondences. We demonstrate the benefits of our approach via extensive experiments performed by self-supervised training on two large-scale LiDAR datasets and transferring the resulting models to other point cloud segmentation benchmarks. Our results show that our method outperforms the state-of-the-art point cloud SSL methods.

* CVPR accepted

NEMTO: Neural Environment Matting for Novel View and Relighting Synthesis of Transparent Objects

Mar 21, 2023
Dongqing Wang, Tong Zhang, Sabine Süsstrunk

We propose NEMTO, the first end-to-end neural rendering pipeline to model 3D transparent objects with complex geometry and unknown indices of refraction. Commonly used appearance modeling such as the Disney BSDF model cannot accurately address this challenging problem due to the complex light paths bending through refractions and the strong dependency of surface appearance on illumination. With 2D images of the transparent object as input, our method is capable of high-quality novel view and relighting synthesis. We leverage implicit Signed Distance Functions (SDF) to model the object geometry and propose a refraction-aware ray bending network to model the effects of light refraction within the object. Our ray bending network is more tolerant to geometric inaccuracies than traditional physically-based methods for rendering transparent objects. We provide extensive evaluations on both synthetic and real-world datasets to demonstrate our high-quality synthesis and the applicability of our method.

TempSAL -- Uncovering Temporal Information for Deep Saliency Prediction

Jan 05, 2023
Bahar Aydemir, Ludo Hoffstetter, Tong Zhang, Mathieu Salzmann, Sabine Süsstrunk

Deep saliency prediction algorithms complement object recognition features; they typically rely on additional information, such as scene context, semantic relationships, gaze direction, and object dissimilarity. However, none of these models considers the temporal nature of gaze shifts during image observation. We introduce a novel saliency prediction model that learns to output saliency maps in sequential time intervals by exploiting human temporal attention patterns. Our approach locally modulates the saliency predictions by combining the learned temporal maps. Our experiments show that our method outperforms the state-of-the-art models, including a multi-duration saliency model, on the SALICON benchmark. Our code will be publicly available on GitHub.

* 10 pages, 7 figures

DSI2I: Dense Style for Unpaired Image-to-Image Translation

Dec 29, 2022
Baran Ozaydin, Tong Zhang, Sabine Süsstrunk, Mathieu Salzmann

Unpaired exemplar-based image-to-image (UEI2I) translation aims to translate a source image to a target image domain with the style of a target image exemplar, without ground-truth input-translation pairs. Existing UEI2I methods represent style using either a global, image-level feature vector, or one vector per object instance/class, which requires knowledge of the scene semantics. Here, by contrast, we propose to represent style as a dense feature map, allowing for a finer-grained transfer to the source image without requiring any external semantic information. We then rely on perceptual and adversarial losses to disentangle our dense style and content representations, and exploit unsupervised cross-domain semantic correspondences to warp the exemplar style to the source content. We demonstrate the effectiveness of our method on two datasets using standard metrics together with a new localized style metric measuring style similarity in a class-wise manner. Our results show that the translations produced by our approach are more diverse and closer to the exemplars than those of the state-of-the-art methods while nonetheless preserving the source content.

VolRecon: Volume Rendering of Signed Ray Distance Functions for Generalizable Multi-View Reconstruction

Dec 15, 2022
Yufan Ren, Fangjinhua Wang, Tong Zhang, Marc Pollefeys, Sabine Süsstrunk

With the success of neural volume rendering in novel view synthesis, neural implicit reconstruction with volume rendering has become popular. However, most methods optimize per-scene functions and are unable to generalize to novel scenes. We introduce VolRecon, a generalizable implicit reconstruction method with Signed Ray Distance Function (SRDF). To reconstruct with fine details and little noise, we combine projection features, aggregated from multi-view features with a view transformer, and volume features interpolated from a coarse global feature volume. A ray transformer computes SRDF values of all the samples along a ray to estimate the surface location, and these values are used for volume rendering of color and depth. Extensive experiments on DTU and ETH3D demonstrate the effectiveness and generalization ability of our method. On DTU, our method outperforms SparseNeuS by about 30% in sparse view reconstruction and achieves quality comparable to MVSNet in full view reconstruction. In addition, our method shows good generalization ability on the large-scale ETH3D benchmark. Project page: https://fangjinhuawang.github.io/VolRecon.

DyNCA: Real-time Dynamic Texture Synthesis Using Neural Cellular Automata

Nov 21, 2022
Ehsan Pajouheshgar, Yitao Xu, Tong Zhang, Sabine Süsstrunk

Current Dynamic Texture Synthesis (DyTS) models in the literature can synthesize realistic videos. However, these methods require a slow iterative optimization process to synthesize a single fixed-size short video, and they do not offer any post-training control over the synthesis process. We propose Dynamic Neural Cellular Automata (DyNCA), a framework for real-time and controllable dynamic texture synthesis. Our method is built upon the recently introduced NCA models, and can synthesize infinitely-long and arbitrary-size realistic texture videos in real-time. We quantitatively and qualitatively evaluate our model and show that our synthesized videos appear more realistic than the existing results. We improve the SOTA DyTS performance by 2 to 4 orders of magnitude. Moreover, our model offers several real-time and interactive video controls including motion speed, motion direction, and an editing brush tool.

PoGaIN: Poisson-Gaussian Image Noise Modeling from Paired Samples

Oct 10, 2022
Nicolas Bähler, Majed El Helou, Étienne Objois, Kaan Okumuş, Sabine Süsstrunk

Image noise can often be accurately fitted to a Poisson-Gaussian distribution. However, estimating the distribution parameters from only a noisy image is a challenging task. Here, we study the case when paired noisy and noise-free samples are available. No method is currently available to exploit the noise-free information, which holds the promise of achieving more accurate estimates. To fill this gap, we derive a novel cumulant-based approach for Poisson-Gaussian noise modeling from paired image samples. We show its improved performance over different baselines, with special emphasis on MSE, effect of outliers, image dependence, and bias. We additionally derive the log-likelihood function for further insight and discuss real-world applicability.

* 5 pages, 4 figures, and 3 tables. Code is available at https://github.com/IVRL/PoGaIN
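
As a rough illustration of why paired samples help, the sketch below simulates Poisson-Gaussian noise and recovers its two parameters by matching the second and third moments of the residual `y - x`. It is a simplified moment-matching sketch, not the authors' estimator. The parametrization `y = Poisson(a * x) / a + Normal(0, b^2)` and all function names are assumptions made for this example; the abstract does not specify them. Under that assumption, the residual has mean second moment `mean(x) / a + b^2` and mean third moment `mean(x) / a^2`, because the Gaussian term contributes nothing to the third cumulant.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def add_noise(x, a, b, rng):
    """Assumed noise model: scaled Poisson term plus zero-mean Gaussian term."""
    return sample_poisson(a * x, rng) / a + rng.gauss(0.0, b)


def estimate_a_b(clean, noisy):
    """Moment-matching estimate of (a, b) from paired clean/noisy samples."""
    n = len(clean)
    mean_x = sum(clean) / n
    m2 = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / n
    m3 = sum((y - x) ** 3 for x, y in zip(clean, noisy)) / n
    if m3 <= 0:
        raise ValueError("third moment is not positive; need more samples")
    a_hat = math.sqrt(mean_x / m3)                     # from m3 = mean_x / a^2
    b_hat = math.sqrt(max(0.0, m2 - mean_x / a_hat))   # from m2 = mean_x / a + b^2
    return a_hat, b_hat


def simulate_and_estimate(a, b, n, seed):
    """Generate n synthetic pairs with known (a, b) and estimate them back."""
    rng = random.Random(seed)
    clean = [rng.uniform(0.2, 1.0) for _ in range(n)]
    noisy = [add_noise(x, a, b, rng) for x in clean]
    return estimate_a_b(clean, noisy)
```

Calling `simulate_and_estimate(4.0, 0.3, 50000, 0)` should return values close to the true parameters. Third-moment estimates are noisy, so the accuracy of this simple scheme degrades quickly with fewer samples.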

DSR: Towards Drone Image Super-Resolution

Aug 25, 2022
Xiaoyu Lin, Baran Ozaydin, Vidit Vidit, Majed El Helou, Sabine Süsstrunk

Despite achieving remarkable progress in recent years, single-image super-resolution methods have several limitations. Specifically, they are trained on fixed content domains with certain degradations (whether synthetic or real). The priors they learn are prone to overfitting the training configuration. Therefore, the generalization to novel domains such as drone top view data, and across altitudes, is currently unknown. Nonetheless, pairing drones with proper image super-resolution is of great value. It would enable drones to fly higher, covering larger fields of view while maintaining high image quality. To address this open question and pave the way towards drone image super-resolution, we explore this application with particular focus on the single-image case. We propose a novel drone image dataset, with scenes captured at low and high resolutions, and across a span of altitudes. Our results show that off-the-shelf state-of-the-art networks suffer a significant drop in performance on this different domain. We additionally show that simple fine-tuning, and incorporating altitude awareness into the network's architecture, both improve the reconstruction performance.

* Accepted at ECCVW 2022

Fast Adversarial Training with Adaptive Step Size

Jun 06, 2022
Zhichao Huang, Yanbo Fan, Chen Liu, Weizhong Zhang, Yong Zhang, Mathieu Salzmann, Sabine Süsstrunk, Jue Wang

While adversarial training and its variants have been shown to be the most effective algorithms to defend against adversarial attacks, their extremely slow training process makes them hard to scale to large datasets like ImageNet. The key idea of recent works to accelerate adversarial training is to substitute multi-step attacks (e.g., PGD) with single-step attacks (e.g., FGSM). However, these single-step methods suffer from catastrophic overfitting, where the accuracy against PGD attack suddenly drops to nearly 0% during training, destroying the robustness of the networks. In this work, we study the phenomenon from the perspective of training instances. We show that catastrophic overfitting is instance-dependent and that fitting instances with larger gradient norm is more likely to cause catastrophic overfitting. Based on our findings, we propose a simple but effective method, Adversarial Training with Adaptive Step size (ATAS). ATAS learns an instance-wise adaptive step size that is inversely proportional to its gradient norm. The theoretical analysis shows that ATAS converges faster than the commonly adopted non-adaptive counterparts. Empirically, ATAS consistently mitigates catastrophic overfitting and achieves higher robust accuracy on CIFAR10, CIFAR100 and ImageNet when evaluated on various adversarial budgets.
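
To make the idea of a step size that shrinks as the gradient norm grows concrete, here is a minimal sketch of a single signed-gradient (FGSM-style) attack step on a toy logistic model. It is not the ATAS algorithm itself: the rule `gamma / (c + norm)`, the constants `gamma` and `c`, and all function names are assumptions chosen for illustration, and the abstract does not describe how the step size is learned.

```python
import math


def input_gradient(w, b, x, y):
    """Gradient w.r.t. x of the logistic loss log(1 + exp(-y * (w.x + b)))."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    coeff = -y / (1.0 + math.exp(margin))
    return [coeff * wi for wi in w]


def adaptive_step_size(grad, gamma=0.1, c=0.01):
    """Hypothetical rule: step size inversely related to the gradient norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    return gamma / (c + norm)


def single_step_attack(w, b, x, y, eps):
    """One signed-gradient step with a per-instance step size.

    The perturbation is clipped to the L-infinity ball of radius eps.
    Returns the perturbed input and the step size that was used.
    """
    grad = input_gradient(w, b, x, y)
    alpha = adaptive_step_size(grad)
    delta = []
    for g in grad:
        sign = (g > 0) - (g < 0)
        delta.append(max(-eps, min(eps, alpha * sign)))
    return [xi + di for xi, di in zip(x, delta)], alpha
```

With `w = [2.0, -1.0]` and `b = 0.0`, a misclassified input such as `[-1.0, 1.0]` with label `+1` has a large input gradient and therefore receives a smaller step than the confidently classified input `[1.0, -1.0]`.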
