Federico Raue

YODA: You Only Diffuse Areas. An Area-Masked Diffusion Approach For Image Super-Resolution

Aug 15, 2023
Brian B. Moser, Stanislav Frolov, Federico Raue, Sebastian Palacio, Andreas Dengel

This work introduces "You Only Diffuse Areas" (YODA), a novel method for partial diffusion in Single-Image Super-Resolution (SISR). The core idea is to utilize diffusion selectively on spatial regions based on attention maps derived from the low-resolution image and the current time step in the diffusion process. This time-dependent targeting enables a more effective conversion to high-resolution outputs by focusing on areas that benefit the most from the iterative refinement process, i.e., detail-rich objects. We empirically validate YODA by extending leading diffusion-based SISR methods SR3 and SRDiff. Our experiments demonstrate new state-of-the-art performance gains in face and general SR across PSNR, SSIM, and LPIPS metrics. A notable finding is YODA's stabilization effect on training by reducing color shifts, especially when induced by small batch sizes, potentially contributing to resource-constrained scenarios. The proposed spatial and temporal adaptive diffusion mechanism opens promising research directions, including developing enhanced attention map extraction techniques and optimizing inference latency based on sparser diffusion.

* Brian B. Moser and Stanislav Frolov contributed equally 
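
To make the mechanism concrete, below is a minimal PyTorch sketch of one area-masked reverse diffusion step, assuming a standard conditional DDPM. The linear threshold schedule and the helpers denoise_step and q_sample (the learned denoiser and the forward-noising function) are illustrative stand-ins, not the paper's exact implementation.

```python
import torch

def yoda_reverse_step(x_t, lr_up, attn, t, T, denoise_step, q_sample):
    """Area-masked reverse diffusion: refine only attention-selected regions.

    x_t:    current noisy estimate, shape (B, C, H, W)
    lr_up:  upsampled low-resolution conditioning image, same shape
    attn:   per-pixel attention map in [0, 1] derived from the LR image
    """
    # Time-dependent threshold (assumption): more area is diffused early
    # in the reverse process, progressively less as t approaches 0.
    mask = (attn > t / T).float()            # 1 = diffuse, 0 = keep LR content

    x_refined = denoise_step(x_t, lr_up, t)  # ordinary conditional DDPM update
    x_kept = q_sample(lr_up, max(t - 1, 0))  # LR content re-noised to step t-1

    # Blend: iterative refinement acts only on the detail-rich regions.
    return mask * x_refined + (1.0 - mask) * x_kept
```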

DWA: Differential Wavelet Amplifier for Image Super-Resolution

Jul 10, 2023
Brian B. Moser, Stanislav Frolov, Federico Raue, Sebastian Palacio, Andreas Dengel

This work introduces the Differential Wavelet Amplifier (DWA), a drop-in module for wavelet-based image Super-Resolution (SR). DWA revives an approach that has recently received less attention, namely the Discrete Wavelet Transformation (DWT). DWT enables an efficient image representation for SR and reduces the spatial area of its input by a factor of 4, thereby shrinking the overall model size and computation cost, which makes it an attractive approach for sustainable ML. Our proposed DWA module improves wavelet-based SR models by leveraging the difference between two convolutional filters to refine relevant feature extraction in the wavelet domain, emphasizing local contrasts and suppressing noise common to both input signals. We show its effectiveness by integrating it into existing SR models, e.g., DWSR and MWCNN, and demonstrate a clear improvement on classical SR tasks. Moreover, DWA enables a direct application of DWSR and MWCNN to the input image space, since omitting the traditional DWT reduces the channel-wise DWT representation.
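
The differential idea can be sketched in a few lines of PyTorch: two parallel convolutions are applied to the wavelet-domain input, and their difference cancels responses shared by both branches (e.g., common noise) while emphasizing local contrasts. The kernel sizes, 1x1 fusion convolution, and residual connection below are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DWA(nn.Module):
    """Differential Wavelet Amplifier sketch: subtracting two parallel
    convolutions suppresses shared responses and amplifies local contrast."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv_a = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv_b = nn.Conv2d(channels, channels, kernel_size=5, padding=2)
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        diff = self.conv_a(x) - self.conv_b(x)  # differential feature extraction
        return x + self.fuse(diff)              # amplify input with contrast signal

# Usage on a 4-band wavelet stack (LL, LH, HL, HH) of an RGB image:
# y = DWA(channels=12)(torch.randn(1, 12, 64, 64))
```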

DartsReNet: Exploring new RNN cells in ReNet architectures

Apr 11, 2023
Brian Moser, Federico Raue, Jörn Hees, Andreas Dengel

We present new Recurrent Neural Network (RNN) cells for image classification, found using the Neural Architecture Search (NAS) approach DARTS. We are interested in the ReNet architecture, an RNN-based approach proposed as an alternative to convolutional and pooling layers. ReNet can be defined with any standard RNN cell, such as LSTM or GRU. One limitation is that standard RNN cells were designed for one-dimensional sequential data, not for the two-dimensional inputs of image classification. We overcome this limitation by using DARTS to find new cell designs. We compare our results with ReNet using GRU and LSTM cells. The discovered cells outperform the standard RNN cells on CIFAR-10 and SVHN. The improvements on SVHN indicate generalizability, as we derived the RNN cell designs from CIFAR-10 without performing a new cell search on SVHN.
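
For context, a ReNet layer sweeps bidirectional RNNs over the rows and then the columns of a feature map, replacing convolution and pooling. The PyTorch sketch below uses GRU cells as placeholders for whichever cell is plugged in (including DARTS-discovered ones) and omits the patch-extraction step of the full ReNet for brevity.

```python
import torch.nn as nn

class ReNetLayer(nn.Module):
    """One ReNet layer: a horizontal RNN sweep followed by a vertical one."""

    def __init__(self, in_ch: int, hidden: int):
        super().__init__()
        self.row_rnn = nn.GRU(in_ch, hidden, bidirectional=True, batch_first=True)
        self.col_rnn = nn.GRU(2 * hidden, hidden, bidirectional=True, batch_first=True)

    def forward(self, x):                        # x: (B, C, H, W)
        b, c, h, w = x.shape
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        rows, _ = self.row_rnn(rows)             # sweep each row left/right
        cols = rows.reshape(b, h, w, -1).permute(0, 2, 1, 3).reshape(b * w, h, -1)
        cols, _ = self.col_rnn(cols)             # sweep each column up/down
        return cols.reshape(b, w, h, -1).permute(0, 3, 2, 1)  # (B, 2*hidden, H, W)
```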

Waving Goodbye to Low-Res: A Diffusion-Wavelet Approach for Image Super-Resolution

Apr 05, 2023
Brian Moser, Stanislav Frolov, Federico Raue, Sebastian Palacio, Andreas Dengel

This paper presents a novel Diffusion-Wavelet (DiWa) approach for Single-Image Super-Resolution (SISR). It combines the strengths of Denoising Diffusion Probabilistic Models (DDPMs) and the Discrete Wavelet Transformation (DWT). By operating in the DWT domain, our DDPM models effectively hallucinate high-frequency information for super-resolved images on the wavelet spectrum, resulting in high-quality and detailed reconstructions in image space. Quantitatively, we outperform the state-of-the-art diffusion-based SISR methods SR3 and SRDiff in PSNR, SSIM, and LPIPS on both face (8x scaling) and general (4x scaling) SR benchmarks. Moreover, DWT allows us to use fewer parameters than the compared models: 92M versus SR3's 550M, and 9.3M versus SRDiff's 12M. Additionally, our method outperforms other state-of-the-art generative methods on classical general SR datasets while saving inference time. Finally, our work highlights the potential of DiWa for a wide range of applications.
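
The pipeline reduces to three steps: transform to the wavelet domain, run the conditional diffusion model on the four sub-bands, and invert the transform. The sketch below uses PyWavelets; ddpm_sample is a hypothetical stand-in for the trained conditional sampler, not DiWa's actual interface.

```python
import numpy as np
import pywt

def diwa_super_resolve(lr_up: np.ndarray, ddpm_sample) -> np.ndarray:
    """Run diffusion on wavelet coefficients instead of pixels.

    lr_up:       upsampled low-resolution image, shape (H, W), H and W even
    ddpm_sample: hypothetical trained conditional sampler mapping a
                 (4, H/2, W/2) conditioning stack to predicted sub-bands
    """
    # The Haar DWT splits the image into four half-resolution sub-bands.
    ll, (lh, hl, hh) = pywt.dwt2(lr_up, "haar")
    cond = np.stack([ll, lh, hl, hh])        # conditioning signal for the DDPM

    bands = ddpm_sample(cond)                # hallucinate high-frequency detail

    # Inverse DWT maps the predicted sub-bands back to full-resolution pixels.
    return pywt.idwt2((bands[0], (bands[1], bands[2], bands[3])), "haar")
```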

Hitchhiker's Guide to Super-Resolution: Introduction and Recent Advances

Sep 27, 2022
Brian Moser, Federico Raue, Stanislav Frolov, Jörn Hees, Sebastian Palacio, Andreas Dengel

With the advent of Deep Learning (DL), Super-Resolution (SR) has become a thriving research area. However, despite promising results, the field still faces challenges that require further research, e.g., flexible upsampling, more effective loss functions, and better evaluation metrics. We review the domain of SR in light of recent advances and examine state-of-the-art models such as diffusion-based (DDPM) and transformer-based SR models. We critically discuss contemporary strategies used in SR and identify promising yet unexplored research directions. We complement previous surveys by incorporating the latest developments in the field, such as uncertainty-driven losses, wavelet networks, neural architecture search, novel normalization methods, and the latest evaluation techniques. We also include several visualizations of the models and methods throughout each chapter to facilitate a global understanding of trends in the field. This review ultimately aims to help researchers push the boundaries of DL applied to SR.

Less is More: Proxy Datasets in NAS approaches

Mar 14, 2022
Brian Moser, Federico Raue, Jörn Hees, Andreas Dengel

Neural Architecture Search (NAS) frames the design of neural networks as a search problem. Unfortunately, NAS is computationally intensive because the search space grows with the number of elements in a design and the possible connections between them. In this work, we extensively analyze the role of dataset size, using several sampling approaches for reducing it (in both unsupervised and supervised settings) as a search-method-agnostic way to reduce search time. We compared these techniques with four common NAS approaches on NAS-Bench-201 in roughly 1,400 experiments on CIFAR-100. One of our surprising findings is that in most cases we can reduce the amount of training data to 25%, and consequently the search time to 25%, while maintaining the same accuracy as training on the full dataset. Moreover, some designs derived from the subsets outperform designs derived from the full dataset by up to 22 percentage points in accuracy.
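
The simplest variant of such a proxy dataset, uniform random sampling of 25% of CIFAR-100, might be built as in the following sketch (the paper additionally studies supervised and unsupervised sampling strategies):

```python
import torch
import torchvision
from torch.utils.data import Subset

def proxy_cifar100(fraction: float = 0.25, seed: int = 0) -> Subset:
    """Random proxy subset of CIFAR-100 for architecture search."""
    full = torchvision.datasets.CIFAR100(root="data", train=True, download=True)
    g = torch.Generator().manual_seed(seed)                  # reproducible sampling
    idx = torch.randperm(len(full), generator=g)[: int(fraction * len(full))]
    return Subset(full, idx.tolist())

# The NAS method then searches on proxy_cifar100() instead of the full
# training set, cutting search time roughly proportionally.
```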

Spatial Transformer Networks for Curriculum Learning

Aug 22, 2021
Fatemeh Azimi, Jean-Francois Jacques Nicolas Nies, Sebastian Palacio, Federico Raue, Jörn Hees, Andreas Dengel

Curriculum learning is a bio-inspired training technique widely adopted in machine learning to improve the optimization of neural networks, in terms of either convergence rate or final accuracy. Its main idea is to start training with simpler tasks and gradually increase the level of difficulty. A natural question is therefore how to determine or generate these simpler tasks. In this work, we take inspiration from Spatial Transformer Networks (STNs) to form an easy-to-hard curriculum. Since STNs have been shown to remove clutter from input images and achieve higher accuracy in image classification tasks, we hypothesize that images processed by an STN can be treated as easier tasks and exploited for curriculum learning. To this end, we study multiple strategies for shaping the training curriculum from STN-generated data. We perform experiments on the cluttered MNIST and Fashion-MNIST datasets; on the former, we obtain an improvement of 3.8 pp in classification accuracy over the baseline.
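
One possible strategy of this kind can be sketched as follows: each batch mixes STN-processed (easier) and original (harder) images, with the hard fraction ramped up over training. The linear schedule and mixing scheme below are illustrative assumptions, not the paper's exact strategies.

```python
import torch

def curriculum_batch(x, stn, epoch, total_epochs):
    """Mix easy (STN-decluttered) and hard (original) images per batch."""
    with torch.no_grad():
        x_easy = stn(x)                                  # de-cluttered view
    p_hard = min(1.0, epoch / (0.5 * total_epochs))      # ramp difficulty up
    hard = torch.rand(x.size(0), device=x.device) < p_hard
    mask = hard.view(-1, 1, 1, 1).float()
    return mask * x + (1.0 - mask) * x_easy
```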

A Reinforcement Learning Approach for Sequential Spatial Transformer Networks

Jun 27, 2021
Fatemeh Azimi, Federico Raue, Joern Hees, Andreas Dengel

Spatial Transformer Networks (STNs) generate geometric transformations that modify input images to improve a classifier's performance. In this work, we combine the idea of STNs with Reinforcement Learning (RL). To this end, we break the affine transformation down into a sequence of simple, discrete transformations, formulate the task as a Markov Decision Process (MDP), and use RL to solve this sequential decision-making problem. Standard STN architectures learn the transformation parameters by minimizing the classification error and backpropagating gradients through a sub-differentiable sampling module. Our method, by contrast, is not bound to the differentiability of the sampling module. Moreover, we are free to design the objective beyond minimizing the error; e.g., we can directly target maximizing the accuracy. We design multiple experiments on the cluttered MNIST and Fashion-MNIST datasets to verify the effectiveness of our method and show that, with a proper definition of the MDP components, it outperforms STN.
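
To illustrate the setup, the sketch below defines a small discrete action set of simple affine matrices the agent can compose, together with one plausible non-differentiable reward. The specific actions, step sizes, and reward definition are assumptions for illustration, not the paper's exact MDP.

```python
import numpy as np

STEP = 0.1  # translation/scale step and rotation angle (radians); assumption

def discrete_affine(action: str) -> np.ndarray:
    """One of the simple, discrete 2x3 affine matrices the agent composes."""
    c, s = np.cos(STEP), np.sin(STEP)
    return {
        "right":   np.array([[1.0, 0.0,  STEP], [0.0, 1.0, 0.0]]),
        "left":    np.array([[1.0, 0.0, -STEP], [0.0, 1.0, 0.0]]),
        "rotate":  np.array([[c, -s, 0.0], [s, c, 0.0]]),
        "zoom_in": np.array([[1.0 - STEP, 0.0, 0.0], [0.0, 1.0 - STEP, 0.0]]),
    }[action]

def reward(p_true_before: float, p_true_after: float) -> float:
    """A non-differentiable objective RL permits: the gain in the
    classifier's probability for the true label after transforming."""
    return p_true_after - p_true_before
```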

AudioCLIP: Extending CLIP to Image, Text and Audio

Jun 24, 2021
Andrey Guzhov, Federico Raue, Jörn Hees, Andreas Dengel

In the past, the rapidly evolving field of sound classification greatly benefited from the application of methods from other domains. Today, we observe a trend of fusing domain-specific tasks and approaches, providing the community with new outstanding models. In this work, we present an extension of the CLIP model that handles audio in addition to text and images. Our proposed model incorporates the ESResNeXt audio model into the CLIP framework using the AudioSet dataset. Such a combination enables the proposed model to perform bimodal and unimodal classification and querying while keeping CLIP's ability to generalize to unseen datasets in a zero-shot fashion. AudioCLIP achieves new state-of-the-art results in the Environmental Sound Classification (ESC) task, outperforming other approaches with accuracies of 90.07% on UrbanSound8K and 97.15% on ESC-50. Furthermore, it sets new baselines in the zero-shot ESC task on the same datasets (68.78% and 69.40%, respectively). Finally, we assess the cross-modal querying performance of the proposed model as well as the influence of full and partial training on the results. For the sake of reproducibility, our code is published.

* submitted to GCPR 2021 
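
A hedged sketch of such a tri-modal training signal: symmetric CLIP-style contrastive losses over all three modality pairs, averaged. The temperature and equal pair weighting are assumptions, not necessarily the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def trimodal_contrastive_loss(img, txt, aud, temp: float = 0.07):
    """Symmetric InfoNCE over the image-text, image-audio, text-audio pairs."""
    def pair_loss(a, b):
        a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
        logits = a @ b.t() / temp                      # (B, B) similarity matrix
        labels = torch.arange(a.size(0), device=a.device)
        return 0.5 * (F.cross_entropy(logits, labels) +
                      F.cross_entropy(logits.t(), labels))

    return (pair_loss(img, txt) + pair_loss(img, aud) + pair_loss(txt, aud)) / 3.0
```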