Maksim Kaledin

FastWave: Optimized Diffusion Model for Audio Super-Resolution

Mar 04, 2026

Nikita Kuznetsov, Maksim Kaledin

Abstract: Audio super-resolution is a set of techniques for estimating, at high quality, a given signal as if it had been sampled at a higher sample rate. Proposed methods include diffusion and flow models (generally considered slower) and generative adversarial networks (generally considered faster); however, both approaches currently rely on heavily parameterized networks that incur high computational costs in both training and inference. We propose a solution to both of these problems by revisiting recent advances in the training of diffusion models and applying them to super-resolution from any sample rate to 48 kHz. Our approach achieves better results than NU-Wave 2 and is comparable to state-of-the-art models. Our model, called FastWave, has a computational complexity of around 50 GFLOPs and 1.3 M parameters, and it can be trained with fewer resources and significantly faster than the majority of recently proposed diffusion- and flow-based solutions. The code has been made publicly available.

HiFi-Stream: Streaming Speech Enhancement with Generative Adversarial Networks

Mar 21, 2025

Ekaterina Dmitrieva, Maksim Kaledin

Abstract: Speech enhancement techniques have become core technologies in mobile devices and voice software, simplifying downstream speech tasks. Still, modern deep learning (DL) solutions often require a large amount of computational resources, which makes their use on low-resource devices challenging. We present HiFi-Stream, an optimized version of the recently published HiFi++ model. Our experiments demonstrate that HiFi-Stream retains most of the quality of the original model despite its reduced size and computational complexity: the lightest version has only around 490k parameters, a 3.5x reduction compared to the original HiFi++, making it one of the smallest and fastest models available. The model is evaluated in a streaming setting, where it demonstrates superior performance compared to modern baselines.

* 5 pages (4 content pages + 1 page of references)
