Felix Perfler

ISAC: An Invertible and Stable Auditory Filter Bank with Customizable Kernels for ML Integration

May 12, 2025

Daniel Haider, Felix Perfler, Peter Balazs, Clara Hollomey, Nicki Holighaus

Abstract: This paper introduces ISAC, an invertible and stable, perceptually-motivated filter bank that is specifically designed to be integrated into machine learning paradigms. More precisely, the center frequencies and bandwidths of the filters are chosen to follow a non-linear, auditory frequency scale, the filter kernels have user-defined maximum temporal support and may serve as learnable convolutional kernels, and there exists a corresponding filter bank such that both form a perfect reconstruction pair. ISAC provides a powerful and user-friendly audio front-end suitable for any application, including analysis-synthesis schemes.

* Accepted at the IEEE International Conference on Sampling Theory and Applications (SampTA) 2025
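
To make the phrase "center frequencies ... follow a non-linear, auditory frequency scale" concrete, here is a minimal sketch that spaces center frequencies uniformly on the mel scale. The choice of the mel scale, the function names, and the parameter values are assumptions for illustration only; the abstract does not say which auditory scale ISAC uses, and this is not the ISAC implementation.

```python
import numpy as np

def hz_to_mel(f_hz):
    # Common mel-scale formula (assumed here; other auditory scales exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def auditory_center_frequencies(num_filters, f_min, f_max):
    # Hypothetical helper: equally spaced points on the mel axis,
    # mapped back to Hz, give non-uniformly spaced center frequencies.
    mels = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), num_filters)
    return mel_to_hz(mels)

# Illustrative values, not taken from the paper.
fc = auditory_center_frequencies(num_filters=8, f_min=50.0, f_max=8000.0)
spacing = np.diff(fc)  # gaps between neighbouring center frequencies in Hz
print(np.round(fc, 1))
```

On such a scale the filters are densely packed at low frequencies and spread out at high frequencies, so `spacing` grows with the filter index.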

Hold Me Tight: Stable Encoder-Decoder Design for Speech Enhancement

Aug 30, 2024

Daniel Haider, Felix Perfler, Vincent Lostanlen, Martin Ehler, Peter Balazs

Abstract: Convolutional layers with 1-D filters are often used as a front-end to encode audio signals. Unlike fixed time-frequency representations, they can adapt to the local characteristics of input data. However, 1-D filters on raw audio are hard to train and often suffer from instabilities. In this paper, we address these problems with hybrid solutions, i.e., combining theory-driven and data-driven approaches. First, we preprocess the audio signals via an auditory filterbank, guaranteeing good frequency localization for the learned encoder. Second, we use results from frame theory to define an unsupervised learning objective that encourages energy conservation and perfect reconstruction. Third, we adapt mixed compressed spectral norms as learning objectives to the encoder coefficients. Using these solutions in a low-complexity encoder-mask-decoder model significantly improves the perceptual evaluation of speech quality (PESQ) in speech enhancement.

* Accepted at INTERSPEECH 2024
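
The frame-theoretic idea behind "energy conservation and perfect reconstruction" can be sketched for the simplest case: a bank of FIR filters applied with stride 1 and circular convolution. In that setting the frame bounds A and B are the minimum and maximum over frequency of the summed squared magnitude responses, and the ratio B/A equals 1 exactly when the bank conserves energy up to a constant. The setup below (random filters, stride 1, the names `frame_bounds` and `condition_number`) is an assumption for illustration and is not claimed to be the objective used in the paper.

```python
import numpy as np

def frame_bounds(filters, n_fft):
    # filters: array of shape (J, T) holding J real FIR kernels.
    # Assumes stride 1 and circular convolution on signals of length n_fft.
    H = np.fft.fft(filters, n=n_fft, axis=1)
    lp_sum = np.sum(np.abs(H) ** 2, axis=0)  # sum_j |H_j(k)|^2 per frequency bin
    return lp_sum.min(), lp_sum.max()

def condition_number(filters, n_fft):
    # B / A >= 1, with equality for a tight (energy-conserving) filter bank.
    A, B = frame_bounds(filters, n_fft)
    return B / A

rng = np.random.default_rng(0)
n_fft = 256
filters = rng.standard_normal((16, 32))  # hypothetical random kernels

A, B = frame_bounds(filters, n_fft)
kappa = condition_number(filters, n_fft)

# Check the frame inequality A * ||x||^2 <= sum_j ||h_j * x||^2 <= B * ||x||^2.
x = rng.standard_normal(n_fft)
H = np.fft.fft(filters, n=n_fft, axis=1)
coeffs = np.fft.ifft(H * np.fft.fft(x), axis=1)  # circular convolutions
energy_ratio = np.sum(np.abs(coeffs) ** 2) / np.sum(x ** 2)
print(A, energy_ratio, B, kappa)
```

A quantity such as `kappa` could in principle serve as a data-independent penalty during training, since it depends only on the filters and not on labelled audio.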
