Clara Hollomey

ISAC: An Invertible and Stable Auditory Filter Bank with Customizable Kernels for ML Integration

May 12, 2025

Daniel Haider, Felix Perfler, Peter Balazs, Clara Hollomey, Nicki Holighaus

Abstract: This paper introduces ISAC, an invertible and stable, perceptually-motivated filter bank that is specifically designed to be integrated into machine learning paradigms. More precisely, the center frequencies and bandwidths of the filters are chosen to follow a non-linear, auditory frequency scale, the filter kernels have user-defined maximum temporal support and may serve as learnable convolutional kernels, and there exists a corresponding filter bank such that both form a perfect reconstruction pair. ISAC provides a powerful and user-friendly audio front-end suitable for any application, including analysis-synthesis schemes.

* Accepted at the IEEE International Conference on Sampling Theory and Applications (SampTA) 2025
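
To illustrate what "center frequencies following a non-linear, auditory frequency scale" can look like, here is a minimal sketch that spaces center frequencies uniformly on the mel scale. The choice of the mel scale, the function names, and the parameter values are assumptions made for illustration only; the abstract does not state which auditory scale or which API ISAC uses.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mel (common 2595*log10(1 + f/700) formula)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def auditory_center_frequencies(num_channels, f_min, f_max):
    """Hypothetical helper: center frequencies equally spaced on the mel scale."""
    mels = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), num_channels)
    return mel_to_hz(mels)

# Assumed example values: 8 channels between 50 Hz and 8 kHz.
fc = auditory_center_frequencies(8, 50.0, 8000.0)
spacing = np.diff(fc)  # gaps between neighbouring center frequencies, in Hz
print(np.round(fc, 1))
```

On such a scale the gaps between neighbouring channels grow with frequency, so low frequencies are resolved more finely than high ones.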

Grid-Based Decimation for Wavelet Transforms with Stably Invertible Implementation

Jan 04, 2023

Nicki Holighaus, Günther Koliander, Clara Hollomey, Friedrich Pillichshammer

Abstract: The constant center frequency to bandwidth ratio (Q-factor) of wavelet transforms provides a very natural representation for audio data. However, invertible wavelet transforms have required either non-uniform decimation -- leading to irregular data structures that are cumbersome to work with -- or excessively high oversampling with unacceptable computational overhead. Here, we present a novel decimation strategy for wavelet transforms that leads to stable representations with oversampling rates close to one and uniform decimation. Specifically, we show that finite implementations of the resulting representation are energy-preserving in the sense of frame theory. The obtained wavelet coefficients can be stored in a time-frequency matrix with a natural interpretation of columns as time frames and rows as frequency channels. This matrix structure immediately grants access to a large number of algorithms that are successfully used in time-frequency audio processing, but could not previously be used jointly with wavelet transforms. We demonstrate the application of our method in processing based on nonnegative matrix factorization, in onset detection, and in phaseless reconstruction.
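
As a small illustration of "energy-preserving in the sense of frame theory", the sketch below checks the frame bounds of a finite analysis matrix. It does not implement the paper's wavelet decimation scheme; the toy frame (identity basis stacked with the unitary DFT basis, rescaled) and the helper name `frame_bounds` are assumptions chosen only because the result is easy to verify.

```python
import numpy as np

def frame_bounds(analysis):
    """Return (lower, upper) frame bounds of a finite analysis matrix.

    The bounds are the smallest and largest squared singular values,
    i.e. the extreme eigenvalues of the frame operator A^H A.
    """
    s = np.linalg.svd(analysis, compute_uv=False)
    return s.min() ** 2, s.max() ** 2

n = 16
identity = np.eye(n)
dft = np.fft.fft(np.eye(n)) / np.sqrt(n)               # unitary DFT matrix
analysis = np.vstack([identity, dft]) / np.sqrt(2.0)   # toy frame, redundancy 2

lower, upper = frame_bounds(analysis)

# Energy check on a random test signal.
rng = np.random.default_rng(0)
x = rng.standard_normal(n)
coeffs = analysis @ x
signal_energy = np.sum(x ** 2)
coeff_energy = np.sum(np.abs(coeffs) ** 2)
print(lower, upper, signal_energy, coeff_energy)
```

When both bounds coincide the frame is tight, and with both equal to one the coefficient energy matches the signal energy. Bounds that are close together but not equal indicate a stable, well-conditioned inverse rather than exact energy preservation.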
