Vesa Välimäki

Four Decades of Digital Waveguides

Apr 14, 2026

Pablo Tablas de Paula, Julius O. Smith, Vesa Välimäki, Joshua D. Reiss

Abstract: Digital waveguide physical modeling offers efficient simulation of acoustic wave propagation as compared to general finite-difference schemes commonly used in computational physics. This efficiency has enabled the real-time implementation of physically modeled musical instruments and sound effects, as well as real-time vocal models and artificial reverberation. This paper provides an overview of the historical evolution and applications of digital waveguide modeling and highlights recent advances in the field. Parametric optimization using classical, evolutionary and neural approaches is also discussed and compared. Digital waveguides provide physically accurate simulations with reduced computational cost, and can now be optimized with modern machine learning and differentiable digital signal processing techniques.
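
To give a rough sense of why waveguide models are cheap, the sketch below implements a plucked string as a single delay-line loop with a two-point averaging loss filter (the Karplus-Strong simplification of a digital waveguide). It is not taken from the paper: the function name `pluck_string` and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def pluck_string(freq_hz, duration_s, fs=44100, loss=0.996, seed=0):
    """Single-delay-loop string model (Karplus-Strong style sketch).

    All names and defaults here are hypothetical, not from the paper.
    """
    n_delay = int(round(fs / freq_hz))       # loop length sets the pitch
    rng = np.random.default_rng(seed)
    line = rng.uniform(-1.0, 1.0, n_delay)   # initial "pluck": a noise burst
    out = np.empty(int(duration_s * fs))
    idx = 0
    for n in range(out.size):
        out[n] = line[idx]
        nxt = (idx + 1) % n_delay
        # Loss filter: average two neighbouring samples, scaled by `loss`.
        line[idx] = loss * 0.5 * (line[idx] + line[nxt])
        idx = nxt
    return out

y = pluck_string(220.0, 0.5)  # half a second of a 220 Hz string
```

Each output sample costs one buffer read, one addition and two multiplications, regardless of the string length, which is the kind of saving over a grid-based finite-difference scheme that the abstract refers to.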

Solving Room Impulse Response Inverse Problems Using Flow Matching with Analytic Wiener Denoiser

Jan 31, 2026

Kyung Yun Lee, Nils Meyer-Kahlen, Vesa Välimäki, Sebastian J. Schlecht

Abstract: Room impulse response (RIR) estimation naturally arises as a class of inverse problems, including denoising and deconvolution. While recent approaches often rely on supervised learning or learned generative priors, such methods require large amounts of training data and may generalize poorly outside the training distribution. In this work, we present RIRFlow, a training-free Bayesian framework for RIR inverse problems using flow matching. We derive a flow-consistent analytic prior from the statistical structure of RIRs, eliminating the need for data-driven priors. Specifically, we model the RIR as a Gaussian process with exponentially decaying variance, which yields a closed-form minimum mean squared error (MMSE) Wiener denoiser. This analytic denoiser is integrated as a prior in an existing flow-based inverse solver, where inverse problems are solved via guided posterior sampling. Furthermore, we extend the solver to nonlinear and non-Gaussian inverse problems via a local Gaussian approximation of the guided posterior, and empirically demonstrate that this approximation remains effective in practice. Experiments on real RIRs across different inverse problems demonstrate robust performance, highlighting the effectiveness of combining a classic RIR model with recent flow-based generative inference.

* Submitted to the Journal of the Acoustical Society of America (JASA)
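
The closed-form denoiser mentioned in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible setting, which is not necessarily the paper's: independent zero-mean Gaussian RIR samples whose variance decays by 60 dB over a known `t60`, observed in white Gaussian noise of known standard deviation. Under those assumptions the MMSE estimate is a per-sample Wiener gain. The function name and all parameter values are hypothetical.

```python
import numpy as np

def wiener_denoise_rir(y, fs, t60, amp, noise_std):
    """Per-sample Wiener gain for an exponentially decaying Gaussian prior.

    Assumed prior: x[n] ~ N(0, amp^2 * 10^(-6 n / (fs * t60))).
    Assumed observation: y[n] = x[n] + w[n], w[n] ~ N(0, noise_std^2).
    """
    n = np.arange(y.size)
    prior_var = amp**2 * 10.0 ** (-6.0 * n / (fs * t60))
    gain = prior_var / (prior_var + noise_std**2)  # MMSE (Wiener) gain
    return gain * y

# Synthetic example: decaying Gaussian noise as a stand-in for an RIR.
rng = np.random.default_rng(0)
fs, t60, amp, noise_std = 8000, 0.5, 1.0, 0.05
n = np.arange(fs)
clean = amp * 10.0 ** (-3.0 * n / (fs * t60)) * rng.standard_normal(fs)
noisy = clean + noise_std * rng.standard_normal(fs)
denoised = wiener_denoise_rir(noisy, fs, t60, amp, noise_std)
```

The gain is close to one early in the response, where the RIR dominates, and falls towards zero in the tail, where the noise floor dominates.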

Learning Recursive Attenuation Filters Under Noisy Conditions

Dec 18, 2025

Gloria Dal Santo, Karolina Prawda, Sebastian J. Schlecht, Vesa Välimäki

Abstract: Recursion is a fundamental concept in the design of filters and audio systems. In particular, artificial reverberation systems that use delay networks depend on recursive paths to control both echo density and the decay rate of modal components. The differentiable digital signal processing framework has shown promise in automatically tuning both recursive and non-recursive elements given a target room impulse response. This is done by applying gradient descent to loss functions based on energy-decay or spectrogram differences. However, these representations are highly sensitive to background noise, which is ubiquitous in real measurements, producing spurious loss minima and leading to incorrect attenuation. This paper addresses the problem of tuning recursive attenuation filters of a feedback delay network when targets are noisy. We examine the loss landscape associated with different optimization objectives and propose a method that ensures correct minima under low signal-to-noise conditions. We demonstrate the effectiveness of the proposed approach through statistical analysis on 80 individual optimization examples. The results reveal that explicitly modeling the noise restores correct minima. Furthermore, we identify the sensitivity of attenuation filter parameter tuning to perturbations in frequency-independent parameters. These findings provide practical guidelines for more robust and reproducible gradient-based optimization of feedback delay networks.

* Submitted to the Journal of the Audio Engineering Society
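
For context on what an attenuation filter controls, the sketch below computes the textbook frequency-independent gain per delay line that yields a chosen reverberation time: a line of `m` samples is attenuated by `60 * m / (fs * T60)` dB, so that every recirculating path decays at the same rate. This is the standard relation, not the noise-aware learning method proposed in the paper; the function name and the delay lengths are assumptions.

```python
import numpy as np

def fdn_attenuation_gains(delays_samples, t60_s, fs):
    """Per-delay-line gains for a homogeneous decay of 60 dB in t60_s seconds."""
    delays = np.asarray(delays_samples, dtype=float)
    gain_db = -60.0 * delays / (fs * t60_s)   # attenuation per pass, in dB
    return 10.0 ** (gain_db / 20.0)           # convert to linear gain

# Hypothetical delay lengths (in samples) for a four-line network.
delays = [1499, 1889, 2381, 2999]
gains = fdn_attenuation_gains(delays, t60_s=2.0, fs=48000)
```

Longer delay lines receive smaller gains because a signal passes through them less often per second. In the frequency-dependent case, the scalar `t60_s` becomes a function of frequency and each gain becomes a filter.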

Automatic Music Mixing using a Generative Model of Effect Embeddings

Nov 11, 2025

Eloi Moliner, Marco A. Martínez-Ramírez, Junghyun Koo, Wei-Hsiang Liao, Kin Wai Cheuk, Joan Serrà, Vesa Välimäki, Yuki Mitsufuji

Abstract: Music mixing involves combining individual tracks into a cohesive mixture, a task characterized by subjectivity where multiple valid solutions exist for the same input. Existing automatic mixing systems treat this task as a deterministic regression problem, thus ignoring this multiplicity of solutions. Here we introduce MEGAMI (Multitrack Embedding Generative Auto MIxing), a generative framework that models the conditional distribution of professional mixes given unprocessed tracks. MEGAMI uses a track-agnostic effects processor conditioned on per-track generated embeddings, handles arbitrary unlabeled tracks through a permutation-equivariant architecture, and enables training on both dry and wet recordings via domain adaptation. Our objective evaluation using distributional metrics shows consistent improvements over existing methods, while listening tests indicate performance approaching human-level quality across diverse musical genres.

* Submitted to IEEE ICASSP 2026

Unsupervised Estimation of Nonlinear Audio Effects: Comparing Diffusion-Based and Adversarial approaches

Apr 07, 2025

Eloi Moliner, Michal Švento, Alec Wright, Lauri Juvela, Pavel Rajmic, Vesa Välimäki

Abstract: Accurately estimating nonlinear audio effects without access to paired input-output signals remains a challenging problem. This work studies unsupervised probabilistic approaches for solving this task. We introduce a method, novel for this application, based on diffusion generative models for blind system identification, enabling the estimation of unknown nonlinear effects using black- and gray-box models. This study compares this method with a previously proposed adversarial approach, analyzing the performance of both methods under different parameterizations of the effect operator and varying lengths of available effected recordings. Through experiments on guitar distortion effects, we show that the diffusion-based approach provides more stable results and is less sensitive to data availability, while the adversarial approach is superior at estimating more pronounced distortion effects. Our findings contribute to the robust unsupervised blind estimation of audio effects, demonstrating the potential of diffusion models for system identification in music technology.

* Submitted to the 28th International Conference on Digital Audio Effects (DAFx25)

Resampling Filter Design for Multirate Neural Audio Effect Processing

Jan 30, 2025

Alistair Carson, Vesa Välimäki, Alec Wright, Stefan Bilbao

Abstract: Neural networks have become ubiquitous in audio effects modelling, especially for guitar amplifiers and distortion pedals. One limitation of such models is that the sample rate of the training data is implicitly encoded in the model weights and therefore not readily adjustable at inference. Recent work explored modifications to recurrent neural network architecture to approximate a sample rate independent system, enabling audio processing at a rate that differs from the original training rate. This method works well for integer oversampling and can reduce aliasing caused by nonlinear activation functions. For small fractional changes in sample rate, fractional delay filters can be used to approximate sample rate independence, but in some cases this method fails entirely. Here, we explore the use of signal resampling at the input and output of the neural network as an alternative solution. We investigate several resampling filter designs and show that a two-stage design consisting of a half-band IIR filter cascaded with a Kaiser window FIR filter can give results similar to or better than the previously proposed model adjustment method with many fewer operations per sample and less than one millisecond of latency at typical audio rates. Furthermore, we investigate interpolation and decimation filters for the task of integer oversampling and show that cascaded half-band IIR and FIR designs can be used in conjunction with the model adjustment method to reduce aliasing in a range of distortion effect models.

* Preprint
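
As a small illustration of one ingredient named in the abstract, the sketch below designs a Kaiser-windowed sinc lowpass FIR filter with plain NumPy. It covers only a generic Kaiser-window FIR stage, not the half-band IIR stage or the specific two-stage design evaluated in the paper; the function name, tap count, cutoff and `beta` are assumptions.

```python
import numpy as np

def kaiser_lowpass(num_taps, cutoff, beta):
    """Linear-phase lowpass FIR via a Kaiser-windowed sinc.

    cutoff is a fraction of the Nyquist frequency (0 < cutoff < 1).
    """
    n = np.arange(num_taps) - (num_taps - 1) / 2.0  # centre the taps
    h = cutoff * np.sinc(cutoff * n)   # ideal lowpass; np.sinc(x) = sin(pi x)/(pi x)
    h *= np.kaiser(num_taps, beta)     # taper to control stopband ripple
    return h / h.sum()                 # normalise to unity gain at DC

h = kaiser_lowpass(num_taps=63, cutoff=0.5, beta=8.0)
```

A symmetric FIR filter with `N` taps delays the signal by `(N - 1) / 2` samples, so this 63-tap example adds 31 samples, or about 0.70 ms at 44.1 kHz. Larger `beta` values trade a wider transition band for deeper stopband attenuation.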

Estimation and Restoration of Unknown Nonlinear Distortion using Diffusion

Jan 10, 2025

Michal Švento, Eloi Moliner, Lauri Juvela, Alec Wright, Vesa Välimäki

Abstract: The restoration of nonlinearly distorted audio signals, alongside the identification of the applied memoryless nonlinear operation, is studied. The paper focuses on the difficult but practically important case in which both the nonlinearity and the original input signal are unknown. The proposed method uses a generative diffusion model trained unconditionally on guitar or speech signals to jointly model and invert the nonlinear system at inference time. Both the memoryless nonlinear function model and the restored audio signal are obtained as output. Successful example case studies are presented including inversion of hard and soft clipping, digital quantization, half-wave rectification, and wavefolding nonlinearities. Our results suggest that, out of the nonlinear functions tested here, the cubic Catmull-Rom spline is best suited to approximating these nonlinearities. In the case of guitar recordings, comparisons with informed and supervised methods show that the proposed blind method is at least as good as they are in terms of objective metrics. Experiments on distorted speech show that the proposed blind method outperforms general-purpose speech enhancement techniques and restores the original voice quality. The proposed method can be applied to audio effects modeling, restoration of music and speech recordings, and characterization of analog recording media.

* Submitted to the Journal of the Audio Engineering Society, special issue "The Sound of Digital Audio Effects"
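
The memoryless nonlinearities listed in the abstract can each be written as a one-line function of the input sample. The sketch below gives one common definition of each; the exact forms, thresholds and bit depth used in the paper are not stated here, so `tanh` as the soft clipper, the unit fold threshold and the quantizer step size are all assumptions.

```python
import numpy as np

def hard_clip(x, limit=1.0):
    """Saturate abruptly at +/- limit."""
    return np.clip(x, -limit, limit)

def soft_clip(x):
    """Smooth saturation; tanh is one common choice."""
    return np.tanh(x)

def half_wave_rectify(x):
    """Keep the positive half of the waveform, zero the rest."""
    return np.maximum(x, 0.0)

def quantize(x, bits=4):
    """Uniform quantizer with 2**bits steps across the range [-1, 1]."""
    step = 2.0 / (2**bits)
    return step * np.round(x / step)

def wavefold(x):
    """Reflect the signal back into [-1, 1] each time it crosses +/- 1."""
    return np.abs(np.mod(x - 1.0, 4.0) - 2.0) - 1.0

x = np.linspace(-3.0, 3.0, 13)
curves = {f.__name__: f(x) for f in
          (hard_clip, soft_clip, half_wave_rectify, quantize, wavefold)}
```

Because none of these functions has memory, each is fully described by its input-output curve, which is what makes a spline a natural parametric model for it.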

HRTF Estimation using a Score-based Prior

Oct 02, 2024

Etienne Thuillier, Jean-Marie Lemercier, Eloi Moliner, Timo Gerkmann, Vesa Välimäki

Abstract: We present a head-related transfer function (HRTF) estimation method which relies on a data-driven prior given by a score-based diffusion model. The HRTF is estimated in reverberant environments using natural excitation signals, e.g. human speech. The impulse response of the room is estimated along with the HRTF by optimizing a parametric model of reverberation based on the statistical behaviour of room acoustics. The posterior distribution of HRTF given the reverberant measurement and excitation signal is modelled using the score-based HRTF prior and a log-likelihood approximation. We show that the resulting method outperforms several baselines, including an oracle recommender system that assigns the optimal HRTF in our training set based on the smallest distance to the true HRTF at the given direction of arrival. In particular, we show that the diffusion prior can account for the large variability of high-frequency content in HRTFs.

FLAMO: An Open-Source Library for Frequency-Domain Differentiable Audio Processing

Sep 13, 2024

Gloria Dal Santo, Gian Marco De Bortoli, Karolina Prawda, Sebastian J. Schlecht, Vesa Välimäki

Abstract: We present FLAMO, a Frequency-sampling Library for Audio-Module Optimization designed to implement and optimize differentiable linear time-invariant audio systems. The library is open-source and built on the frequency-sampling filter design method, allowing for the creation of differentiable modules that can be used stand-alone or within the computation graph of neural networks, simplifying the development of differentiable audio systems. It includes predefined filtering modules and auxiliary classes for constructing, training, and logging the optimized systems, all accessible through an intuitive interface. Practical application of these modules is demonstrated through two case studies: the optimization of an artificial reverberator and an active acoustics system for improved response smoothness.
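
The frequency-sampling idea behind the library can be sketched without using FLAMO itself: a recursive filter is evaluated on a uniform frequency grid as a ratio of two FFTs, which replaces the sample-by-sample recursion with array operations. The code below uses plain NumPy and does not reproduce FLAMO's API; the function name and the one-pole example filter are assumptions.

```python
import numpy as np

def sample_frequency_response(b, a, nfft):
    """Evaluate H(z) = B(z) / A(z) at nfft uniformly spaced frequencies."""
    B = np.fft.rfft(b, nfft)   # numerator sampled on the unit circle
    A = np.fft.rfft(a, nfft)   # denominator sampled on the unit circle
    return B / A

# Hypothetical one-pole lowpass: y[n] = (1 - p) x[n] + p y[n - 1].
p = 0.9
H = sample_frequency_response([1.0 - p], [1.0, -p], nfft=1024)
h = np.fft.irfft(H, 1024)      # time-aliased impulse response
```

The inverse FFT returns the impulse response wrapped onto `nfft` samples, so `nfft` should be long enough for the response to have decayed. In an automatic-differentiation framework the same operations stay differentiable with respect to the filter coefficients.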

Similarity Metrics For Late Reverberation

Aug 27, 2024

Gloria Dal Santo, Karolina Prawda, Sebastian J. Schlecht, Vesa Välimäki

Abstract: Automatic tuning of reverberation algorithms relies on the optimization of a cost function. While general audio similarity metrics are useful, they are not optimized for the specific statistical properties of reverberation in rooms. This paper presents two novel metrics for assessing the similarity of late reverberation in room impulse responses. These metrics are differentiable and can be utilized within a machine-learning framework. We compare the performance of these metrics to two popular audio metrics using a large dataset of room impulse responses encompassing various room configurations and microphone positions. The results indicate that the proposed functions based on averaged power and frequency-band energy decay outperform the baselines with the former exhibiting the most suitable profile towards the minimum. The proposed work holds promise as an improvement to the design and evaluation of reverberation similarity metrics.
