Abstract:In sound field control applications, it is commonly assumed that one has access to an accurate representation of the sound field in the region of interest. This is a problematic assumption since the reconstruction of a sound field from available microphone measurements is especially challenging in real-time applications where only causal measurements are available. Notably, causal time-windowed observations introduce correlation between frequency components, making sound field reconstruction methods that process each frequency band independently sub-optimal. In this work, we formulate a causal finite-window spatio-temporal linear minimum mean-square error estimator for sound field reconstruction. The sound field is modeled as the solution to the wave equation driven by a stationary stochastic spatio-temporal source distribution, which induces a physically interpretable covariance function. It is shown that this covariance function is closely related to the classical diffuse-field coherence model. Since the computational complexity grows rapidly with the number of spatio-temporal observations, we formulate a budget-constrained spatio-temporal sample selection approach to minimize the posterior reconstruction variance. The proposed estimator and sampling strategy are evaluated using both simulated and measured sound fields, demonstrating improved short-window reconstruction compared to frequency domain finite-window baselines.




Abstract:We consider the problem of reconstructing the sound field in a room using prior information of the boundary geometry, represented as a point cloud. In general, when no boundary information is available, an accurate sound field reconstruction over a large spatial region and at high frequencies requires numerous microphone measurements. On the other hand, if all geometrical and acoustical aspects of the boundaries are known, the sound field could, in theory, be simulated without any measurements. In this work, we address the intermediate case, where only partial or uncertain boundary information is available. This setting is similar to one studied in virtual reality applications, where the goal is to create a perceptually convincing audio experience. In this work, we focus on spatial sound control applications, which in contrast require an accurate sound field reconstruction. Therefore, we formulate the problem within a linear Bayesian framework, incorporating a boundary-informed prior derived from impedance boundary conditions. The formulation allows for joint optimization of the unknown hyperparameters, including the noise and signal variances and the impedance boundary conditions. Using numerical experiments, we show that incorporating the boundary-informed prior significantly enhances the reconstruction, notably even when only a few hundreds of boundary points are available or when the boundary positions are calibrated with an uncertainty up to 1 dm.


Abstract:In this work, we introduce a spatio-temporal kernel for Gaussian process (GP) regression-based sound field estimation. Notably, GPs have the attractive property that the sound field is a linear function of the measurements, allowing the field to be estimated efficiently from distributed microphone measurements. However, to ensure analytical tractability, most existing kernels for sound field estimation have been formulated in the frequency domain, formed independently for each frequency. To address the analytical intractability of spatio-temporal kernels, we here propose to instead learn the kernel directly from data by the means of deep kernel learning. Furthermore, to improve the generalization of the deep kernel, we propose a method for regularizing the learning process using the wave equation. The representational advantages of the deep kernel and the improved generalization obtained by using the wave equation regularization are illustrated using numerical simulations.
Abstract:The ability to accurately estimate room impulse responses (RIRs) is integral to many applications of spatial audio processing. Regrettably, estimating the RIR using ambient signals, such as speech or music, remains a challenging problem due to, e.g., low signal-to-noise ratios, finite sample lengths, and poor spectral excitation. Commonly, in order to improve the conditioning of the estimation problem, priors are placed on the amplitudes of the RIR. Although serving as a regularizer, this type of prior is generally not useful when only approximate knowledge of the delay structure is available, which, for example, is the case when the prior is a simulated RIR from an approximation of the room geometry. In this work, we target the delay structure itself, constructing a prior based on the concept of optimal transport. As illustrated using both simulated and measured data, the resulting method is able to beneficially incorporate information even from simple simulation models, displaying considerable robustness to perturbations in the assumed room dimensions and its temperature.