Abstract: Small-size acoustic arrays exploit spatial diversity to achieve capabilities beyond those of single-element devices, with applications ranging from teleconferencing to immersive multimedia. A key requirement for broadband array processing is a frequency-invariant spatial response, which ensures consistent directivity across wide bandwidths and prevents spectral coloration. Differential beamforming offers an inherently frequency-invariant solution by leveraging pressure differences between closely spaced elements of small-size arrays. Traditional approaches, however, assume the array elements to be omnidirectional, whereas real transducers exhibit frequency-dependent directivity that can degrade performance if not properly modeled. To address this limitation, we propose a generalized modal matching framework for frequency-invariant differential beamforming, applicable to unconstrained planar arrays of first-order directional elements. By representing the desired beampattern as a truncated circular harmonic expansion and fitting it to the actual element responses, our method accommodates arbitrary planar geometries and element orientations. This approach enables the synthesis of beampatterns of any order and steering direction without imposing rigid layout requirements. Simulations confirm that accounting for sensor directivity at the design stage yields accurate and robust performance across varying frequencies, geometries, and noise conditions.
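A minimal sketch of the modal matching idea, under illustrative assumptions: the desired beampattern is a truncated circular harmonic expansion, and the weights of a planar array of first-order elements (arbitrary positions and orientations) are least-squares-fitted to reproduce it. The geometry, element parameter `alpha`, and target coefficients below are toy choices, not the paper's configuration.

```python
import numpy as np

c = 343.0                       # speed of sound [m/s]
f = 2000.0                      # design frequency [Hz]
k = 2 * np.pi * f / c           # wavenumber

# Unconstrained planar geometry: M first-order elements with arbitrary
# positions (within ~1 cm) and orientations.
M = 6
rng = np.random.default_rng(0)
pos = 0.01 * rng.standard_normal((M, 2))     # element positions [m]
psi = rng.uniform(0, 2 * np.pi, M)           # element orientations [rad]
alpha = 0.5                                  # first-order parameter (cardioid-like)

theta = np.linspace(0, 2 * np.pi, 360, endpoint=False)   # fitting grid
u = np.stack([np.cos(theta), np.sin(theta)], axis=1)     # unit directions

# Steering matrix: first-order element directivity times propagation phase.
directivity = alpha + (1 - alpha) * np.cos(theta[None, :] - psi[:, None])
phase = np.exp(1j * k * (pos @ u.T))
D = directivity * phase                      # shape (M, n_angles)

# Desired beampattern: truncated circular harmonic expansion of order N,
# steered towards theta_s (unit coefficients as a toy target).
N, theta_s = 2, np.deg2rad(30)
orders = np.arange(-N, N + 1)
B = np.exp(1j * orders[None, :] * (theta[:, None] - theta_s)).sum(axis=1)

# Modal matching: least-squares fit of the weights to the desired pattern.
w = np.linalg.lstsq(D.T, B, rcond=None)[0]   # solve D.T @ w ≈ B
synth = w @ D
print("relative fit error:", np.linalg.norm(synth - B) / np.linalg.norm(B))
```

Since the target expansion B does not depend on frequency, repeating the fit per frequency bin yields a set of weights whose synthesized beampattern stays approximately frequency-invariant, which is the point of the design.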



Abstract: Sound field reconstruction aims to estimate pressure fields in areas lacking direct measurements. Existing techniques often rely on strong assumptions or face challenges related to data availability or the explicit modeling of physical properties. To bridge these gaps, this study introduces a zero-shot, physics-informed dictionary learning approach to perform sound field reconstruction. Our method relies only on a few sparse measurements to learn a dictionary, without the need for additional training data. Moreover, by enforcing the Helmholtz equation during the optimization process, the proposed approach ensures that the reconstructed sound field is represented as a linear combination of a few physically meaningful atoms. Evaluations on real-world data show that our approach achieves comparable performance to state-of-the-art dictionary learning techniques, with the advantage of requiring only a few observations of the sound field and no training on a dataset.
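To make the core idea concrete, here is a simplified sketch: the field is represented as a sparse combination of atoms that satisfy the Helmholtz equation by construction (plane waves at the analysis wavenumber), coded against a handful of measurements. The paper learns the dictionary during optimization; for brevity this sketch keeps the dictionary fixed and shows only the sparse-coding step. Grid sizes, frequency, and the OMP routine are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
c, f = 343.0, 500.0
k = 2 * np.pi * f / c

# Sparse measurement positions in a 2-D region (the "few observations").
n_mics = 15
r_mic = rng.uniform(0, 3.0, (n_mics, 2))

# Helmholtz-compliant dictionary: plane waves over a grid of directions.
n_atoms = 90
phi = np.linspace(0, 2 * np.pi, n_atoms, endpoint=False)
K = k * np.stack([np.cos(phi), np.sin(phi)], axis=1)    # wave vectors
D = np.exp(1j * (r_mic @ K.T))                          # (n_mics, n_atoms)

# Synthetic "measurements": a field made of 3 hidden plane waves plus noise.
truth = np.zeros(n_atoms, complex)
truth[[10, 40, 70]] = rng.standard_normal(3) + 1j * rng.standard_normal(3)
p = D @ truth + 0.01 * (rng.standard_normal(n_mics) + 1j * rng.standard_normal(n_mics))

def omp(D, p, n_nonzero):
    """Orthogonal matching pursuit: greedily pick a few atoms, refit on support."""
    residual, support = p.copy(), []
    for _ in range(n_nonzero):
        corr = np.abs(D.conj().T @ residual)
        corr[support] = 0
        support.append(int(np.argmax(corr)))
        x_s, *_ = np.linalg.lstsq(D[:, support], p, rcond=None)
        residual = p - D[:, support] @ x_s
    x = np.zeros(D.shape[1], complex)
    x[support] = x_s
    return x

x = omp(D, p, n_nonzero=3)
print("recovered atoms:", np.nonzero(x)[0])
# The field at any new point is D_new @ x, with D_new built from the same atoms.
```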


Abstract: Spherical microphone arrays are convenient tools for capturing the spatial characteristics of a sound field. However, achieving superior spatial resolution requires arrays with numerous capsules, consequently leading to expensive devices. To address this issue, we present a method for spatially upsampling spherical microphone arrays with a limited number of capsules. Our approach exploits a physics-informed neural network with Rowdy activation functions, leveraging physical constraints to provide high-order microphone array signals starting from low-order devices. Results show that, within its domain of application, our approach outperforms a state-of-the-art signal processing method for spherical microphone array upsampling.
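A minimal PyTorch sketch of the ingredients named above, under stated assumptions: a small network with Rowdy-style activations (a base activation plus trainable sinusoidal terms) maps capsule coordinates to complex pressure at one frequency, and the loss combines a data term on the measured low-order capsules with the Helmholtz residual on collocation points. The layer sizes, radius, frequency, loss weight, and placeholder measurements are all illustrative, not the paper's setup.

```python
import torch

class Rowdy(torch.nn.Module):
    """tanh base plus trainable sine harmonics (Rowdy-style activation).
    With amplitudes initialized to zero, it starts as a plain tanh."""
    def __init__(self, n_terms=3):
        super().__init__()
        self.a = torch.nn.Parameter(torch.zeros(n_terms))
        self.register_buffer("n", torch.arange(1, n_terms + 1).float())
    def forward(self, x):
        return torch.tanh(x) + (self.a * torch.sin(self.n * x.unsqueeze(-1))).sum(-1)

net = torch.nn.Sequential(
    torch.nn.Linear(3, 64), Rowdy(),
    torch.nn.Linear(64, 64), Rowdy(),
    torch.nn.Linear(64, 2),          # Re/Im of pressure at one frequency
)

def helmholtz_residual(net, r, k):
    """Mean squared residual of (lap + k^2) p on collocation points r."""
    r = r.clone().requires_grad_(True)
    p = net(r)
    total = 0.0
    for i in range(2):                               # real and imaginary parts
        g = torch.autograd.grad(p[:, i].sum(), r, create_graph=True)[0]
        lap = sum(torch.autograd.grad(g[:, d].sum(), r, create_graph=True)[0][:, d]
                  for d in range(3))
        total = total + ((lap + k**2 * p[:, i]) ** 2).mean()
    return total

# Toy data: pressures at the low-order array's capsule positions (placeholders
# standing in for real measurements; 32 capsules on a 4.2 cm sphere).
k = 2 * torch.pi * 1000.0 / 343.0
r_caps = torch.nn.functional.normalize(torch.randn(32, 3), dim=1) * 0.042
p_meas = torch.randn(32, 2)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(200):
    opt.zero_grad()
    data_loss = ((net(r_caps) - p_meas) ** 2).mean()
    r_col = torch.nn.functional.normalize(torch.randn(128, 3), dim=1) * 0.042
    loss = data_loss + 1e-2 * helmholtz_residual(net, r_col, k)
    loss.backward()
    opt.step()
# After training, net(r) can be queried at a denser capsule layout to obtain
# the upsampled, higher-order array signals.
```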




Abstract: In this paper, we present HOMULA-RIR, a dataset of room impulse responses (RIRs) acquired using both higher-order microphones (HOMs) and a uniform linear array (ULA), in order to model a remote attendance teleconferencing scenario. Specifically, measurements were performed in a seminar room, where a 64-microphone ULA was used as a multichannel audio acquisition system in the proximity of the speakers, while HOMs were used to model 25 attendees physically present in the seminar room. The HOMs cover a wide area of the room, making the dataset also suitable for applications of virtual acoustics. Through the measurement of the reverberation time and clarity index, and sample applications such as source localization and separation, we demonstrate the effectiveness of the HOMULA-RIR dataset.
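For reference, the two acoustic descriptors mentioned above can be computed from a single RIR channel with the standard Schroeder method, as in the sketch below. The variables `h` and `fs` stand in for one measured channel and its sample rate; loading and file-format details of the dataset are omitted, and the toy decaying RIR is only there to make the snippet runnable.

```python
import numpy as np

def schroeder_rt60(h, fs):
    """T60 extrapolated from the -5 to -25 dB slope of the Schroeder curve (T20)."""
    edc = np.cumsum(h[::-1] ** 2)[::-1]              # backward-integrated energy decay
    edc_db = 10 * np.log10(edc / edc[0])
    t = np.arange(len(h)) / fs
    mask = (edc_db <= -5) & (edc_db >= -25)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # decay rate [dB/s]
    return -60.0 / slope

def clarity(h, fs, t_ms=80):
    """Clarity index C80 (or C50 with t_ms=50): early-to-late energy ratio in dB."""
    n = int(fs * t_ms / 1000)
    early, late = np.sum(h[:n] ** 2), np.sum(h[n:] ** 2)
    return 10 * np.log10(early / late)

fs = 48000
h = np.random.randn(fs) * np.exp(-6 * np.arange(fs) / fs)  # toy decaying RIR
print(f"T60 = {schroeder_rt60(h, fs):.2f} s, C80 = {clarity(h, fs):.1f} dB")
```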


Abstract: Reconstructing the sound field in a room is an important task for several applications, such as sound control and augmented reality (AR) or virtual reality (VR). In this paper, we propose a data-driven generative model for reconstructing the magnitude of acoustic fields in rooms, with a focus on the modal frequency range. We introduce, for the first time, the use of a conditional Denoising Diffusion Probabilistic Model (DDPM) trained to reconstruct the sound field (SF-Diff) over an extended domain. The architecture is devised to be conditioned on a limited set of available measurements at different frequencies and to generate the sound field at target, unknown locations. The results show that SF-Diff is able to provide accurate reconstructions, outperforming a state-of-the-art baseline based on kernel interpolation.
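A minimal sketch of conditional DDPM sampling in the spirit of SF-Diff: a denoiser predicts the noise in the sound-field magnitude map given the diffusion step and the sparse observations, and ancestral sampling iterates the standard DDPM reverse step from pure noise. The denoiser here is a placeholder; the network architecture, grid size, conditioning format, and noise schedule are assumptions, not the paper's.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)                # linear noise schedule
alphas = 1.0 - betas
abar = torch.cumprod(alphas, dim=0)                  # cumulative product alpha-bar

@torch.no_grad()
def sample(eps_model, cond, shape):
    """Ancestral DDPM sampling, conditioned on the available measurements."""
    x = torch.randn(shape)                           # start from pure noise
    for t in reversed(range(T)):
        eps = eps_model(x, torch.tensor([t]), cond)  # conditioned noise estimate
        mean = (x - betas[t] / (1 - abar[t]).sqrt() * eps) / alphas[t].sqrt()
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise
    return x

# Placeholder denoiser: in SF-Diff this is a trained network conditioned on
# the sparse microphone measurements at several frequencies.
eps_model = lambda x, t, cond: torch.zeros_like(x)
field = sample(eps_model, cond=torch.randn(1, 8), shape=(1, 1, 32, 32))
```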