Bar Shaybet

SRP-PHAT-NET: A Reliability-Driven DNN for Reverberant Speaker Localization

Oct 26, 2025

Bar Shaybet, Vladimir Tourbabin, Boaz Rafaely

Abstract: Accurate Direction-of-Arrival (DOA) estimation in reverberant environments remains a fundamental challenge for spatial audio applications. While deep learning methods have shown strong performance in such conditions, they typically lack a mechanism to assess the reliability of their predictions, an essential feature for real-world deployment. In this work, we present SRP-PHAT-NET, a deep neural network framework that uses SRP-PHAT directional maps as spatial features and provides a built-in reliability estimate. To enable meaningful reliability scoring, the model is trained using Gaussian-weighted labels centered on the true direction. We systematically analyze the influence of label smoothing on accuracy and reliability, demonstrating that the Gaussian kernel width can be tuned to application-specific requirements. Experimental results show that using only high-confidence predictions yields significantly improved localization accuracy, highlighting the practical benefits of integrating reliability into deep learning-based DOA estimation.
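
To make the idea of Gaussian-weighted labels and confidence-based selection concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the one-dimensional azimuth grid, the use of the output peak as a reliability score, and all function names (`gaussian_labels`, `predict_with_reliability`, `select_reliable`) are assumptions made for illustration, since the abstract does not specify these details.

```python
import math

def angular_distance_deg(a, b):
    """Smallest absolute difference between two azimuth angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def gaussian_labels(true_az_deg, grid_deg, sigma_deg):
    """Soft target: a Gaussian of width sigma_deg centered on the true azimuth.

    Assumption: a 1-D azimuth grid; the paper's actual direction grid may differ.
    """
    return [
        math.exp(-0.5 * (angular_distance_deg(g, true_az_deg) / sigma_deg) ** 2)
        for g in grid_deg
    ]

def predict_with_reliability(output_map, grid_deg):
    """Return (estimated azimuth, peak value).

    Assumption: the peak height of the output map serves as the confidence score.
    """
    idx = max(range(len(output_map)), key=output_map.__getitem__)
    return grid_deg[idx], output_map[idx]

def select_reliable(predictions, threshold):
    """Keep only (azimuth, score) pairs whose score reaches the threshold."""
    return [(az, score) for az, score in predictions if score >= threshold]

# Example: a 5-degree grid and a label for a source at 90 degrees.
grid = [float(a) for a in range(0, 360, 5)]
labels = gaussian_labels(90.0, grid, sigma_deg=10.0)
az, score = predict_with_reliability(labels, grid)
```

In this sketch a wider `sigma_deg` gives smoother targets, which is the kind of trade-off between accuracy and reliability that the abstract says can be tuned per application.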

* Under submission to the IEEE Transactions on Audio, Speech and Language Processing, 2025

Ambisonics Networks -- The Effect Of Radial Functions Regularization

Feb 29, 2024

Bar Shaybet, Anurag Kumar, Vladimir Tourbabin, Boaz Rafaely

Abstract: Ambisonics, a popular format of spatial audio, is the spherical harmonic (SH) representation of the plane wave density function of a sound field. Many algorithms operate in the SH domain and use Ambisonics signals as their input. The process of encoding Ambisonics from a spherical microphone array involves dividing by the radial functions, which may amplify noise at low frequencies. This can be overcome by regularization, with the downside of introducing errors to the Ambisonics encoding. This paper investigates the impact of different regularization methods on Deep Neural Network (DNN) training and performance. Ideally, these networks should be robust to the regularization method. Simulated data of a single speaker in a room and experimental data from the LOCATA challenge were used to evaluate this robustness on an example speaker localization algorithm based on the direct-path dominance (DPD) test. Results show that performance may be sensitive to the regularization method, and an informed approach is proposed and investigated, highlighting the importance of regularization information.
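
The noise amplification and its regularization can be illustrated with a small sketch. The abstract does not state which array type or regularization methods the paper compares, so everything specific here is an assumption: an open-sphere radial function of order 1, b_1(kr) = 4πi·j_1(kr), and a Tikhonov-style regularized inverse with a hypothetical parameter `lam`.

```python
import math

def open_sphere_b1(kr):
    """First-order radial function of an open sphere: 4*pi*i*j_1(kr).

    Assumption: open-sphere array; a rigid sphere has a different b_n.
    """
    j1 = math.sin(kr) / kr**2 - math.cos(kr) / kr  # spherical Bessel j_1
    return 4.0 * math.pi * 1j * j1

def plain_inverse(b):
    """Unregularized division by the radial function."""
    return 1.0 / b

def tikhonov_inverse(b, lam):
    """Regularized inverse conj(b) / (|b|^2 + lam^2).

    Its magnitude never exceeds 1 / (2 * lam), which limits noise gain.
    """
    return b.conjugate() / (abs(b) ** 2 + lam ** 2)

# Compare the gain of both inverses as kr (frequency times radius) shrinks.
for kr in (0.01, 0.1, 1.0):
    b = open_sphere_b1(kr)
    print(kr, abs(plain_inverse(b)), abs(tikhonov_inverse(b, lam=0.1)))
```

At small `kr` the plain inverse grows without bound while the regularized one stays limited, at the cost of deviating from the exact inverse. That deviation is the encoding error the abstract refers to.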

* To be published in ICASSP 2024
