Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Dimme de Groot

A Semi-spontaneous Dutch Speech Dataset for Speech Enhancement and Speech Recognition

Mar 10, 2026

Dimme de Groot, Yuanyuan Zhang, Jorge Martinez, Odette Scharenborg

Abstract:We present DRES: a 1.5-hour Dutch realistic elicited (semi-spontaneous) speech dataset from 80 speakers recorded in noisy, public indoor environments. DRES was designed as a test set for the evaluation of state-of-the-art (SOTA) automatic speech recognition (ASR) and speech enhancement (SE) models in a real-world scenario: a person speaking in a public indoor space with background talkers and noise. The speech was recorded with a four-channel linear microphone array. In this work we evaluate the speech quality of five well-known single-channel SE algorithms and the recognition performance of eight SOTA off-the-shelf ASR models before and after applying SE on the speech of DRES. We found that five out of the eight ASR models have WERs lower than 22% on DRES, despite the challenging conditions. In contrast to recent work, we did not find a positive effect of modern single-channel SE on ASR performance, emphasizing the importance of evaluating in realistic conditions.

* Submitted to Interspeech 2026

Via

Access Paper or Ask Questions

Objective and Subjective Evaluation of Diffusion-Based Speech Enhancement for Dysarthric Speech

Aug 25, 2025

Dimme de Groot, Tanvina Patel, Devendra Kayande, Odette Scharenborg, Zhengjun Yue

Abstract:Dysarthric speech poses significant challenges for automatic speech recognition (ASR) systems due to its high variability and reduced intelligibility. In this work we explore the use of diffusion models for dysarthric speech enhancement, which is based on the hypothesis that using diffusion-based speech enhancement moves the distribution of dysarthric speech closer to that of typical speech, which could potentially improve dysarthric speech recognition performance. We assess the effect of two diffusion-based and one signal-processing-based speech enhancement algorithms on intelligibility and speech quality of two English dysarthric speech corpora. We applied speech enhancement to both typical and dysarthric speech and evaluate the ASR performance using Whisper-Turbo, and the subjective and objective speech quality of the original and enhanced dysarthric speech. We also fine-tuned Whisper-Turbo on the enhanced speech to assess its impact on recognition performance.

* Accepted to Interspeech 2025

Via

Access Paper or Ask Questions

Performance of Objective Speech Quality Metrics on Languages Beyond Validation Data: A Study of Turkish and Korean

May 22, 2025

Javier Perez, Dimme de Groot, Jorge Martinez

Abstract:Objective speech quality measures are widely used to assess the performance of video conferencing platforms and telecommunication systems. They predict human-rated speech quality and are crucial for assessing the systems quality of experience. Despite the widespread use, the quality measures are developed on a limited set of languages. This can be problematic since the performance on unseen languages is consequently not guaranteed or even studied. Here we raise awareness to this issue by investigating the performance of two objective speech quality measures (PESQ and ViSQOL) on Turkish and Korean. Using English as baseline, we show that Turkish samples have significantly higher ViSQOL scores and that for Turkish male speakers the correlation between PESQ and ViSQOL is highest. These results highlight the need to explore biases across metrics and to develop a labeled speech quality dataset with a variety of languages.

Via

Access Paper or Ask Questions

Loudspeaker Beamforming to Enhance Speech Recognition Performance of Voice Driven Applications

Jan 14, 2025

Dimme de Groot, Baturalp Karslioglu, Odette Scharenborg, Jorge Martinez

Figure 1 for Loudspeaker Beamforming to Enhance Speech Recognition Performance of Voice Driven Applications

Figure 2 for Loudspeaker Beamforming to Enhance Speech Recognition Performance of Voice Driven Applications

Figure 3 for Loudspeaker Beamforming to Enhance Speech Recognition Performance of Voice Driven Applications

Figure 4 for Loudspeaker Beamforming to Enhance Speech Recognition Performance of Voice Driven Applications

Abstract:In this paper we propose a robust loudspeaker beamforming algorithm which is used to enhance the performance of voice driven applications in scenarios where the loudspeakers introduce the majority of the noise, e.g. when music is playing loudly. The loudspeaker beamformer modifies the loudspeaker playback signals to create a low-acoustic-energy region around the device that implements automatic speech recognition for a voice driven application (VDA). The algorithm utilises a distortion measure based on human auditory perception to limit the distortion perceived by human listeners. Simulations and real-world experiments show that the proposed loudspeaker beamformer improves the speech recognition performance in all tested scenarios. Moreover, the algorithm allows to further reduce the acoustic energy around the VDA device at the expense of reduced objective audio quality at the listener's location.

* To appear at ICASSP 2025

Via

Access Paper or Ask Questions