Hongxiao Guo

AISHELL-5: The First Open-Source In-Car Multi-Channel Multi-Speaker Speech Dataset for Automatic Speech Diarization and Recognition

May 29, 2025

Yuhang Dai, He Wang, Xingchen Li, Zihan Zhang, Shuiyuan Wang, Lei Xie, Xin Xu, Hongxiao Guo, Shaoji Zhang, Hui Bu (+1 more)

Abstract: This paper presents AISHELL-5, the first open-source in-car multi-channel multi-speaker Mandarin automatic speech recognition (ASR) dataset. AISHELL-5 includes two parts: (1) over 100 hours of multi-channel speech data recorded in an electric vehicle across more than 60 real driving scenarios, consisting of four far-field speech signals captured by microphones located on each car door, as well as near-field signals obtained from high-fidelity headset microphones worn by each speaker; and (2) a collection of 40 hours of real-world environmental noise recordings, which supports in-car speech data simulation. We also provide an open-access, reproducible baseline system based on this dataset. The system features a speech frontend model that employs speech source separation to extract each speaker's clean speech from the far-field signals, along with a speech recognition module that transcribes the content of each individual speaker. Experimental results demonstrate the challenges faced by various mainstream ASR models when evaluated on AISHELL-5. We firmly believe the AISHELL-5 dataset will significantly advance research on ASR systems under complex driving scenarios by establishing the first publicly available in-car ASR benchmark.

* 5 pages, 1 figure, 3 tables, accepted by Interspeech 2025

Enhancing Voice Wake-Up for Dysarthria: Mandarin Dysarthria Speech Corpus Release and Customized System Design

Jun 14, 2024

Ming Gao, Hang Chen, Jun Du, Xin Xu, Hongxiao Guo, Hui Bu, Jianxing Yang, Ming Li, Chin-Hui Lee

Abstract: Smart home technology has gained widespread adoption, facilitating effortless control of devices through voice commands. However, individuals with dysarthria, a motor speech disorder, face challenges due to the variability of their speech. This paper addresses the wake-up word spotting (WWS) task for dysarthric individuals, aiming to integrate them into real-world applications. To support this, we release the open-source Mandarin Dysarthria Speech Corpus (MDSC), a dataset designed for dysarthric individuals in home environments. MDSC encompasses information on age, gender, disease types, and intelligibility evaluations. Furthermore, we perform comprehensive experimental analysis on MDSC, highlighting the challenges encountered. We also develop a customized dysarthria WWS system that is robust to varying intelligibility and achieves exceptional performance. MDSC will be released on https://www.aishelltech.com/AISHELL_6B.

* to be published in Interspeech 2024
