"speech": models, code, and papers

Soft Random Sampling: A Theoretical and Empirical Analysis

Nov 21, 2023
Xiaodong Cui, Ashish Mittal, Songtao Lu, Wei Zhang, George Saon, Brian Kingsbury

ChatGPT in the context of precision agriculture data analytics

Nov 10, 2023
Ilyas Potamitis

Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults

Sep 12, 2023
Ahmed Adel Attia, Jing Liu, Wei Ai, Dorottya Demszky, Carol Espy-Wilson

Spiking Structured State Space Model for Monaural Speech Enhancement

Sep 07, 2023
Yu Du, Xu Liu, Yansong Chua

DCTTS: Discrete Diffusion Model with Contrastive Learning for Text-to-speech Generation

Sep 13, 2023
Zhichao Wu, Qiulin Li, Sixing Liu, Qun Yang

Echotune: A Modular Extractor Leveraging the Variable-Length Nature of Speech in ASR Tasks

Sep 14, 2023
Sizhou Chen, Songyang Gao, Sen Fang

RoDia: A New Dataset for Romanian Dialect Identification from Speech

Sep 06, 2023
Codrut Rotaru, Nicolae-Catalin Ristea, Radu Tudor Ionescu

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction

Sep 15, 2023
Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao

Reduce, Reuse, Recycle: Is Perturbed Data better than Other Language augmentation for Low Resource Self-Supervised Speech Models

Sep 22, 2023
Asad Ullah, Alessandro Ragano, Andrew Hines

Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech

Sep 18, 2023
Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-yi Lee
