"speech": models, code, and papers

Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data

Oct 02, 2023
Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe

Via arXiv

Dual-path Transformer Based Neural Beamformer for Target Speech Extraction

Sep 07, 2023
Aoqi Guo, Sichong Qian, Baoxiang Li, Dazhi Gao

Via arXiv

Co-Speech Gesture Detection through Multi-phase Sequence Labeling

Aug 21, 2023
Esam Ghaleb, Ilya Burenko, Marlou Rasenberg, Wim Pouw, Peter Uhrig, Judith Holler, Ivan Toni, Aslı Özyürek, Raquel Fernández

Via arXiv

DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin

Sep 02, 2023
Tao Li, Chenxu Hu, Jian Cong, Xinfa Zhu, Jingbei Li, Qiao Tian, Yuping Wang, Lei Xie

Via arXiv

Pre-training End-to-end ASR Models with Augmented Speech Samples Queried by Text

Jul 30, 2023
Eric Sun, Jinyu Li, Jian Xue, Yifan Gong

Via arXiv

PILL: Plug Into LLM with Adapter Expert and Attention Gate

Nov 03, 2023
Fangyuan Zhang, Tingting Liang, Zhengyuan Wu, Yuyu Yin

Via arXiv

PolyVoice: Language Models for Speech to Speech Translation

Jun 13, 2023
Qianqian Dong, Zhiying Huang, Qiao Tian, Chen Xu, Tom Ko, Yunlong Zhao, Siyuan Feng, Tang Li, Kexin Wang, Xuxin Cheng, Fengpeng Yue, Ye Bai, Xi Chen, Lu Lu, Zejun Ma, Yuping Wang, Mingxuan Wang, Yuxuan Wang

Via arXiv

Mispronunciation detection using self-supervised speech representations

Jul 30, 2023
Jazmin Vidal, Pablo Riera, Luciana Ferrer

Via arXiv

Mi-Go: Test Framework which uses YouTube as Data Source for Evaluating Speech Recognition Models like OpenAI's Whisper

Sep 01, 2023
Tomasz Wojnar, Jaroslaw Hryszko, Adam Roman

Via arXiv

ChiSCor: A Corpus of Freely Told Fantasy Stories by Dutch Children for Computational Linguistics and Cognitive Science

Oct 31, 2023
Bram M. A. van Dijk, Max J. van Duijn, Suzan Verberne, Marco R. Spruit

Via arXiv