"speech": models, code, and papers

Generalization Ability of MOS Prediction Networks

Oct 18, 2021
Erica Cooper, Wen-Chin Huang, Tomoki Toda, Junichi Yamagishi

Speaker Generation

Nov 07, 2021
Daisy Stanton, Matt Shannon, Soroosh Mariooryad, RJ Skerry-Ryan, Eric Battenberg, Tom Bagby, David Kao

ATCSpeech: a multilingual pilot-controller speech corpus from real Air Traffic Control environment

Nov 26, 2019
Bo Yang, Xianlong Tan, Zhengmao Chen, Bing Wang, Dan Li, Zhongping Yang, Xiping Wu, Yi Lin

Looking Enhances Listening: Recovering Missing Speech Using Images

Feb 13, 2020
Tejas Srinivasan, Ramon Sanabria, Florian Metze

Towards Building ASR Systems for the Next Billion Users

Nov 06, 2021
Tahir Javed, Sumanth Doddapaneni, Abhigyan Raman, Kaushal Santosh Bhogale, Gowtham Ramesh, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra

Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications

Apr 25, 2022
Han Cai, Ji Lin, Yujun Lin, Zhijian Liu, Haotian Tang, Hanrui Wang, Ligeng Zhu, Song Han

Breaking the Data Barrier: Towards Robust Speech Translation via Adversarial Stability Training

Oct 08, 2019
Qiao Cheng, Meiyuan Fang, Yaqian Han, Jin Huang, Yitao Duan

Transformer-based Online CTC/attention End-to-End Speech Recognition Architecture

Jan 15, 2020
Haoran Miao, Gaofeng Cheng, Changfeng Gao, Pengyuan Zhang, Yonghong Yan

Separation Guided Speaker Diarization in Realistic Mismatched Conditions

Jul 06, 2021
Shu-Tong Niu, Jun Du, Lei Sun, Chin-Hui Lee

Survey on Deep Neural Networks in Speech and Vision Systems

Aug 16, 2019
Mahbubul Alam, Manar D. Samad, Lasitha Vidyaratne, Alexander Glandon, Khan M. Iftekharuddin
