"speech": models, code, and papers

Jira: a Kurdish Speech Recognition System Designing and Building Speech Corpus and Pronunciation Lexicon

Feb 15, 2021
Hadi Veisi, Hawre Hosseini, Mohammad Mohammadamini, Wirya Fathy, Aso Mahmudi

Via arXiv

Wav-BERT: Cooperative Acoustic and Linguistic Representation Learning for Low-Resource Speech Recognition

Oct 09, 2021
Guolin Zheng, Yubei Xiao, Ke Gong, Pan Zhou, Xiaodan Liang, Liang Lin

Via arXiv

"Hello, It's Me": Deep Learning-based Speech Synthesis Attacks in the Real World

Sep 20, 2021
Emily Wenger, Max Bronckers, Christian Cianfarani, Jenna Cryan, Angela Sha, Haitao Zheng, Ben Y. Zhao

Via arXiv

Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection

May 10, 2022
Otavio Braga, Olivier Siohan

Via arXiv

NORESQA -- A Framework for Speech Quality Assessment using Non-Matching References

Sep 16, 2021
Pranay Manocha, Buye Xu, Anurag Kumar

Via arXiv

Signal inpainting from Fourier magnitudes

Oct 28, 2022
Louis Bahrman, Marina Krémé, Paul Magron, Antoine Deleforge

Via arXiv

Speech BERT Embedding For Improving Prosody in Neural TTS

Jun 15, 2021
Liping Chen, Yan Deng, Xi Wang, Frank K. Soong, Lei He

Via arXiv

Integrating Knowledge in End-to-End Automatic Speech Recognition for Mandarin-English Code-Switching

Dec 19, 2021
Chia-Yu Li, Ngoc Thang Vu

Via arXiv

Don't Discard Fixed-Window Audio Segmentation in Speech-to-Text Translation

Oct 24, 2022
Chantal Amrhein, Barry Haddow

Via arXiv

DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021

Oct 25, 2021
Yanqing Liu, Zhihang Xu, Gang Wang, Kuan Chen, Bohan Li, Xu Tan, Jinzhu Li, Lei He, Sheng Zhao

Via arXiv