"speech recognition": models, code, and papers

Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data

Jul 11, 2022
Naoki Makishima, Satoshi Suzuki, Atsushi Ando, Ryo Masumura

Contaminated speech training methods for robust DNN-HMM distant speech recognition

Oct 10, 2017
Mirco Ravanelli, Maurizio Omologo

End-to-End Spoken Language Understanding: Performance analyses of a voice command task in a low resource setting

Jul 17, 2022
Thierry Desot, François Portet, Michel Vacher

Twin Regularization for online speech recognition

Jun 12, 2018
Mirco Ravanelli, Dmitriy Serdyuk, Yoshua Bengio

Bilingual End-to-End ASR with Byte-Level Subwords

May 01, 2022
Liuhui Deng, Roger Hsiao, Arnab Ghoshal

Multi-Stream End-to-End Speech Recognition

Jun 17, 2019
Ruizhi Li, Xiaofei Wang, Sri Harish Mallidi, Shinji Watanabe, Takaaki Hori, Hynek Hermansky

Continuous Metric Learning For Transferable Speech Emotion Recognition and Embedding Across Low-resource Languages

Mar 28, 2022
Sneha Das, Nicklas Leander Lund, Nicole Nadine Lønfeldt, Anne Katrine Pagsberg, Line H. Clemmensen

Lead2Gold: Towards exploiting the full potential of noisy transcriptions for speech recognition

Oct 16, 2019
Adrien Dufraux, Emmanuel Vincent, Awni Hannun, Armelle Brun, Matthijs Douze

Does Speech enhancement of publicly available data help build robust Speech Recognition Systems?

Nov 20, 2019
Bhavya Ghai, Buvana Ramanan, Klaus Mueller

Effectiveness of Mining Audio and Text Pairs from Public Data for Improving ASR Systems for Low-Resource Languages

Aug 26, 2022
Kaushal Santosh Bhogale, Abhigyan Raman, Tahir Javed, Sumanth Doddapaneni, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra
