"speech": models, code, and papers

SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network

Apr 27, 2021
William Chan, Daniel Park, Chris Lee, Yu Zhang, Quoc Le, Mohammad Norouzi

End-to-End Speech Recognition and Disfluency Removal

Sep 28, 2020
Paria Jamshid Lou, Mark Johnson

Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition

Jul 04, 2021
Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Akihiko Takashima, Takafumi Moriya, Takanori Ashihara, Shota Orihashi, Naoki Makishima

Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution

Oct 07, 2021
Yangyang Shi, Chunyang Wu, Dilin Wang, Alex Xiao, Jay Mahadeokar, Xiaohui Zhang, Chunxi Liu, Ke Li, Yuan Shangguan, Varun Nagaraja, Ozlem Kalinli, Mike Seltzer

SE-MelGAN -- Speaker Agnostic Rapid Speech Enhancement

Jun 13, 2020
Luka Chkhetiani, Levan Bejanidze

LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition

Aug 09, 2020
Jin Xu, Xu Tan, Yi Ren, Tao Qin, Jian Li, Sheng Zhao, Tie-Yan Liu

LSSED: a large-scale dataset and benchmark for speech emotion recognition

Jan 30, 2021
Weiquan Fan, Xiangmin Xu, Xiaofen Xing, Weidong Chen, Dongyan Huang

Improving Mispronunciation Detection with Wav2vec2-based Momentum Pseudo-Labeling for Accentedness and Intelligibility Assessment

Apr 07, 2022
Mu Yang, Kevin Hirschi, Stephen D. Looney, Okim Kang, John H. L. Hansen

Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition

Mar 12, 2021
Aleksandr Laptev, Andrei Andrusenko, Ivan Podluzhny, Anton Mitrofanov, Ivan Medennikov, Yuri Matveev

Data-augmented cross-lingual synthesis in a teacher-student framework

Mar 31, 2022
Marcel de Korte, Jaebok Kim, Aki Kunikoshi, Adaeze Adigwe, Esther Klabbers
