"speech recognition": models, code, and papers

Say Goodbye to RNN-T Loss: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition

Jul 27, 2023
Tian-Hao Zhang, Dinghao Zhou, Guiping Zhong, Baoxiang Li

Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition

Oct 06, 2022
Somshubra Majumdar, Shantanu Acharya, Vitaly Lavrukhin, Boris Ginsburg

Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring

Mar 20, 2023
Joanna Hong, Minsu Kim, Jeongsoo Choi, Yong Man Ro

Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications

Jun 29, 2023
Simone Wills, Yu Bai, Cristian Tejedor-Garcia, Catia Cucchiarini, Helmer Strik

Towards Selection of Text-to-speech Data to Augment ASR Training

May 30, 2023
Shuo Liu, Leda Sarı, Chunyang Wu, Gil Keren, Yuan Shangguan, Jay Mahadeokar, Ozlem Kalinli

Long-term Conversation Analysis: Exploring Utility and Privacy

Jun 28, 2023
Francesco Nespoli, Jule Pohlhausen, Patrick A. Naylor, Joerg Bitzer

Multi-blank Transducers for Speech Recognition

Nov 04, 2022
Hainan Xu, Fei Jia, Somshubra Majumdar, Shinji Watanabe, Boris Ginsburg

Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments

Jul 07, 2023
Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Jinyu Li, Yashesh Gaur

Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition

Feb 16, 2023
Minsu Kim, Hyung-Il Kim, Yong Man Ro

Towards End-to-end Unsupervised Speech Recognition

Apr 05, 2022
Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski
