"speech recognition": models, code, and papers

End-to-End Multi-Person Audio/Visual Automatic Speech Recognition

May 11, 2022
Otavio Braga, Takaki Makino, Olivier Siohan, Hank Liao

Silent versus modal multi-speaker speech recognition from ultrasound and video

Feb 27, 2021
Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals

Direction-Aware Joint Adaptation of Neural Speech Enhancement and Recognition in Real Multiparty Conversational Environments

Jul 15, 2022
Yicheng Du, Aditya Arie Nugraha, Kouhei Sekiguchi, Yoshiaki Bando, Mathieu Fontaine, Kazuyoshi Yoshii

N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space

Mar 01, 2023
Rao Ma, Mark J F Gales, Kate Knill, Mengjie Qian

A Conformer Based Acoustic Model for Robust Automatic Speech Recognition

Mar 01, 2022
Yufeng Yang, Peidong Wang, DeLiang Wang

End-to-End Neural Systems for Automatic Children Speech Recognition: An Empirical Study

Feb 19, 2021
Prashanth Gurunath Shivakumar, Shrikanth Narayanan

Collaborative Training of Acoustic Encoders for Speech Recognition

Jul 13, 2021
Varun Nagaraja, Yangyang Shi, Ganesh Venkatesh, Ozlem Kalinli, Michael L. Seltzer, Vikas Chandra

Differentiable Allophone Graphs for Language-Universal Speech Recognition

Jul 24, 2021
Brian Yan, Siddharth Dalmia, David R. Mortensen, Florian Metze, Shinji Watanabe

A Token-Wise Beam Search Algorithm for RNN-T

Feb 28, 2023
Gil Keren

Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition

Mar 29, 2022
Junrui Ni, Liming Wang, Heting Gao, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson
