"speech recognition": models, code, and papers

DualVoice: Speech Interaction that Discriminates between Normal and Whispered Voice Input

Aug 22, 2022
Jun Rekimoto

Efficient Conformer: Progressive Downsampling and Grouped Attention for Automatic Speech Recognition

Sep 08, 2021
Maxime Burchi, Valentin Vielzeuf

Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system

Nov 03, 2022
Li Li, Dongxing Xu, Haoran Wei, Yanhua Long

Fast End-to-End Speech Recognition via a Non-Autoregressive Model and Cross-Modal Knowledge Transferring from BERT

Feb 20, 2021
Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang

MIMO-SPEECH: End-to-End Multi-Channel Multi-Speaker Speech Recognition

Oct 16, 2019
Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe

Sequence-to-Sequence Learning via Attention Transfer for Incremental Speech Recognition

Nov 04, 2020
Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Token-level Speaker Change Detection Using Speaker Difference and Speech Content via Continuous Integrate-and-fire

Nov 17, 2022
Zhiyun Fan, Zhenlin Liang, Linhao Dong, Yi Liu, Shiyu Zhou, Meng Cai, Jun Zhang, Zejun Ma, Bo Xu

Improving Named Entity Recognition in Telephone Conversations via Effective Active Learning with Human in the Loop

Nov 02, 2022
Md Tahmid Rahman Laskar, Cheng Chen, Xue-Yong Fu, Shashi Bhushan TN

MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition

Mar 09, 2023
Xize Cheng, Linjun Li, Tao Jin, Rongjie Huang, Wang Lin, Zehan Wang, Huangdai Liu, Ye Wang, Aoxiong Yin, Zhou Zhao

Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition

Oct 07, 2021
Qiujia Li, Yu Zhang, David Qiu, Yanzhang He, Liangliang Cao, Philip C. Woodland
