"speech recognition": models, code, and papers

Investigating the Emergent Audio Classification Ability of ASR Foundation Models

Nov 15, 2023
Rao Ma, Adian Liusie, Mark J. F. Gales, Kate M. Knill

Via arXiv

TST: Time-Sparse Transducer for Automatic Speech Recognition

Jul 17, 2023
Xiaohui Zhang, Mangui Liang, Zhengkun Tian, Jiangyan Yi, Jianhua Tao

Via arXiv

Robust Automatic Speech Recognition via WavAugment Guided Phoneme Adversarial Training

Jul 24, 2023
Gege Qi, Yuefeng Chen, Xiaofeng Mao, Xiaojun Jia, Ranjie Duan, Rong Zhang, Hui Xue

Via arXiv

Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition

Aug 12, 2023
Han Zhu, Dongji Gao, Gaofeng Cheng, Daniel Povey, Pengyuan Zhang, Yonghong Yan

Via arXiv

Personalization for BERT-based Discriminative Speech Recognition Rescoring

Jul 13, 2023
Jari Kolehmainen, Yile Gu, Aditya Gourav, Prashanth Gurunath Shivakumar, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko

Via arXiv

RIR-SF: Room Impulse Response Based Spatial Feature for Multi-channel Multi-talker ASR

Oct 31, 2023
Yiwen Shao, Shi-Xiong Zhang, Dong Yu

Via arXiv

Research on an improved Conformer end-to-end Speech Recognition Model with R-Drop Structure

Jun 14, 2023
Weidong Ji, Shijie Zan, Guohui Zhou, Xu Wang

Via arXiv

Multi-channel Conversational Speaker Separation via Neural Diarization

Nov 15, 2023
Hassan Taherian, DeLiang Wang

Via arXiv

Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network

May 21, 2023
Kaixun Huang, Ao Zhang, Zhanheng Yang, Pengcheng Guo, Bingshen Mu, Tianyi Xu, Lei Xie

Via arXiv

Whisper in Focus: Enhancing Stuttered Speech Classification with Encoder Layer Optimization

Nov 09, 2023
Huma Ameer, Seemab Latif, Rabia Latif, Sana Mukhtar

Via arXiv