Alert button

"speech recognition": models, code, and papers
Alert button

LAE-ST-MoE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-switching ASR

Oct 07, 2023
Guodong Ma, Wenxuan Wang, Yuke Li, Yuting Yang, Binbin Du, Haoran Fu

Figure 1 for LAE-ST-MoE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-switching ASR
Figure 2 for LAE-ST-MoE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-switching ASR
Figure 3 for LAE-ST-MoE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-switching ASR
Figure 4 for LAE-ST-MoE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-switching ASR
Viaarxiv icon

UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network

Oct 04, 2023
Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe

Figure 1 for UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network
Figure 2 for UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network
Figure 3 for UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network
Figure 4 for UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network
Viaarxiv icon

Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR Customization

Sep 29, 2023
Alexandra Antonova

Viaarxiv icon

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction

Sep 15, 2023
Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao

Figure 1 for The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction
Figure 2 for The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction
Figure 3 for The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction
Figure 4 for The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction
Viaarxiv icon

The Gift of Feedback: Improving ASR Model Quality by Learning from User Corrections through Federated Learning

Sep 29, 2023
Lillian Zhou, Yuxin Ding, Mingqing Chen, Harry Zhang, Rohit Prabhavalkar, Dhruv Guliani, Giovanni Motta, Rajiv Mathews

Viaarxiv icon

Fast Word Error Rate Estimation Using Self-Supervised Representations For Speech And Text

Oct 12, 2023
Chanho Park, Chengsong Lu, Mingjie Chen, Thomas Hain

Viaarxiv icon

calamanCy: A Tagalog Natural Language Processing Toolkit

Nov 13, 2023
Lester James V. Miranda

Figure 1 for calamanCy: A Tagalog Natural Language Processing Toolkit
Figure 2 for calamanCy: A Tagalog Natural Language Processing Toolkit
Figure 3 for calamanCy: A Tagalog Natural Language Processing Toolkit
Figure 4 for calamanCy: A Tagalog Natural Language Processing Toolkit
Viaarxiv icon

Boosting End-to-End Multilingual Phoneme Recognition through Exploiting Universal Speech Attributes Constraints

Sep 16, 2023
Hao Yen, Sabato Marco Siniscalchi, Chin-Hui Lee

Figure 1 for Boosting End-to-End Multilingual Phoneme Recognition through Exploiting Universal Speech Attributes Constraints
Figure 2 for Boosting End-to-End Multilingual Phoneme Recognition through Exploiting Universal Speech Attributes Constraints
Figure 3 for Boosting End-to-End Multilingual Phoneme Recognition through Exploiting Universal Speech Attributes Constraints
Figure 4 for Boosting End-to-End Multilingual Phoneme Recognition through Exploiting Universal Speech Attributes Constraints
Viaarxiv icon

SA-Paraformer: Non-autoregressive End-to-End Speaker-Attributed ASR

Oct 07, 2023
Yangze Li, Fan Yu, Yuhao Liang, Pengcheng Guo, Mohan Shi, Zhihao Du, Shiliang Zhang, Lei Xie

Figure 1 for SA-Paraformer: Non-autoregressive End-to-End Speaker-Attributed ASR
Figure 2 for SA-Paraformer: Non-autoregressive End-to-End Speaker-Attributed ASR
Figure 3 for SA-Paraformer: Non-autoregressive End-to-End Speaker-Attributed ASR
Figure 4 for SA-Paraformer: Non-autoregressive End-to-End Speaker-Attributed ASR
Viaarxiv icon

Multi-Temporal Lip-Audio Memory for Visual Speech Recognition

May 08, 2023
Jeong Hun Yeo, Minsu Kim, Yong Man Ro

Figure 1 for Multi-Temporal Lip-Audio Memory for Visual Speech Recognition
Figure 2 for Multi-Temporal Lip-Audio Memory for Visual Speech Recognition
Figure 3 for Multi-Temporal Lip-Audio Memory for Visual Speech Recognition
Figure 4 for Multi-Temporal Lip-Audio Memory for Visual Speech Recognition
Viaarxiv icon