"speech recognition": models, code, and papers

Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding

Oct 09, 2023
Pavel Denisov, Ngoc Thang Vu

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction

Sep 15, 2023
Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao

A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning

May 19, 2023
Jiyang Tang, William Chen, Xuankai Chang, Shinji Watanabe, Brian MacWhinney

Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning

Sep 29, 2023
Guanrou Yang, Ziyang Ma, Zhisheng Zheng, Yakun Song, Zhikang Niu, Xie Chen

DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers

Oct 05, 2023
Anna Langedijk, Hosein Mohebbi, Gabriele Sarti, Willem Zuidema, Jaap Jumelet

Adapting the adapters for code-switching in multilingual ASR

Oct 11, 2023
Atharva Kulkarni, Ajinkya Kulkarni, Miguel Couceiro, Hanan Aldarmaki

Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR Customization

Sep 29, 2023
Alexandra Antonova

N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition

Jun 05, 2023
Bashar Talafha, Abdul Waheed, Muhammad Abdul-Mageed

Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis

Oct 16, 2023
Jianqiao Lu, Wenyong Huang, Nianzu Zheng, Xingshan Zeng, Yu Ting Yeung, Xiao Chen

Detecting Speech Abnormalities with a Perceiver-based Sequence Classifier that Leverages a Universal Speech Model

Oct 16, 2023
Hagen Soltau, Izhak Shafran, Alex Ottenwess, Joseph R. JR Duffy, Rene L. Utianski, Leland R. Barnard, John L. Stricker, Daniela Wiepert, David T. Jones, Hugo Botha
