Alert button

"speech recognition": models, code, and papers
Alert button

ResidualTransformer: Residual Low-rank Learning with Weight-sharing for Transformer Layers

Oct 03, 2023
Yiming Wang, Jinyu Li

Viaarxiv icon

Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding

Add code
Bookmark button
Alert button
Oct 09, 2023
Pavel Denisov, Ngoc Thang Vu

Viaarxiv icon

LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models

Add code
Bookmark button
Alert button
Oct 04, 2023
Aleksandr Meister, Matvei Novikov, Nikolay Karpov, Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg

Viaarxiv icon

GNCformer Enhanced Self-attention for Automatic Speech Recognition

May 22, 2023
J. Li, Z. Duan, S. Li, X. Yu, G. Yang

Figure 1 for GNCformer Enhanced Self-attention for Automatic Speech Recognition
Figure 2 for GNCformer Enhanced Self-attention for Automatic Speech Recognition
Figure 3 for GNCformer Enhanced Self-attention for Automatic Speech Recognition
Figure 4 for GNCformer Enhanced Self-attention for Automatic Speech Recognition
Viaarxiv icon

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction

Add code
Bookmark button
Alert button
Sep 15, 2023
Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao

Figure 1 for The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction
Figure 2 for The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction
Figure 3 for The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction
Figure 4 for The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction
Viaarxiv icon

DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers

Oct 05, 2023
Anna Langedijk, Hosein Mohebbi, Gabriele Sarti, Willem Zuidema, Jaap Jumelet

Figure 1 for DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers
Figure 2 for DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers
Figure 3 for DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers
Figure 4 for DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers
Viaarxiv icon

Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning

Add code
Bookmark button
Alert button
Sep 29, 2023
Guanrou Yang, Ziyang Ma, Zhisheng Zheng, Yakun Song, Zhikang Niu, Xie Chen

Viaarxiv icon

Adapting the adapters for code-switching in multilingual ASR

Add code
Bookmark button
Alert button
Oct 11, 2023
Atharva Kulkarni, Ajinkya Kulkarni, Miguel Couceiro, Hanan Aldarmaki

Viaarxiv icon

Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis

Add code
Bookmark button
Alert button
Oct 16, 2023
Jianqiao Lu, Wenyong Huang, Nianzu Zheng, Xingshan Zeng, Yu Ting Yeung, Xiao Chen

Figure 1 for Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis
Figure 2 for Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis
Figure 3 for Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis
Figure 4 for Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis
Viaarxiv icon

Detecting Speech Abnormalities with a Perceiver-based Sequence Classifier that Leverages a Universal Speech Model

Add code
Bookmark button
Alert button
Oct 16, 2023
Hagen Soltau, Izhak Shafran, Alex Ottenwess, Joseph R. JR Duffy, Rene L. Utianski, Leland R. Barnard, John L. Stricker, Daniela Wiepert, David T. Jones, Hugo Botha

Viaarxiv icon