"speech recognition": models, code, and papers

Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis

Oct 16, 2023
Jianqiao Lu, Wenyong Huang, Nianzu Zheng, Xingshan Zeng, Yu Ting Yeung, Xiao Chen

Via arXiv

Detecting Speech Abnormalities with a Perceiver-based Sequence Classifier that Leverages a Universal Speech Model

Oct 16, 2023
Hagen Soltau, Izhak Shafran, Alex Ottenwess, Joseph R. JR Duffy, Rene L. Utianski, Leland R. Barnard, John L. Stricker, Daniela Wiepert, David T. Jones, Hugo Botha

Via arXiv

Confidence-based Ensembles of End-to-End Speech Recognition Models

Jun 27, 2023
Igor Gitman, Vitaly Lavrukhin, Aleksandr Laptev, Boris Ginsburg

Via arXiv

An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification

Sep 10, 2023
Harunori Kawano, Sota Shimizu

Via arXiv

Adapting the adapters for code-switching in multilingual ASR

Oct 11, 2023
Atharva Kulkarni, Ajinkya Kulkarni, Miguel Couceiro, Hanan Aldarmaki

Via arXiv

ResidualTransformer: Residual Low-rank Learning with Weight-sharing for Transformer Layers

Oct 03, 2023
Yiming Wang, Jinyu Li

Via arXiv

LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models

Oct 04, 2023
Aleksandr Meister, Matvei Novikov, Nikolay Karpov, Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg

Via arXiv

DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers

Oct 05, 2023
Anna Langedijk, Hosein Mohebbi, Gabriele Sarti, Willem Zuidema, Jaap Jumelet

Via arXiv

Rehearsal-Free Online Continual Learning for Automatic Speech Recognition

Jun 19, 2023
Steven Vander Eeckt, Hugo Van hamme

Via arXiv

Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning

Sep 29, 2023
Guanrou Yang, Ziyang Ma, Zhisheng Zheng, Yakun Song, Zhikang Niu, Xie Chen

Via arXiv