"speech recognition": models, code, and papers

CB-Whisper: Contextual Biasing Whisper using TTS-based Keyword Spotting

Sep 18, 2023
Yuang Li, Yinglu Li, Min Zhang, Chang Su, Mengyao Piao, Xiaosong Qiao, Jiawei Yu, Miaomiao Ma, Yanqing Zhao, Hao Yang

Improved Training for End-to-End Streaming Automatic Speech Recognition Model with Punctuation

Jun 02, 2023
Hanbyul Kim, Seunghyun Seo, Lukas Lee, Seolki Baek

MoLE : Mixture of Language Experts for Multi-Lingual Automatic Speech Recognition

Feb 27, 2023
Yoohwan Kwon, Soo-Whan Chung

An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification

Aug 22, 2023
Harunori Kawano, Sota Shimizu

On the Relation between Internal Language Model and Sequence Discriminative Training for Neural Transducers

Sep 25, 2023
Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney

Importance of Smoothness Induced by Optimizers in FL4ASR: Towards Understanding Federated Learning for End-to-End ASR

Sep 22, 2023
Sheikh Shams Azam, Tatiana Likhomanenko, Martin Pelikan, Jan "Honza" Silovsky

Accelerator-Aware Training for Transducer-Based Speech Recognition

May 12, 2023
Suhaila M. Shakiah, Rupak Vignesh Swaminathan, Hieu Duy Nguyen, Raviteja Chinta, Tariq Afzal, Nathan Susanj, Athanasios Mouchtaris, Grant P. Strimel, Ariya Rastrow

RTFS-Net: Recurrent time-frequency modelling for efficient audio-visual speech separation

Sep 29, 2023
Samuel Pegg, Kai Li, Xiaolin Hu

The Broad Impact of Feature Imitation: Neural Enhancements Across Financial, Speech, and Physiological Domains

Sep 21, 2023
Reza Khanmohammadi, Tuka Alhanai, Mohammad M. Ghassemi

Variational Connectionist Temporal Classification for Order-Preserving Sequence Modeling

Sep 21, 2023
Zheng Nan, Ting Dang, Vidhyasaharan Sethu, Beena Ahmed
