Alert button

"speech recognition": models, code, and papers
Alert button

How does end-to-end speech recognition training impact speech enhancement artifacts?

Nov 20, 2023
Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri

Viaarxiv icon

On Speech Pre-emphasis as a Simple and Inexpensive Method to Boost Speech Enhancement

Jan 17, 2024
Iván López-Espejo, Aditya Joglekar, Antonio M. Peinado, Jesper Jensen

Viaarxiv icon

Instruction-Following Speech Recognition

Sep 18, 2023
Cheng-I Jeff Lai, Zhiyun Lu, Liangliang Cao, Ruoming Pang

Figure 1 for Instruction-Following Speech Recognition
Figure 2 for Instruction-Following Speech Recognition
Figure 3 for Instruction-Following Speech Recognition
Figure 4 for Instruction-Following Speech Recognition
Viaarxiv icon

Transcending Controlled Environments Assessing the Transferability of ASRRobust NLU Models to Real-World Applications

Jan 12, 2024
Hania Khan, Aleena Fatima Khalid, Zaryab Hassan

Viaarxiv icon

A comparative analysis between Conformer-Transducer, Whisper, and wav2vec2 for improving the child speech recognition

Nov 07, 2023
Andrei Barcovschi, Rishabh Jain, Peter Corcoran

Viaarxiv icon

Word-Level ASR Quality Estimation for Efficient Corpus Sampling and Post-Editing through Analyzing Attentions of a Reference-Free Metric

Jan 20, 2024
Golara Javadi, Kamer Ali Yuksel, Yunsu Kim, Thiago Castro Ferreira, Mohamed Al-Badrashiny

Viaarxiv icon

Multi-Modal Emotion Recognition by Text, Speech and Video Using Pretrained Transformers

Feb 11, 2024
Minoo Shayaninasab, Bagher Babaali

Viaarxiv icon

Adapting OpenAI's Whisper for Speech Recognition on Code-Switch Mandarin-English SEAME and ASRU2019 Datasets

Nov 29, 2023
Yuhang Yang, Yizhou Peng, Xionghu Zhong, Hao Huang, Eng Siong Chng

Viaarxiv icon

Using Large Language Model for End-to-End Chinese ASR and NER

Jan 21, 2024
Yuang Li, Jiawei Yu, Yanqing Zhao, Min Zhang, Mengxin Ren, Xiaofeng Zhao, Xiaosong Qiao, Chang Su, Miaomiao Ma, Hao Yang

Viaarxiv icon

Personalization of CTC-based End-to-End Speech Recognition Using Pronunciation-Driven Subword Tokenization

Oct 16, 2023
Zhihong Lei, Ernest Pusateri, Shiyi Han, Leo Liu, Mingbin Xu, Tim Ng, Ruchir Travadi, Youyuan Zhang, Mirko Hannemann, Man-Hung Siu, Zhen Huang

Viaarxiv icon