"speech recognition": models, code, and papers

Foundation Model Assisted Automatic Speech Emotion Recognition: Transcribing, Annotating, and Augmenting

Sep 15, 2023
Tiantian Feng, Shrikanth Narayanan

Via arXiv

Can Whisper perform speech-based in-context learning?

Sep 13, 2023
Siyin Wang, Chao-Han Huck Yang, Ji Wu, Chao Zhang

Via arXiv

ED-CEC: Improving Rare Word Recognition Using ASR Postprocessing Based on Error Detection and Context-Aware Error Correction

Oct 08, 2023
Jiajun He, Zekun Yang, Tomoki Toda

Via arXiv

HypR: A comprehensive study for ASR hypothesis revising with a reference corpus

Sep 19, 2023
Yi-Wei Wang, Ke-Han Lu, Kuan-Yu Chen

Via arXiv

Federated Self-Learning with Weak Supervision for Speech Recognition

Jun 21, 2023
Milind Rao, Gopinath Chennupati, Gautam Tiwari, Anit Kumar Sahu, Anirudh Raju, Ariya Rastrow, Jasha Droppo

Via arXiv

Multi-Head State Space Model for Speech Recognition

May 25, 2023
Yassir Fathullah, Chunyang Wu, Yuan Shangguan, Junteng Jia, Wenhan Xiong, Jay Mahadeokar, Chunxi Liu, Yangyang Shi, Ozlem Kalinli, Mike Seltzer, Mark J. F. Gales

Via arXiv

Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition

Jun 28, 2023
Yuang Li, Yu Wu, Jinyu Li, Shujie Liu

Via arXiv

An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification

Sep 10, 2023
Harunori Kawano, Sota Shimizu

Via arXiv

Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts

Jun 01, 2023
Dongji Gao, Matthew Wiesner, Hainan Xu, Leibny Paola Garcia, Daniel Povey, Sanjeev Khudanpur

Via arXiv

Iteratively Improving Speech Recognition and Voice Conversion

May 24, 2023
Mayank Kumar Singh, Naoya Takahashi, Onoe Naoyuki

Via arXiv