"speech recognition": models, code, and papers

MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, ASR Error Detection, and ASR Error Correction

Jan 24, 2024
Jiajun He, Xiaohan Shi, Xingfeng Li, Tomoki Toda

Via arXiv

Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing

Feb 23, 2024
Jeong Hun Yeo, Seunghee Han, Minsu Kim, Yong Man Ro

Via arXiv

UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction

Jan 11, 2024
Jiaxin Guo, Minghan Wang, Xiaosong Qiao, Daimeng Wei, Hengchao Shang, Zongyao Li, Zhengzhe Yu, Yinglu Li, Chang Su, Min Zhang, Shimin Tao, Hao Yang

Via arXiv

Rich Semantic Knowledge Enhanced Large Language Models for Few-shot Chinese Spell Checking

Mar 13, 2024
Ming Dong, Yujing Chen, Miao Zhang, Hao Sun, Tingting He

Via arXiv

LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data

Dec 15, 2023
Hendrik Laux, Emil Mededovic, Ahmed Hallawa, Lukas Martin, Arne Peine, Anke Schmeink

Via arXiv

Useful Blunders: Can Automated Speech Recognition Errors Improve Downstream Dementia Classification?

Jan 10, 2024
Changye Li, Weizhe Xu, Trevor Cohen, Serguei Pakhomov

Via arXiv

CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-based Speech Recognition

Jan 04, 2024
Junfeng Hou, Peiyao Wang, Jincheng Zhang, Meng Yang, Minwei Feng, Jingcheng Yin

Via arXiv

Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition

Jan 11, 2024
Vahid Noroozi, Somshubra Majumdar, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg

Via arXiv

HINT: High-quality INPainting Transformer with Mask-Aware Encoding and Enhanced Attention

Feb 22, 2024
Shuang Chen, Amir Atapour-Abarghouei, Hubert P. H. Shum

Via arXiv

AccentFold: A Journey through African Accents for Zero-Shot ASR Adaptation to Target Accents

Feb 05, 2024
Abraham Toluwase Owodunni, Aditya Yadavalli, Chris Chinenye Emezue, Tobi Olatunji, Clinton C Mbataku

Via arXiv