Audio Visual Speech Recognition


Audio visual speech recognition is the process of recognizing speech from both audio and visual cues.

OCR-Enhanced Multimodal ASR Can Read While Listening

Add code
Jan 26, 2026
Viaarxiv icon

Purification Before Fusion: Toward Mask-Free Speech Enhancement for Robust Audio-Visual Speech Recognition

Add code
Jan 18, 2026
Viaarxiv icon

Robust CAPTCHA Using Audio Illusions in the Era of Large Language Models: from Evaluation to Advances

Add code
Jan 13, 2026
Viaarxiv icon

MCGA: A Multi-task Classical Chinese Literary Genre Audio Corpus

Add code
Jan 14, 2026
Viaarxiv icon

Scalable Frameworks for Real-World Audio-Visual Speech Recognition

Add code
Dec 16, 2025
Figure 1 for Scalable Frameworks for Real-World Audio-Visual Speech Recognition
Figure 2 for Scalable Frameworks for Real-World Audio-Visual Speech Recognition
Figure 3 for Scalable Frameworks for Real-World Audio-Visual Speech Recognition
Figure 4 for Scalable Frameworks for Real-World Audio-Visual Speech Recognition
Viaarxiv icon

Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models

Add code
Nov 10, 2025
Viaarxiv icon

Mitigating Attention Sinks and Massive Activations in Audio-Visual Speech Recognition with LLMS

Add code
Oct 26, 2025
Viaarxiv icon

Interpreting the Role of Visemes in Audio-Visual Speech Recognition

Add code
Sep 19, 2025
Viaarxiv icon

Improving Noise Robust Audio-Visual Speech Recognition via Router-Gated Cross-Modal Feature Fusion

Add code
Aug 26, 2025
Viaarxiv icon

AD-AVSR: Asymmetric Dual-stream Enhancement for Robust Audio-Visual Speech Recognition

Add code
Aug 11, 2025
Viaarxiv icon