Audio Visual Speech Recognition


Audio-visual speech recognition (AVSR) is the task of recognizing speech from both the audio signal and visual cues such as lip movements.

Beyond the Mouth: Upper-Face Affective Cues in Audiovisual Sentence Recognition under Acoustic Uncertainty

May 30, 2026, via arXiv

MMTM: Tri-Modal Topic Modeling for Long-Form Video via Similarity-Gated Fusion

May 28, 2026, via arXiv

TokTalk: Expressive Real-time Facial Animation from Audio-LLM Tokens

May 29, 2026, via arXiv

Audio-Visual Intelligence in Large Foundation Models

May 05, 2026, via arXiv

2nd of the 5th PVUW MeViS-Audio Track: ASR-SaSaSa2VA

Apr 27, 2026, via arXiv

VisG AV-HuBERT: Viseme-Guided AV-HuBERT

Apr 01, 2026, via arXiv

When AVSR Meets Video Conferencing: Dataset, Degradation, and the Hidden Mechanism Behind Performance Collapse

Mar 24, 2026, via arXiv

Dr. SHAP-AV: Decoding Relative Modality Contributions via Shapley Attribution in Audio-Visual Speech Recognition

Mar 12, 2026, via arXiv

Robust LLM-based Audio-Visual Speech Recognition with Sparse Modality Alignment and Visual Unit-Guided Refinement

Mar 04, 2026, via arXiv

The USTC-NERCSLIP Systems for the CHiME-9 MCoRec Challenge

Mar 02, 2026, via arXiv