Visual Speech Recognition


Cross-Modal Bottleneck Fusion For Noise Robust Audio-Visual Speech Recognition

Feb 09, 2026
Via arXiv

D-ORCA: Dialogue-Centric Optimization for Robust Audio-Visual Captioning

Feb 08, 2026
Via arXiv

OCR-Enhanced Multimodal ASR Can Read While Listening

Jan 26, 2026
Via arXiv

MA-LipNet: Multi-Dimensional Attention Networks for Robust Lipreading

Jan 27, 2026
Via arXiv

asr_eval: Algorithms and tools for multi-reference and streaming speech recognition evaluation

Jan 28, 2026
Via arXiv

Noise-Robust AV-ASR Using Visual Features Both in the Whisper Encoder and Decoder

Jan 26, 2026
Via arXiv

Purification Before Fusion: Toward Mask-Free Speech Enhancement for Robust Audio-Visual Speech Recognition

Jan 18, 2026
Via arXiv

HoverAI: An Embodied Aerial Agent for Natural Human-Drone Interaction

Jan 20, 2026
Via arXiv

AI-based System for Transforming text and sound to Educational Videos

Jan 16, 2026
Via arXiv

Robust CAPTCHA Using Audio Illusions in the Era of Large Language Models: from Evaluation to Advances

Jan 13, 2026
Via arXiv