Audio Visual Speech Recognition


Audio-visual speech recognition (AVSR) is the task of recognizing speech from both audio and visual cues, such as a speaker's lip movements.

SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer

May 07, 2025 (via arXiv)

CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization

May 06, 2025 (via arXiv)

Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides

Apr 21, 2025 (via arXiv)

Visual-Aware Speech Recognition for Noisy Scenarios

Apr 09, 2025 (via arXiv)

MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens

Mar 14, 2025 (via arXiv)

Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs

Mar 09, 2025 (via arXiv)

Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations

Mar 08, 2025 (via arXiv)

Deep Learning for Speech Emotion Recognition: A CNN Approach Utilizing Mel Spectrograms

Mar 25, 2025 (via arXiv)

Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models

Feb 09, 2025 (via arXiv)

mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition

Feb 03, 2025 (via arXiv)