Stavros Petridis

FlashLips: 100-FPS Mask-Free Latent Lip-Sync using Reconstruction Instead of Diffusion or GANs

Dec 23, 2025
Via arXiv

Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models

Nov 10, 2025
Via arXiv

Mitigating Attention Sinks and Massive Activations in Audio-Visual Speech Recognition with LLMs

Oct 26, 2025
Via arXiv

Revival with Voice: Multi-modal Controllable Text-to-Speech Synthesis

May 25, 2025
Via arXiv

Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach

May 21, 2025
Via arXiv

FaceCrafter: Identity-Conditional Diffusion with Disentangled Control over Facial Pose, Expression, and Emotion

May 21, 2025
Via arXiv

KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution

May 01, 2025
Via arXiv

Contextual Speech Extraction: Leveraging Textual History as an Implicit Cue for Target Speech Extraction

Mar 11, 2025
Via arXiv

Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs

Mar 09, 2025
Via arXiv

Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations

Mar 08, 2025
Via arXiv