Alexander Waibel

CAPE: A CLIP-Aware Pointing Ensemble of Complementary Heatmap Cues for Embodied Reference Understanding

Jul 29, 2025
Via arXiv

Mask-Free Audio-driven Talking Face Generation for Enhanced Visual Quality and Identity Preservation

Jul 28, 2025
Via arXiv

Towards Better Disentanglement in Non-Autoregressive Zero-Shot Expressive Voice Conversion

Jun 04, 2025
Via arXiv

KIT's Low-resource Speech Translation Systems for IWSLT2025: System Enhancement with Synthetic Data and Model Regularization

May 26, 2025
Via arXiv

KIT's Offline Speech Translation and Instruction Following Submission for IWSLT 2025

May 19, 2025
Via arXiv

The AI Co-Ethnographer: How Far Can Automation Take Qualitative Research?

Apr 21, 2025
Via arXiv

From Speech to Summary: A Comprehensive Survey of Speech Summarization

Apr 10, 2025
Via arXiv

Zero-Shot Strategies for Length-Controllable Summarization

Dec 31, 2024
Via arXiv

MSA-ASR: Efficient Multilingual Speaker Attribution with frozen ASR Models

Nov 27, 2024
Via arXiv

Improving Pronunciation and Accent Conversion through Knowledge Distillation And Synthetic Ground-Truth from Native TTS

Oct 19, 2024
Via arXiv