speech


DiFlowDubber: Discrete Flow Matching for Automated Video Dubbing via Cross-Modal Alignment and Synchronization

Mar 17, 2026
Via arXiv

CineSRD: Leveraging Visual, Acoustic, and Linguistic Cues for Open-World Visual Media Speaker Diarization

Mar 17, 2026
Via arXiv

RECOVER: Robust Entity Correction via agentic Orchestration of hypothesis Variants for Evidence-based Recovery

Mar 17, 2026
Via arXiv

Linearized Bregman Iterations for Sparse Spiking Neural Networks

Mar 17, 2026
Via arXiv

Omnilingual SONAR: Cross-Lingual and Cross-Modal Sentence Embeddings Bridging Massively Multilingual Text and Speech

Mar 17, 2026
Via arXiv

Collecting Prosody in the Wild: A Content-Controlled, Privacy-First Smartphone Protocol and Empirical Evaluation

Mar 17, 2026
Via arXiv

Investigating the Impact of Speech Enhancement on Audio Deepfake Detection in Noisy Environments

Mar 16, 2026
Via arXiv

LLMs and Speech: Integration vs. Combination

Mar 16, 2026
Via arXiv

Something from Nothing: Data Augmentation for Robust Severity Level Estimation of Dysarthric Speech

Mar 16, 2026
Via arXiv

NV-Bench: Benchmark of Nonverbal Vocalization Synthesis for Expressive Text-to-Speech Generation

Mar 16, 2026
Via arXiv