speech


FoeGlass: Simple In-Context Learning Is Enough for Red Teaming Audio Deepfake Detectors

Jun 03, 2026
Via arXiv

Task-Vector Arithmetic for Emotional Expressivity Control in Language-Model-Based Text-to-Speech

Jun 03, 2026
Via arXiv

SpeakerCard-1M: An Evidence-Grounded Speaker Card Corpus for In-the-Wild Speaker Verification

Jun 03, 2026
Via arXiv

Age-Aware Adapter Tuning for Children's Speech Recognition

Jun 03, 2026
Via arXiv

A Motivational Architecture for Conversational AGI

Jun 03, 2026
Via arXiv

Multilingual Long-Form Speech Instruction Following: KIT's Submission to IWSLT 2026

Jun 03, 2026
Via arXiv

Read What You Hear: Reference-Free Hypotheses Evaluation with Acoustic Discrepancy

Jun 03, 2026
Via arXiv

Test-Time Compute Scaling for ASR with Depth-Conditioned Looped Transformers

Jun 03, 2026
Via arXiv

Entity Binding Failures in Speech LLM Reasoning: Diagnosis and Chain-of-Thought Intervention

Jun 03, 2026
Via arXiv

CleanCodec: Efficient and Robust Speech Tokenization via Perceptually Guided Encoding

Jun 03, 2026
Via arXiv