speech


SN-WER: Script-Normalized WER for Multi-Script Indic ASR Evaluation

Jun 01, 2026

WAXAL-NET: Finetuned Edge ASR Across 19 African Languages

Jun 01, 2026

SiamCTC: Learning Speech Representations through Monotonic Temporal Alignment

Jun 01, 2026

Breaking the Pair: Evaluating Dyadic Interaction via Speaker Switching

Jun 01, 2026

When Tabular Foundation Models Transfer Across Modalities: A Systematic Evaluation Across 95 Datasets, 7 Modalities, and Two Regimes

Jun 01, 2026

Echo: A Joint-Embedding Predictive Architecture for Speaker Diarization and Speech Recognition in a Shared Latent Space

Jun 01, 2026

Advancing Electrolaryngeal Speech Enhancement Through Speech-Text Representation Learning

Jun 01, 2026

TalkTag: Fine-Grained Morphosyntactic Error Annotation for Transcribed Speech

Jun 01, 2026

Real-Time Generation of Streamable Talking Portrait Video with Reference-Guided Deep Compression VAEs

Jun 01, 2026

Semantic Motion Anchors: Bridging Motion and Meaning in Co-Speech Gestures

Jun 01, 2026