speech


Rethinking Tokenization for Rich Morphology: The Dominance of Unigram over BPE and Morphological Alignment

Aug 11, 2025
Via arXiv

SCDF: A Speaker Characteristics DeepFake Speech Dataset for Bias Analysis

Aug 11, 2025
Via arXiv

Iterative refinement, not training objective, makes HuBERT behave differently from wav2vec 2.0

Aug 11, 2025
Via arXiv

Toward Machine Interpreting: Lessons from Human Interpreting Studies

Aug 11, 2025
Via arXiv

Optimal Transport Regularization for Speech Text Alignment in Spoken Language Models

Aug 11, 2025
Via arXiv

A Survey on Non-Intrusive ASR Refinement: From Output-Level Correction to Full-Model Distillation

Aug 10, 2025
Via arXiv

Keyword Mamba: Spoken Keyword Spotting with State Space Models

Aug 10, 2025
Via arXiv

Incorporating Contextual Paralinguistic Understanding in Large Speech-Language Models

Aug 10, 2025
Via arXiv

How Does a Deep Neural Network Look at Lexical Stress?

Aug 10, 2025
Via arXiv

Freeze and Reveal: Exposing Modality Bias in Vision-Language Models

Aug 10, 2025
Via arXiv