speech


Adapting Speech Foundation Models with Large Language Models for Unified Speech Recognition

Oct 27, 2025 (via arXiv)

Far from the Shallow: Brain-Predictive Reasoning Embedding through Residual Disentanglement

Oct 26, 2025 (via arXiv)

EchoMind: An Interrelated Multi-level Benchmark for Evaluating Empathetic Speech Language Models

Oct 26, 2025 (via arXiv)

LRW-Persian: Lip-reading in the Wild Dataset for Persian Language

Oct 26, 2025 (via arXiv)

Look and Tell: A Dataset for Multimodal Grounding Across Egocentric and Exocentric Views

Oct 26, 2025 (via arXiv)

HyBeam: Hybrid Microphone-Beamforming Array-Agnostic Speech Enhancement for Wearables

Oct 26, 2025 (via arXiv)

Mitigating Attention Sinks and Massive Activations in Audio-Visual Speech Recognition with LLMs

Oct 26, 2025 (via arXiv)

UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models

Oct 26, 2025 (via arXiv)

A Sociophonetic Analysis of Racial Bias in Commercial ASR Systems Using the Pacific Northwest English Corpus

Oct 26, 2025 (via arXiv)

The Limits of Data Scaling: Sub-token Utilization and Acoustic Saturation in Multilingual ASR

Oct 26, 2025 (via arXiv)