speech


End-to-End Direction-Aware Keyword Spotting with Spatial Priors in Noisy Environments

Mar 10, 2026
Via arXiv

ALARM: Audio-Language Alignment for Reasoning Models

Mar 10, 2026
Via arXiv

SCENEBench: An Audio Understanding Benchmark Grounded in Assistive and Industrial Use Cases

Mar 10, 2026
Via arXiv

Learning Multiple Utterance-Level Attribute Representations with a Unified Speech Encoder

Mar 09, 2026
Via arXiv

Evolution Strategy-Based Calibration for Low-Bit Quantization of Speech Models

Mar 09, 2026
Via arXiv

Listening with the Eyes: Benchmarking Egocentric Co-Speech Grounding across Space and Time

Mar 09, 2026
Via arXiv

Disentangling Reasoning in Large Audio-Language Models for Ambiguous Emotion Prediction

Mar 09, 2026
Via arXiv

NLE: Non-autoregressive LLM-based ASR by Transcript Editing

Mar 09, 2026
Via arXiv

VoxEmo: Benchmarking Speech Emotion Recognition with Speech LLMs

Mar 09, 2026
Via arXiv

Using Multimodal and Language-Agnostic Sentence Embeddings for Abstractive Summarization

Mar 09, 2026
Via arXiv