speech


Rethinking Cross-Corpus Speech Emotion Recognition Benchmarking: Are Paralinguistic Pre-Trained Representations Sufficient?

Sep 19, 2025
Via arXiv

Are Multimodal Foundation Models All That Is Needed for Emofake Detection?

Sep 19, 2025
Via arXiv

A Steered Response Power Method for Sound Source Localization With Generic Acoustic Models

Sep 19, 2025
Via arXiv

Direct Simultaneous Translation Activation for Large Audio-Language Models

Sep 19, 2025
Via arXiv

State-of-the-Art Dysarthric Speech Recognition with MetaICL for on-the-fly Personalization

Sep 19, 2025
Via arXiv

GLip: A Global-Local Integrated Progressive Framework for Robust Visual Speech Recognition

Sep 19, 2025
Via arXiv

EMO-RL: Emotion-Rule-Based Reinforcement Learning Enhanced Audio-Language Model for Generalized Speech Emotion Recognition

Sep 19, 2025
Via arXiv

HARNESS: Lightweight Distilled Arabic Speech Foundation Models

Sep 18, 2025
Via arXiv

From Turn-Taking to Synchronous Dialogue: A Survey of Full-Duplex Spoken Language Models

Sep 18, 2025
Via arXiv

From Who Said What to Who They Are: Modular Training-free Identity-Aware LLM Refinement of Speaker Diarization

Sep 18, 2025
Via arXiv