speech


FastTurn: Unifying Acoustic and Streaming Semantic Cues for Low-Latency and Robust Turn Detection

Apr 07, 2026
Via arXiv

Closing the Speech-Text Gap with Limited Audio for Effective Domain Adaptation in LLM-Based ASR

Apr 07, 2026
Via arXiv

A Novel Automatic Framework for Speaker Drift Detection in Synthesized Speech

Apr 07, 2026
Via arXiv

Joint Fullband-Subband Modeling for High-Resolution SingFake Detection

Apr 06, 2026
Via arXiv

Benchmarking Multilingual Speech Models on Pashto: Zero-Shot ASR, Script Failure, and Cross-Domain Evaluation

Apr 06, 2026
Via arXiv

ClickAIXR: On-Device Multimodal Vision-Language Interaction with Real-World Objects in Extended Reality

Apr 06, 2026
Via arXiv

OmniSonic: Towards Universal and Holistic Audio Generation from Video and Text

Apr 06, 2026
Via arXiv

Full-Duplex-Bench-v3: Benchmarking Tool Use for Full-Duplex Voice Agents Under Real-World Disfluency

Apr 06, 2026
Via arXiv

Measuring Robustness of Speech Recognition from MEG Signals Under Distribution Shift

Apr 05, 2026
Via arXiv

AffectSpeech: A Large-Scale Emotional Speech Dataset with Fine-Grained Textual Descriptions for Speech Emotion Captioning and Synthesis

Apr 05, 2026
Via arXiv