speech


Listen, Look, Drive: Coupling Audio Instructions for User-aware VLA-based Autonomous Driving

Jan 17, 2026
Via arXiv

Digital FAST: An AI-Driven Multimodal Framework for Rapid and Early Stroke Screening

Jan 17, 2026
Via arXiv

The Third VoicePrivacy Challenge: Preserving Emotional Expressiveness and Linguistic Content in Voice Anonymization

Jan 17, 2026
Via arXiv

TANDEM: Temporal-Aware Neural Detection for Multimodal Hate Speech

Jan 16, 2026
Via arXiv

The Big Ban Theory: A Pre- and Post-Intervention Dataset of Online Content Moderation Actions

Jan 16, 2026
Via arXiv

AdaMARP: An Adaptive Multi-Agent Interaction Framework for General Immersive Role-Playing

Jan 16, 2026
Via arXiv

FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning

Jan 16, 2026
Via arXiv

Redefining Machine Simultaneous Interpretation: From Incremental Translation to Human-Like Strategies

Jan 16, 2026
Via arXiv

F-Actor: Controllable Conversational Behaviour in Full-Duplex Models

Jan 16, 2026
Via arXiv

WenetSpeech-Wu: Datasets, Benchmarks, and Models for a Unified Chinese Wu Dialect Speech Processing Ecosystem

Jan 16, 2026
Via arXiv