speech


"OK Aura, Be Fair With Me": Demographics-Agnostic Training for Bias Mitigation in Wake-up Word Detection

Apr 07, 2026

Brain-to-Speech: Prosody Feature Engineering and Transformer-Based Reconstruction

Apr 07, 2026

INTERACT: An AI-Driven Extended Reality Framework for Accessible Communication Featuring Real-Time Sign Language Interpretation and Emotion Recognition

Apr 07, 2026

AI-Driven Modular Services for Accessible Multilingual Education in Immersive Extended Reality Settings: Integrating Speech Processing, Translation, and Sign Language Rendering

Apr 07, 2026

Joint Fullband-Subband Modeling for High-Resolution SingFake Detection

Apr 06, 2026

Benchmarking Multilingual Speech Models on Pashto: Zero-Shot ASR, Script Failure, and Cross-Domain Evaluation

Apr 06, 2026

ClickAIXR: On-Device Multimodal Vision-Language Interaction with Real-World Objects in Extended Reality

Apr 06, 2026

OmniSonic: Towards Universal and Holistic Audio Generation from Video and Text

Apr 06, 2026

Full-Duplex-Bench-v3: Benchmarking Tool Use for Full-Duplex Voice Agents Under Real-World Disfluency

Apr 06, 2026

Measuring Robustness of Speech Recognition from MEG Signals Under Distribution Shift

Apr 05, 2026