speech


MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models

Apr 14, 2026
Via arXiv

Listening Alone, Understanding Together: Collaborative Context Recovery for Privacy-Aware AI

Apr 14, 2026
Via arXiv

StreamMark: A Deep Learning-Based Semi-Fragile Audio Watermarking for Proactive Deepfake Detection

Apr 13, 2026
Via arXiv

Efficient Emotion-Aware Iconic Gesture Prediction for Robot Co-Speech

Apr 13, 2026
Via arXiv

Speaker Attributed Automatic Speech Recognition Using Speech Aware LLMs

Apr 13, 2026
Via arXiv

Teaching the Teachers: Boosting unsupervised domain adaptation in speech recognition by ensemble update

Apr 13, 2026
Via arXiv

Direction-Preserving MIMO Speech Enhancement Using a Neural Covariance Estimator

Apr 13, 2026
Via arXiv

Environmental Footprint of GenAI Research: Insights from the Moshi Foundation Model

Apr 13, 2026
Via arXiv

Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music

Apr 13, 2026
Via arXiv

Bridging What the Model Thinks and How It Speaks: Self-Aware Speech Language Models for Expressive Speech Generation

Apr 13, 2026
Via arXiv