Speech


From Inpainting to Editing: A Self-Bootstrapping Framework for Context-Rich Visual Dubbing

Dec 31, 2025

Distilled HuBERT for Mobile Speech Emotion Recognition: A Cross-Corpus Validation Study

Dec 31, 2025

Paragraph Segmentation Revisited: Towards a Standard Task for Structuring Speech

Dec 30, 2025

PhyAVBench: A Challenging Audio Physics-Sensitivity Benchmark for Physically Grounded Text-to-Audio-Video Generation

Dec 30, 2025

MiMo-Audio: Audio Language Models are Few-Shot Learners

Dec 29, 2025

PROFASR-BENCH: A Benchmark for Context-Conditioned ASR in High-Stakes Professional Speech

Dec 29, 2025

Do You Have Freestyle? Expressive Humanoid Locomotion via Audio Control

Dec 29, 2025

Single Channel Blind Dereverberation of Speech Signals

Dec 29, 2025

AI4Reading: Chinese Audiobook Interpretation System Based on Multi-Agent Collaboration

Dec 29, 2025

VALLR-Pin: Uncertainty-Factorized Visual Speech Recognition for Mandarin with Pinyin Guidance

Dec 29, 2025