speech

What Does the Speaker Embedding Encode?

Dec 20, 2025
Via arXiv

Explainable Transformer-CNN Fusion for Noise-Robust Speech Emotion Recognition

Dec 20, 2025
Via arXiv

Phoneme-based speech recognition driven by large language models and sampling marginalization

Dec 20, 2025
Via arXiv

GeoSense-AI: Fast Location Inference from Crisis Microblogs

Dec 20, 2025
Via arXiv

SAM Audio: Segment Anything in Audio

Dec 19, 2025
Via arXiv

Zero-Shot Recognition of Dysarthric Speech Using Commercial Automatic Speech Recognition and Multimodal Large Language Models

Dec 19, 2025
Via arXiv

When De-noising Hurts: A Systematic Study of Speech Enhancement Effects on Modern Medical ASR Systems

Dec 19, 2025
Via arXiv

LibriVAD: A Scalable Open Dataset with Deep Learning Benchmarks for Voice Activity Detection

Dec 19, 2025
Via arXiv

Robust TTS Training via Self-Purifying Flow Matching for the WildSpoof 2026 TTS Track

Dec 19, 2025
Via arXiv

Semantic Co-Speech Gesture Synthesis and Real-Time Control for Humanoid Robots

Dec 19, 2025
Via arXiv