speech


PS-TTS: Phonetic Synchronization in Text-to-Speech for Achieving Natural Automated Dubbing

Apr 14, 2026
Via arXiv

Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition

Apr 14, 2026
Via arXiv

VoxEffects: A Speech-Oriented Audio Effects Dataset and Benchmark

Apr 14, 2026
Via arXiv

SEDTalker: Emotion-Aware 3D Facial Animation Using Frame-Level Speech Emotion Diarization

Apr 14, 2026
Via arXiv

ProSDD: Learning Prosodic Representations for Speech Deepfake Detection against Expressive and Emotional Attacks

Apr 14, 2026
Via arXiv

An Ultra-Low Latency, End-to-End Streaming Speech Synthesis Architecture via Block-Wise Generation and Depth-Wise Codec Decoding

Apr 14, 2026
Via arXiv

CoSyncDiT: Cognitive Synchronous Diffusion Transformer for Movie Dubbing

Apr 14, 2026
Via arXiv

TokenSE: a Mamba-based discrete token speech enhancement framework for cochlear implants

Apr 14, 2026
Via arXiv

X-VC: Zero-shot Streaming Voice Conversion in Codec Space

Apr 14, 2026
Via arXiv

The Enforcement and Feasibility of Hate Speech Moderation on Twitter

Apr 14, 2026
Via arXiv