speech


Cascade-Free Mandarin Visual Speech Recognition via Semantic-Guided Cross-Representation Alignment
Mar 23, 2026
via arXiv

Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model
Mar 23, 2026
via arXiv

DiT-Flow: Speech Enhancement Robust to Multiple Distortions based on Flow Matching in Latent Space and Diffusion Transformers
Mar 23, 2026
via arXiv

TaigiSpeech: A Low-Resource Real-World Speech Intent Dataset and Preliminary Results with Scalable Data Mining In-the-Wild
Mar 23, 2026
via arXiv

Disentangling Speaker Traits for Deepfake Source Verification via Chebyshev Polynomial and Riemannian Metric Learning
Mar 23, 2026
via arXiv

SelfTTS: cross-speaker style transfer through explicit embedding disentanglement and self-refinement using self-augmentation
Mar 23, 2026
via arXiv

DATASHI: A Parallel English-Tashlhiyt Corpus for Orthography Normalization and Low-Resource Language Processing
Mar 23, 2026
via arXiv

Precision-Varying Prediction (PVP): Robustifying ASR systems against adversarial attacks
Mar 23, 2026
via arXiv

MSP-Conversation: A Corpus for Naturalistic, Time-Continuous Emotion Recognition
Mar 23, 2026
via arXiv

Assessing the Ability of Neural TTS Systems to Model Consonant-Induced F0 Perturbation
Mar 22, 2026
via arXiv