speech


Speech-Worthy Alignment for Japanese SpeechLLMs via Direct Preference Optimization

Mar 13, 2026
Via arXiv

Towards unified brain-to-text decoding across speech production and perception

Mar 13, 2026
Via arXiv

Self-Supervised Speech Models Encode Phonetic Context via Position-dependent Orthogonal Subspaces

Mar 13, 2026
Via arXiv

Understanding the strengths and weaknesses of SSL models for audio deepfake model attribution

Mar 13, 2026
Via arXiv

Mask2Flow-TSE: Two-Stage Target Speaker Extraction with Masking and Flow Matching

Mar 13, 2026
Via arXiv

VoXtream2: Full-stream TTS with dynamic speaking rate control

Mar 13, 2026
Via arXiv

Team RAS in 10th ABAW Competition: Multimodal Valence and Arousal Estimation Approach

Mar 13, 2026
Via arXiv

Learning from Child-Directed Speech in Two-Language Scenarios: A French-English Case Study

Mar 13, 2026
Via arXiv

As Language Models Scale, Low-order Linear Depth Dynamics Emerge

Mar 13, 2026
Via arXiv

MamTra: A Hybrid Mamba-Transformer Backbone for Speech Synthesis

Mar 12, 2026
Via arXiv