speech


A Unified Neural Codec Language Model for Selective Editable Text to Speech Generation

Jan 18, 2026
Via arXiv

Confidence-based Filtering for Speech Dataset Curation with Generative Speech Enhancement Using Discrete Tokens

Jan 18, 2026
Via arXiv

Song Aesthetics Evaluation with Multi-Stem Attention and Hierarchical Uncertainty Modeling

Jan 18, 2026
Via arXiv

Purification Before Fusion: Toward Mask-Free Speech Enhancement for Robust Audio-Visual Speech Recognition

Jan 18, 2026
Via arXiv

SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition

Jan 18, 2026
Via arXiv

Harmonizing the Arabic Audio Space with Data Scheduling

Jan 18, 2026
Via arXiv

Bone-conduction Guided Multimodal Speech Enhancement with Conditional Diffusion Models

Jan 18, 2026
Via arXiv

ParaMETA: Towards Learning Disentangled Paralinguistic Speaking Styles Representations from Speech

Jan 18, 2026
Via arXiv

Do Neural Codecs Generalize? A Controlled Study Across Unseen Languages and Non-Speech Tasks

Jan 18, 2026
Via arXiv

CTC-DID: CTC-Based Arabic dialect identification for streaming applications

Jan 18, 2026
Via arXiv