speech


Bone-conduction Guided Multimodal Speech Enhancement with Conditional Diffusion Models

Jan 18, 2026 (via arXiv)

A Unified Neural Codec Language Model for Selective Editable Text to Speech Generation

Jan 18, 2026 (via arXiv)

ParaMETA: Towards Learning Disentangled Paralinguistic Speaking Styles Representations from Speech

Jan 18, 2026 (via arXiv)

SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition

Jan 18, 2026 (via arXiv)

CTC-DID: CTC-Based Arabic dialect identification for streaming applications

Jan 18, 2026 (via arXiv)

Do Neural Codecs Generalize? A Controlled Study Across Unseen Languages and Non-Speech Tasks

Jan 18, 2026 (via arXiv)

Purification Before Fusion: Toward Mask-Free Speech Enhancement for Robust Audio-Visual Speech Recognition

Jan 18, 2026 (via arXiv)

Robust Online Overdetermined Independent Vector Analysis Based on Bilinear Decomposition

Jan 18, 2026 (via arXiv)

Song Aesthetics Evaluation with Multi-Stem Attention and Hierarchical Uncertainty Modeling

Jan 18, 2026 (via arXiv)

Confidence-based Filtering for Speech Dataset Curation with Generative Speech Enhancement Using Discrete Tokens

Jan 18, 2026 (via arXiv)