speech


OneVoice: One Model, Triple Scenarios-Towards Unified Zero-shot Voice Conversion

Jan 26, 2026
Via arXiv

From Human Speech to Ocean Signals: Transferring Speech Large Models for Underwater Acoustic Target Recognition

Jan 26, 2026
Via arXiv

Unheard in the Digital Age: Rethinking AI Bias and Speech Diversity

Jan 26, 2026
Via arXiv

BanglaRobustNet: A Hybrid Denoising-Attention Architecture for Robust Bangla Speech Recognition

Jan 25, 2026
Via arXiv

Quran-MD: A Fine-Grained Multilingual Multimodal Dataset of the Quran

Jan 25, 2026
Via arXiv

AVMeme Exam: A Multimodal Multilingual Multicultural Benchmark for LLMs' Contextual and Cultural Knowledge and Thinking

Jan 25, 2026
Via arXiv

AmbER$^2$: Dual Ambiguity-Aware Emotion Recognition Applied to Speech and Text

Jan 25, 2026
Via arXiv

dLLM-ASR: A Faster Diffusion LLM-based Framework for Speech Recognition

Jan 25, 2026
Via arXiv

AR-Omni: A Unified Autoregressive Model for Any-to-Any Generation

Jan 25, 2026
Via arXiv

SpatialEmb: Extract and Encode Spatial Information for 1-Stage Multi-channel Multi-speaker ASR on Arbitrary Microphone Arrays

Jan 25, 2026
Via arXiv