speech


Au-M-ol: A Unified Model for Medical Audio and Language Understanding

Apr 25, 2026
Via arXiv

Spectro-Temporal Modulation Representation Framework for Human-Imitated Speech Detection

Apr 25, 2026
Via arXiv

Measuring Temporal Linguistic Emergence in Diffusion Language Models

Apr 25, 2026
Via arXiv

UniSonate: A Unified Model for Speech, Music, and Sound Effect Generation with Text Instructions

Apr 24, 2026
Via arXiv

TTS-PRISM: A Perceptual Reasoning and Interpretable Speech Model for Fine-Grained Diagnosis

Apr 24, 2026
Via arXiv

Advancing automatic speech recognition using feature fusion with self-supervised learning features: A case study on Fearless Steps Apollo corpus

Apr 24, 2026
Via arXiv

DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models

Apr 24, 2026
Via arXiv

A Brain-Inspired Deep Separation Network for Single Channel Raman Spectra Unmixing

Apr 24, 2026
Via arXiv

Inter-Stance: A Dyadic Multimodal Corpus for Conversational Stance Analysis

Apr 24, 2026
Via arXiv

Identifying and typifying demographic unfairness in phoneme-level embeddings of self-supervised speech recognition models

Apr 24, 2026
Via arXiv