speech


TagSpeech: End-to-End Multi-Speaker ASR and Diarization with Fine-Grained Temporal Grounding

Jan 11, 2026
Via arXiv

DIVINE: Coordinating Multimodal Disentangled Representations for Oro-Facial Neurological Disorder Assessment

Jan 11, 2026
Via arXiv

Bridging Attribution and Open-Set Detection using Graph-Augmented Instance Learning in Synthetic Speech

Jan 11, 2026
Via arXiv

Categorize Early, Integrate Late: Divergent Processing Strategies in Automatic Speech Recognition

Jan 11, 2026
Via arXiv

Talking to Extraordinary Objects: Folktales Offer Analogies for Interacting with Technology

Jan 10, 2026
Via arXiv

QMAVIS: Long Video-Audio Understanding using Fusion of Large Multimodal Models

Jan 10, 2026
Via arXiv

Lightweight Resolution-Aware Audio Deepfake Detection via Cross-Scale Attention and Consistency Learning

Jan 10, 2026
Via arXiv

On the Fallacy of Global Token Perplexity in Spoken Language Model Evaluation

Jan 09, 2026
Via arXiv

Discriminative-Generative Target Speaker Extraction with Decoder-Only Language Models

Jan 09, 2026
Via arXiv

Continual-learning for Modelling Low-Resource Languages from Large Language Models

Jan 09, 2026
Via arXiv