speech


TagSpeech: End-to-End Multi-Speaker ASR and Diarization with Fine-Grained Temporal Grounding

Jan 11, 2026
Via arXiv

DIVINE: Coordinating Multimodal Disentangled Representations for Oro-Facial Neurological Disorder Assessment

Jan 11, 2026
Via arXiv

Bridging Attribution and Open-Set Detection using Graph-Augmented Instance Learning in Synthetic Speech

Jan 11, 2026
Via arXiv

Categorize Early, Integrate Late: Divergent Processing Strategies in Automatic Speech Recognition

Jan 11, 2026
Via arXiv

QMAVIS: Long Video-Audio Understanding using Fusion of Large Multimodal Models

Jan 10, 2026
Via arXiv

Lightweight Resolution-Aware Audio Deepfake Detection via Cross-Scale Attention and Consistency Learning

Jan 10, 2026
Via arXiv

Talking to Extraordinary Objects: Folktales Offer Analogies for Interacting with Technology

Jan 10, 2026
Via arXiv

Pantagruel: Unified Self-Supervised Encoders for French Text and Speech

Jan 09, 2026
Via arXiv

An Intelligent AI glasses System with Multi-Agent Architecture for Real-Time Voice Processing and Task Execution

Jan 09, 2026
Via arXiv

Afri-MCQA: Multimodal Cultural Question Answering for African Languages

Jan 09, 2026
Via arXiv