speech


MCGA: A Multi-task Classical Chinese Literary Genre Audio Corpus

Jan 14, 2026
Via arXiv

SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing

Jan 14, 2026
Via arXiv

Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception

Jan 14, 2026
Via arXiv

Robust CAPTCHA Using Audio Illusions in the Era of Large Language Models: from Evaluation to Advances

Jan 13, 2026
Via arXiv

An Under-Explored Application for Explainable Multimodal Misogyny Detection in code-mixed Hindi-English

Jan 13, 2026
Via arXiv

Teaching Robots Like Dogs: Learning Agile Navigation from Luring, Gesture, and Speech

Jan 13, 2026
Via arXiv

Analyzing Bias in False Refusal Behavior of Large Language Models for Hate Speech Detoxification

Jan 13, 2026
Via arXiv

Detecting Mental Manipulation in Speech via Synthetic Multi-Speaker Dialogue

Jan 13, 2026
Via arXiv

Decoding Order Matters in Autoregressive Speech Synthesis

Jan 13, 2026
Via arXiv

SEE: Signal Embedding Energy for Quantifying Noise Interference in Large Audio Language Models

Jan 12, 2026
Via arXiv