Spoken Language Understanding


Speak2Sign3D: A Multi-modal Pipeline for English Speech to American Sign Language Animation

Jul 09, 2025
Via arXiv

MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark

Jun 05, 2025
Via arXiv

Phi-Omni-ST: A multimodal language model for direct speech-to-speech translation

Jun 04, 2025
Via arXiv

DEBATE: A Dataset for Disentangling Textual Ambiguity in Mandarin Through Speech

Jun 09, 2025
Via arXiv

ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs

May 26, 2025
Via arXiv

"KAN you hear me?" Exploring Kolmogorov-Arnold Networks for Spoken Language Understanding

May 26, 2025
Via arXiv

S2ST-Omni: An Efficient and Scalable Multilingual Speech-to-Speech Translation Framework via Seamlessly Speech-Text Alignment and Streaming Speech Decoder

Jun 16, 2025
Via arXiv

GLAP: General contrastive audio-text pretraining across domains and languages

Jun 12, 2025
Via arXiv

"Alexa, can you forget me?" Machine Unlearning Benchmark in Spoken Language Understanding

May 21, 2025
Via arXiv

Exploring the Effect of Segmentation and Vocabulary Size on Speech Tokenization for Speech Language Models

May 23, 2025
Via arXiv