speech


How much speech data is necessary for ASR in African languages? An evaluation of data scaling in Kinyarwanda and Kikuyu

Oct 08, 2025

SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models

Oct 08, 2025

AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs

Oct 08, 2025

Comparison of Speech Tasks in Human Expert and Machine Detection of Parkinson's Disease

Oct 08, 2025

Towards Responsible Evaluation for Text-to-Speech

Oct 08, 2025

Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation

Oct 08, 2025

Enhancing Speech Emotion Recognition via Fine-Tuning Pre-Trained Models and Hyper-Parameter Optimisation

Oct 08, 2025

TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation

Oct 08, 2025

Making Machines Sound Sarcastic: LLM-Enhanced and Retrieval-Guided Sarcastic Speech Synthesis

Oct 08, 2025

Evaluating Self-Supervised Speech Models via Text-Based LLMS

Oct 06, 2025