Speech Recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the voice and language, and transcribing those words accurately as text.

Ground Truth Generation for Multilingual Historical NLP using LLMs

Nov 18, 2025

E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis

Nov 10, 2025

Towards Fine-Grained Code-Switch Speech Translation with Semantic Space Alignment

Nov 09, 2025

Accelerating scientific discovery with the common task framework

Nov 06, 2025

Enabling Automatic Self-Talk Detection via Earables

Nov 10, 2025

CantoASR: Prosody-Aware ASR-LALM Collaboration for Low-Resource Cantonese

Nov 06, 2025

Adapting Speech Foundation Models with Large Language Models for Unified Speech Recognition

Oct 27, 2025

Mitigating Attention Sinks and Massive Activations in Audio-Visual Speech Recognition with LLMS

Oct 26, 2025

Scriboora: Rethinking Human Pose Forecasting

Nov 19, 2025

LRW-Persian: Lip-reading in the Wild Dataset for Persian Language

Oct 26, 2025