Speech recognition


Speech recognition is the task of identifying the words in spoken audio by analyzing the voice and language, and transcribing those words accurately into text.

Speech Language Models for Under-Represented Languages: Insights from Wolof

Sep 18, 2025
Via arXiv

VOX-KRIKRI: Unifying Speech and Language through Continuous Fusion

Sep 19, 2025
Via arXiv

Listening, Imagining & Refining: A Heuristic Optimized ASR Correction Framework with LLMs

Sep 18, 2025
Via arXiv

Chunk Based Speech Pre-training with High Resolution Finite Scalar Quantization

Sep 19, 2025
Via arXiv

Frustratingly Easy Data Augmentation for Low-Resource ASR

Sep 18, 2025
Via arXiv

Impact of Phonetics on Speaker Identity in Adversarial Voice Attack

Sep 18, 2025
Via arXiv

From Who Said What to Who They Are: Modular Training-free Identity-Aware LLM Refinement of Speaker Diarization

Sep 18, 2025
Via arXiv

FunAudio-ASR Technical Report

Sep 15, 2025
Via arXiv

Language Conditioning Improves Accuracy of Aircraft Goal Prediction in Untowered Airspace

Sep 17, 2025
Via arXiv

Canary-1B-v2 & Parakeet-TDT-0.6B-v3: Efficient and High-Performance Models for Multilingual ASR and AST

Sep 17, 2025
Via arXiv