Speech recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the voice and language, and transcribing those words accurately.

PAC: Pronunciation-Aware Contextualized Large Language Model-based Automatic Speech Recognition

Sep 16, 2025
Via arXiv

Listening, Imagining & Refining: A Heuristic Optimized ASR Correction Framework with LLMs

Sep 18, 2025
Via arXiv

Frustratingly Easy Data Augmentation for Low-Resource ASR

Sep 18, 2025
Via arXiv

Impact of Phonetics on Speaker Identity in Adversarial Voice Attack

Sep 18, 2025
Via arXiv

From Who Said What to Who They Are: Modular Training-free Identity-Aware LLM Refinement of Speaker Diarization

Sep 18, 2025
Via arXiv

Language Conditioning Improves Accuracy of Aircraft Goal Prediction in Untowered Airspace

Sep 17, 2025
Via arXiv

Are Multimodal Foundation Models All That Is Needed for Emofake Detection?

Sep 19, 2025
Via arXiv

Canary-1B-v2 & Parakeet-TDT-0.6B-v3: Efficient and High-Performance Models for Multilingual ASR and AST

Sep 17, 2025
Via arXiv

CS-FLEURS: A Massively Multilingual and Code-Switched Speech Dataset

Sep 17, 2025
Via arXiv

FunAudio-ASR Technical Report

Sep 15, 2025
Via arXiv