Speech Recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the speaker's voice and language, and transcribing those words accurately as text.

Prompting Whisper for Improved Verbatim Transcription and End-to-end Miscue Detection

May 29, 2025

AISHELL-5: The First Open-Source In-Car Multi-Channel Multi-Speaker Speech Dataset for Automatic Speech Diarization and Recognition

May 29, 2025

Leveraging LLM for Stuttering Speech: A Unified Architecture Bridging Recognition and Event Detection

May 28, 2025

Structured State Space Model Dynamics and Parametrization for Spiking Neural Networks

Jun 04, 2025

Towards LLM-Empowered Fine-Grained Speech Descriptors for Explainable Emotion Recognition

May 29, 2025

Dynamic Context-Aware Streaming Pretrained Language Model For Inverse Text Normalization

May 30, 2025

MSDA: Combining Pseudo-labeling and Self-Supervision for Unsupervised Domain Adaptation in ASR

May 30, 2025

SocialDF: Benchmark Dataset and Detection Model for Mitigating Harmful Deepfake Content on Social Media Platforms

Jun 05, 2025

Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction

May 30, 2025

Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC

May 30, 2025