Speech Recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the voice and language in an audio signal and accurately transcribing those words as text.

MOVER: Combining Multiple Meeting Recognition Systems

Aug 07, 2025

A Survey on Non-Intrusive ASR Refinement: From Output-Level Correction to Full-Model Distillation

Aug 10, 2025

Whisfusion: Parallel ASR Decoding via a Diffusion Transformer

Aug 09, 2025

Scene-Aware Vectorized Memory Multi-Agent Framework with Cross-Modal Differentiated Quantization VLMs for Visually Impaired Assistance

Aug 25, 2025

Speech LLMs in Low-Resource Scenarios: Data Volume Requirements and the Impact of Pretraining on High-Resource Languages

Aug 07, 2025

Efficient Scaling for LLM-based ASR

Aug 06, 2025

SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models

Aug 08, 2025

NVSpeech: An Integrated and Scalable Pipeline for Human-Like Speech Modeling with Paralinguistic Vocalizations

Aug 06, 2025

A Study on Regularization-Based Continual Learning Methods for Indic ASR

Aug 08, 2025

MiDashengLM: Efficient Audio Understanding with General Audio Captions

Aug 06, 2025