Speech Recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the voice and language, and transcribing those words accurately.

Context-Aware Dynamic Chunking for Streaming Tibetan Speech Recognition

Nov 12, 2025

Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models

Nov 10, 2025

Spatial Blind Spot: Auditory Motion Perception Deficits in Audio LLMs

Nov 17, 2025

AfriSpeech-MultiBench: A Verticalized Multidomain Multicountry Benchmark Suite for African Accented English ASR

Nov 18, 2025

Distinguishing Repetition Disfluency from Morphological Reduplication in Bangla ASR Transcripts: A Novel Corpus and Benchmarking Analysis

Nov 17, 2025

Listen Like a Teacher: Mitigating Whisper Hallucinations using Adaptive Layer Attention and Knowledge Distillation

Nov 18, 2025

Fusionista2.0: Efficiency Retrieval System for Large-Scale Datasets

Nov 15, 2025

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Nov 12, 2025

SpikCommander: A High-performance Spiking Transformer with Multi-view Learning for Efficient Speech Command Recognition

Nov 13, 2025

CLiFT-ASR: A Cross-Lingual Fine-Tuning Framework for Low-Resource Taiwanese Hokkien Speech Recognition

Nov 10, 2025