Speech recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the voice and language, and transcribing them accurately as text.

AfriSpeech-MultiBench: A Verticalized Multidomain Multicountry Benchmark Suite for African Accented English ASR

Nov 18, 2025

Listen Like a Teacher: Mitigating Whisper Hallucinations using Adaptive Layer Attention and Knowledge Distillation

Nov 18, 2025

Distinguishing Repetition Disfluency from Morphological Reduplication in Bangla ASR Transcripts: A Novel Corpus and Benchmarking Analysis

Nov 17, 2025

Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models

Nov 10, 2025

Fusionista2.0: Efficiency Retrieval System for Large-Scale Datasets

Nov 15, 2025

SpikCommander: A High-performance Spiking Transformer with Multi-view Learning for Efficient Speech Command Recognition

Nov 13, 2025

How Far Do SSL Speech Models Listen for Tone? Temporal Focus of Tone Representation under Low-resource Transfer

Nov 15, 2025

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Nov 12, 2025

Beyond saliency: enhancing explanation of speech emotion recognition with expert-referenced acoustic cues

Nov 12, 2025

CLiFT-ASR: A Cross-Lingual Fine-Tuning Framework for Low-Resource Taiwanese Hokkien Speech Recognition

Nov 10, 2025