Speech Recognition


Speech recognition is the task of identifying spoken words by analyzing the voice and language in an audio signal, and accurately transcribing those words into text.

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Nov 12, 2025

Spatial Blind Spot: Auditory Motion Perception Deficits in Audio LLMs

Nov 17, 2025

Beyond saliency: enhancing explanation of speech emotion recognition with expert-referenced acoustic cues

Nov 12, 2025

Fusionista2.0: Efficiency Retrieval System for Large-Scale Datasets

Nov 15, 2025

SpikCommander: A High-performance Spiking Transformer with Multi-view Learning for Efficient Speech Command Recognition

Nov 13, 2025

Distinguishing Repetition Disfluency from Morphological Reduplication in Bangla ASR Transcripts: A Novel Corpus and Benchmarking Analysis

Nov 17, 2025

AfriSpeech-MultiBench: A Verticalized Multidomain Multicountry Benchmark Suite for African Accented English ASR

Nov 18, 2025

Listen Like a Teacher: Mitigating Whisper Hallucinations using Adaptive Layer Attention and Knowledge Distillation

Nov 18, 2025

Speech Emotion Recognition with Phonation Excitation Information and Articulatory Kinematics

Nov 11, 2025

TEDxTN: A Three-way Speech Translation Corpus for Code-Switched Tunisian Arabic - English

Nov 13, 2025