speech recognition


Speech recognition is the task of identifying words spoken aloud, analyzing the voice and language, and accurately transcribing the words.

Toward Conversational Hungarian Speech Recognition: Introducing the BEA-Large and BEA-Dialogue Datasets

Add code
Nov 17, 2025
Figure 1 for Toward Conversational Hungarian Speech Recognition: Introducing the BEA-Large and BEA-Dialogue Datasets
Figure 2 for Toward Conversational Hungarian Speech Recognition: Introducing the BEA-Large and BEA-Dialogue Datasets
Figure 3 for Toward Conversational Hungarian Speech Recognition: Introducing the BEA-Large and BEA-Dialogue Datasets
Figure 4 for Toward Conversational Hungarian Speech Recognition: Introducing the BEA-Large and BEA-Dialogue Datasets
Viaarxiv icon

Speech-Aware Long Context Pruning and Integration for Contextualized Automatic Speech Recognition

Add code
Nov 14, 2025
Viaarxiv icon

Context-Aware Dynamic Chunking for Streaming Tibetan Speech Recognition

Add code
Nov 12, 2025
Viaarxiv icon

TTA: Transcribe, Translate and Alignment for Cross-lingual Speech Representation

Add code
Nov 18, 2025
Viaarxiv icon

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Add code
Nov 12, 2025
Figure 1 for Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
Figure 2 for Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
Figure 3 for Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
Figure 4 for Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
Viaarxiv icon

Beyond saliency: enhancing explanation of speech emotion recognition with expert-referenced acoustic cues

Add code
Nov 12, 2025
Figure 1 for Beyond saliency: enhancing explanation of speech emotion recognition with expert-referenced acoustic cues
Figure 2 for Beyond saliency: enhancing explanation of speech emotion recognition with expert-referenced acoustic cues
Figure 3 for Beyond saliency: enhancing explanation of speech emotion recognition with expert-referenced acoustic cues
Figure 4 for Beyond saliency: enhancing explanation of speech emotion recognition with expert-referenced acoustic cues
Viaarxiv icon

SpikCommander: A High-performance Spiking Transformer with Multi-view Learning for Efficient Speech Command Recognition

Add code
Nov 13, 2025
Viaarxiv icon

Fusionista2.0: Efficiency Retrieval System for Large-Scale Datasets

Add code
Nov 15, 2025
Figure 1 for Fusionista2.0: Efficiency Retrieval System for Large-Scale Datasets
Figure 2 for Fusionista2.0: Efficiency Retrieval System for Large-Scale Datasets
Figure 3 for Fusionista2.0: Efficiency Retrieval System for Large-Scale Datasets
Figure 4 for Fusionista2.0: Efficiency Retrieval System for Large-Scale Datasets
Viaarxiv icon

Spatial Blind Spot: Auditory Motion Perception Deficits in Audio LLMs

Add code
Nov 17, 2025
Viaarxiv icon

Distinguishing Repetition Disfluency from Morphological Reduplication in Bangla ASR Transcripts: A Novel Corpus and Benchmarking Analysis

Add code
Nov 17, 2025
Viaarxiv icon