Speech Recognition


Speech recognition is the task of identifying spoken words by analyzing the speaker's voice and language, and transcribing them accurately as text.

Distinguishing Repetition Disfluency from Morphological Reduplication in Bangla ASR Transcripts: A Novel Corpus and Benchmarking Analysis

Nov 17, 2025

Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models

Nov 10, 2025

Fusionista2.0: Efficiency Retrieval System for Large-Scale Datasets

Nov 15, 2025

SpikCommander: A High-performance Spiking Transformer with Multi-view Learning for Efficient Speech Command Recognition

Nov 13, 2025

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Nov 12, 2025

How Far Do SSL Speech Models Listen for Tone? Temporal Focus of Tone Representation under Low-resource Transfer

Nov 15, 2025

Beyond saliency: enhancing explanation of speech emotion recognition with expert-referenced acoustic cues

Nov 12, 2025

CLiFT-ASR: A Cross-Lingual Fine-Tuning Framework for Low-Resource Taiwanese Hokkien Speech Recognition

Nov 10, 2025

TEDxTN: A Three-way Speech Translation Corpus for Code-Switched Tunisian Arabic - English

Nov 13, 2025

MERaLiON-SER: Robust Speech Emotion Recognition Model for English and SEA Languages

Nov 12, 2025