Speech recognition


Speech recognition is the task of identifying the words in spoken audio and transcribing them accurately as text, which requires analyzing both the speaker's voice and the language being spoken.

Fusionista2.0: Efficiency Retrieval System for Large-Scale Datasets

Nov 15, 2025

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Nov 12, 2025

CLiFT-ASR: A Cross-Lingual Fine-Tuning Framework for Low-Resource Taiwanese Hokkien Speech Recognition

Nov 10, 2025

SpikCommander: A High-performance Spiking Transformer with Multi-view Learning for Efficient Speech Command Recognition

Nov 13, 2025

Beyond saliency: enhancing explanation of speech emotion recognition with expert-referenced acoustic cues

Nov 12, 2025

How Far Do SSL Speech Models Listen for Tone? Temporal Focus of Tone Representation under Low-resource Transfer

Nov 15, 2025

TEDxTN: A Three-way Speech Translation Corpus for Code-Switched Tunisian Arabic - English

Nov 13, 2025

MERaLiON-SER: Robust Speech Emotion Recognition Model for English and SEA Languages

Nov 12, 2025

Speech Emotion Recognition with Phonation Excitation Information and Articulatory Kinematics

Nov 11, 2025

WST: Weakly Supervised Transducer for Automatic Speech Recognition

Nov 06, 2025