speech recognition


Speech recognition is the task of identifying words spoken aloud, analyzing the voice and language, and accurately transcribing the words.

EvolveCaptions: Empowering DHH Users Through Real-Time Collaborative Captioning

Add code
Oct 02, 2025
Viaarxiv icon

Decoding the Ear: A Framework for Objectifying Expressiveness from Human Preference Through Efficient Alignment

Add code
Oct 23, 2025
Viaarxiv icon

State-of-the-Art Dysarthric Speech Recognition with MetaICL for on-the-fly Personalization

Add code
Sep 19, 2025
Viaarxiv icon

EmoQ: Speech Emotion Recognition via Speech-Aware Q-Former and Large Language Model

Add code
Sep 19, 2025
Figure 1 for EmoQ: Speech Emotion Recognition via Speech-Aware Q-Former and Large Language Model
Figure 2 for EmoQ: Speech Emotion Recognition via Speech-Aware Q-Former and Large Language Model
Figure 3 for EmoQ: Speech Emotion Recognition via Speech-Aware Q-Former and Large Language Model
Figure 4 for EmoQ: Speech Emotion Recognition via Speech-Aware Q-Former and Large Language Model
Viaarxiv icon

GLip: A Global-Local Integrated Progressive Framework for Robust Visual Speech Recognition

Add code
Sep 19, 2025
Viaarxiv icon

Rethinking Cross-Corpus Speech Emotion Recognition Benchmarking: Are Paralinguistic Pre-Trained Representations Sufficient?

Add code
Sep 19, 2025
Viaarxiv icon

A Parallel Ultra-Low Power Silent Speech Interface based on a Wearable, Fully-dry EMG Neckband

Add code
Sep 26, 2025
Figure 1 for A Parallel Ultra-Low Power Silent Speech Interface based on a Wearable, Fully-dry EMG Neckband
Figure 2 for A Parallel Ultra-Low Power Silent Speech Interface based on a Wearable, Fully-dry EMG Neckband
Figure 3 for A Parallel Ultra-Low Power Silent Speech Interface based on a Wearable, Fully-dry EMG Neckband
Figure 4 for A Parallel Ultra-Low Power Silent Speech Interface based on a Wearable, Fully-dry EMG Neckband
Viaarxiv icon

Chunk Based Speech Pre-training with High Resolution Finite Scalar Quantization

Add code
Sep 19, 2025
Viaarxiv icon

A Study of the Removability of Speaker-Adversarial Perturbations

Add code
Oct 10, 2025
Figure 1 for A Study of the Removability of Speaker-Adversarial Perturbations
Figure 2 for A Study of the Removability of Speaker-Adversarial Perturbations
Figure 3 for A Study of the Removability of Speaker-Adversarial Perturbations
Figure 4 for A Study of the Removability of Speaker-Adversarial Perturbations
Viaarxiv icon

CS3-Bench: Evaluating and Enhancing Speech-to-Speech LLMs for Mandarin-English Code-Switching

Add code
Oct 09, 2025
Figure 1 for CS3-Bench: Evaluating and Enhancing Speech-to-Speech LLMs for Mandarin-English Code-Switching
Figure 2 for CS3-Bench: Evaluating and Enhancing Speech-to-Speech LLMs for Mandarin-English Code-Switching
Figure 3 for CS3-Bench: Evaluating and Enhancing Speech-to-Speech LLMs for Mandarin-English Code-Switching
Figure 4 for CS3-Bench: Evaluating and Enhancing Speech-to-Speech LLMs for Mandarin-English Code-Switching
Viaarxiv icon