Speech Recognition


Speech recognition is the task of identifying the words in spoken audio, analyzing both the voice signal and the language, and transcribing those words accurately as text.

Behind the Scenes: Mechanistic Interpretability of LoRA-adapted Whisper for Speech Emotion Recognition

Sep 11, 2025
Via arXiv

Few-shot Personalization via In-Context Learning for Speech Emotion Recognition based on Speech-Language Model

Sep 10, 2025
Via arXiv

Joint Learning using Mixture-of-Expert-Based Representation for Enhanced Speech Generation and Robust Emotion Recognition

Sep 10, 2025
Via arXiv

Streaming Sequence-to-Sequence Learning with Delayed Streams Modeling

Sep 10, 2025
Via arXiv

A Bottom-up Framework with Language-universal Speech Attribute Modeling for Syllable-based ASR

Sep 09, 2025
Via arXiv

EnvX: Agentize Everything with Agentic AI

Sep 09, 2025
Via arXiv

Identifying and Calibrating Overconfidence in Noisy Speech Recognition

Sep 08, 2025
Via arXiv

Layer-wise Analysis for Quality of Multilingual Synthesized Speech

Sep 05, 2025
Via arXiv

Cloning a Conversational Voice AI Agent from Call Recording Datasets for Telesales

Sep 05, 2025
Via arXiv

Contextualized Token Discrimination for Speech Search Query Correction

Sep 04, 2025
Via arXiv