Speech Recognition


Speech recognition is the task of identifying spoken words by analyzing the voice and the language, and transcribing them accurately as text.

Beyond Mapping: Domain-Invariant Representations via Spectral Embedding of Optimal Transport Plans

Jan 19, 2026

Unifying Speech Recognition, Synthesis and Conversion with Autoregressive Transformers

Jan 15, 2026

RLBR: Reinforcement Learning with Biasing Rewards for Contextual Speech Large Language Models

Jan 19, 2026

Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception

Jan 14, 2026

WenetSpeech-Wu: Datasets, Benchmarks, and Models for a Unified Chinese Wu Dialect Speech Processing Ecosystem

Jan 16, 2026

TidyVoice: A Curated Multilingual Dataset for Speaker Verification Derived from Common Voice

Jan 22, 2026

AI-based System for Transforming text and sound to Educational Videos

Jan 16, 2026

HoverAI: An Embodied Aerial Agent for Natural Human-Drone Interaction

Jan 20, 2026

ParaMETA: Towards Learning Disentangled Paralinguistic Speaking Styles Representations from Speech

Jan 18, 2026

MCGA: A Multi-task Classical Chinese Literary Genre Audio Corpus

Jan 14, 2026