Speech Recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the voice signal and the language, and transcribing them accurately as text.

TidyVoice: A Curated Multilingual Dataset for Speaker Verification Derived from Common Voice

Jan 22, 2026
Via arXiv

CTC-DID: CTC-Based Arabic dialect identification for streaming applications

Jan 18, 2026
Via arXiv

Beyond Mapping: Domain-Invariant Representations via Spectral Embedding of Optimal Transport Plans

Jan 19, 2026
Via arXiv

RLBR: Reinforcement Learning with Biasing Rewards for Contextual Speech Large Language Models

Jan 19, 2026
Via arXiv

HoverAI: An Embodied Aerial Agent for Natural Human-Drone Interaction

Jan 20, 2026
Via arXiv

Unifying Speech Recognition, Synthesis and Conversion with Autoregressive Transformers

Jan 15, 2026
Via arXiv

WenetSpeech-Wu: Datasets, Benchmarks, and Models for a Unified Chinese Wu Dialect Speech Processing Ecosystem

Jan 16, 2026
Via arXiv

Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception

Jan 14, 2026
Via arXiv

ParaMETA: Towards Learning Disentangled Paralinguistic Speaking Styles Representations from Speech

Jan 18, 2026
Via arXiv

AI-based System for Transforming text and sound to Educational Videos

Jan 16, 2026
Via arXiv