Speech Recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the speaker's voice and language, and transcribing those words accurately as text.

Quantifying Quanvolutional Neural Networks Robustness for Speech in Healthcare Applications

Jan 05, 2026
Via arXiv

PROFASR-BENCH: A Benchmark for Context-Conditioned ASR in High-Stakes Professional Speech

Dec 29, 2025
Via arXiv

VALLR-Pin: Dual-Decoding Visual Speech Recognition for Mandarin with Pinyin-Guided LLM Refinement

Dec 23, 2025
Via arXiv

Enhancing Fully Formatted End-to-End Speech Recognition with Knowledge Distillation via Multi-Codebook Vector Quantization

Dec 22, 2025
Via arXiv

Contextual Biasing for LLM-Based ASR with Hotword Retrieval and Reinforcement Learning

Dec 26, 2025
Via arXiv

TICL+: A Case Study On Speech In-Context Learning for Children's Speech Recognition

Dec 20, 2025
Via arXiv

Phoneme-based speech recognition driven by large language models and sampling marginalization

Dec 20, 2025
Via arXiv

ElfCore: A 28nm Neural Processor Enabling Dynamic Structured Sparse Training and Online Self-Supervised Learning with Activity-Dependent Weight Update

Dec 24, 2025
Via arXiv

Rare Word Recognition and Translation Without Fine-Tuning via Task Vector in Speech Models

Dec 26, 2025
Via arXiv

Explainable Transformer-CNN Fusion for Noise-Robust Speech Emotion Recognition

Dec 20, 2025
Via arXiv