Speech recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the speaker's voice and language, and transcribing those words accurately as text.

VALLR-Pin: Uncertainty-Factorized Visual Speech Recognition for Mandarin with Pinyin Guidance

Dec 29, 2025

Distilled HuBERT for Mobile Speech Emotion Recognition: A Cross-Corpus Validation Study

Dec 31, 2025

Advancing Assistive Robotics: Multi-Modal Navigation and Biophysical Monitoring for Next-Generation Wheelchairs

Jan 06, 2026

Index-ASR Technical Report

Dec 31, 2025

Quantifying Quanvolutional Neural Networks Robustness for Speech in Healthcare Applications

Jan 05, 2026

PROFASR-BENCH: A Benchmark for Context-Conditioned ASR in High-Stakes Professional Speech

Dec 29, 2025

Contextual Biasing for LLM-Based ASR with Hotword Retrieval and Reinforcement Learning

Dec 26, 2025

VALLR-Pin: Dual-Decoding Visual Speech Recognition for Mandarin with Pinyin-Guided LLM Refinement

Dec 23, 2025

Enhancing Fully Formatted End-to-End Speech Recognition with Knowledge Distillation via Multi-Codebook Vector Quantization

Dec 22, 2025

ElfCore: A 28nm Neural Processor Enabling Dynamic Structured Sparse Training and Online Self-Supervised Learning with Activity-Dependent Weight Update

Dec 24, 2025