Speech recognition


Speech recognition is the task of identifying words spoken aloud by analyzing the speaker's voice and language, and transcribing those words accurately as text.

CantoASR: Prosody-Aware ASR-LALM Collaboration for Low-Resource Cantonese

Nov 06, 2025

Adapting Speech Foundation Models with Large Language Models for Unified Speech Recognition

Oct 27, 2025

Mitigating Attention Sinks and Massive Activations in Audio-Visual Speech Recognition with LLMs

Oct 26, 2025

Scriboora: Rethinking Human Pose Forecasting

Nov 19, 2025

LRW-Persian: Lip-reading in the Wild Dataset for Persian Language

Oct 26, 2025

Overview of the MEDIQA-OE 2025 Shared Task on Medical Order Extraction from Doctor-Patient Consultations

Oct 30, 2025

HMM for short independent sequences: Multiple sequence Baum-Welch application

Oct 30, 2025

Reference Microphone Selection for Guided Source Separation based on the Normalized L-p Norm

Oct 31, 2025

The Tonogenesis Continuum in Tibetan: A Computational Investigation

Oct 26, 2025

A Sociophonetic Analysis of Racial Bias in Commercial ASR Systems Using the Pacific Northwest English Corpus

Oct 26, 2025