Tomas Zelezny

Sign Language Recognition in the Age of LLMs

Apr 13, 2026

Vaclav Javorek, Jakub Honzik, Ivan Gruber, Tomas Zelezny, Marek Hruz

Abstract: Recent Vision Language Models (VLMs) have demonstrated strong performance across a wide range of multimodal reasoning tasks. This raises the question of whether such general-purpose models can also address specialized visual recognition problems such as isolated sign language recognition (ISLR) without task-specific training. In this work, we investigate the capability of modern VLMs to perform ISLR in a zero-shot setting. We evaluate several open-source and proprietary VLMs on the WLASL300 benchmark. Our experiments show that, under prompt-only zero-shot inference, current open-source VLMs remain far behind classic supervised ISLR classifiers. However, follow-up experiments reveal that these models capture partial visual-semantic alignment between signs and text descriptions. Larger proprietary models achieve substantially higher accuracy, highlighting the importance of model scale and training data diversity. All our code is publicly available on GitHub.

* Accepted at the CVPR 2026 Workshop on Multimodal Sign Language Research (MSLR), 8 pages, 3 figures

Exploring Pose-based Sign Language Translation: Ablation Studies and Attention Insights

Jul 02, 2025

Tomas Zelezny, Jakub Straka, Vaclav Javorek, Ondrej Valach, Marek Hruz, Ivan Gruber

Abstract: Sign Language Translation (SLT) has evolved significantly, moving from isolated recognition approaches to complex, continuous gloss-free translation systems. This paper explores the impact of pose-based data preprocessing techniques (normalization, interpolation, and augmentation) on SLT performance. We employ a transformer-based architecture, adapting a modified T5 encoder-decoder model to process pose representations. Through extensive ablation studies on the YouTubeASL and How2Sign datasets, we analyze how different preprocessing strategies affect translation accuracy. Our results demonstrate that appropriate normalization, interpolation, and augmentation techniques can significantly improve model robustness and generalization abilities. Additionally, we provide an in-depth analysis of the model's attention and reveal interesting behavior suggesting that adding a dedicated register token can improve overall model performance. We publish our code in our GitHub repository, including the preprocessed YouTubeASL data.

* 8 pages, 9 figures, supplementary, SLRTP2025, CVPR2025
