Abstract: Sign Language Translation (SLT) aims to convert sign language (SL) videos into spoken language text, thereby bridging the communication gap between signing and spoken-language communities. While most existing works focus on translating a single sign language into a single spoken language (one-to-one SLT), leveraging multilingual resources could mitigate low-resource issues and enhance accessibility. However, multilingual SLT (MLSLT) remains unexplored due to language conflicts and alignment difficulties across sign and spoken languages. To address these challenges, we propose a multilingual gloss-free model with dual CTC objectives for token-level SL identification and spoken text generation. Our model supports 10 SLs and handles one-to-one, many-to-one, and many-to-many SLT, achieving performance competitive with state-of-the-art methods on three widely adopted benchmarks: the multilingual SP-10, PHOENIX14T, and CSL-Daily.
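The dual-CTC objective described in this abstract can be read as a weighted sum of two CTC losses computed on a shared video encoder: one over token-level sign-language-ID targets and one over spoken-text tokens. The following is a minimal PyTorch sketch of that idea, not the paper's implementation; the module names (video_encoder, sl_id_head, text_head), dimensions, and the mixing weight alpha are illustrative assumptions.

```python
# Minimal sketch of a dual-CTC training objective for multilingual gloss-free SLT.
# Assumptions: per-frame video features are precomputed; names and sizes are illustrative.
import torch.nn as nn
import torch.nn.functional as F

class DualCTCSLT(nn.Module):
    def __init__(self, feat_dim=512, num_sign_langs=10, text_vocab=8000, blank=0):
        super().__init__()
        # Shared temporal encoder over per-frame video features.
        self.video_encoder = nn.GRU(feat_dim, feat_dim, batch_first=True, bidirectional=True)
        # Head 1: per-timestep sign-language-ID logits (+1 for the CTC blank).
        self.sl_id_head = nn.Linear(2 * feat_dim, num_sign_langs + 1)
        # Head 2: per-timestep spoken-text token logits (+1 for the CTC blank).
        self.text_head = nn.Linear(2 * feat_dim, text_vocab + 1)
        self.ctc = nn.CTCLoss(blank=blank, zero_infinity=True)

    def forward(self, feats, feat_lens, sl_ids, sl_id_lens, text, text_lens, alpha=0.3):
        enc, _ = self.video_encoder(feats)                       # (B, T, 2*feat_dim)
        # nn.CTCLoss expects (T, B, C) log-probabilities.
        sl_logp = F.log_softmax(self.sl_id_head(enc), dim=-1).transpose(0, 1)
        tx_logp = F.log_softmax(self.text_head(enc), dim=-1).transpose(0, 1)
        loss_sl = self.ctc(sl_logp, sl_ids, feat_lens, sl_id_lens)   # SL identification CTC
        loss_tx = self.ctc(tx_logp, text, feat_lens, text_lens)      # spoken-text CTC
        # Weighted combination of the two CTC objectives (alpha is an assumed hyperparameter).
        return alpha * loss_sl + (1.0 - alpha) * loss_tx
```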
Abstract: Current sign language translation (SLT) approaches often rely on gloss-based supervision with Connectionist Temporal Classification (CTC), limiting their ability to handle the non-monotonic alignments between sign language video and spoken text. In this work, we propose a novel method that combines joint CTC/Attention with transfer learning. The joint CTC/Attention introduces hierarchical encoding and integrates CTC with the attention mechanism during decoding, effectively managing both monotonic and non-monotonic alignments. Meanwhile, transfer learning helps bridge the modality gap between vision and language in SLT. Experimental results on two widely adopted benchmarks, RWTH-PHOENIX-Weather 2014T and CSL-Daily, show that our method achieves results comparable to the state of the art and outperforms the pure-attention baseline. Additionally, this work opens the door to future research on gloss-free SLT using text-based CTC alignment.
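Joint CTC/Attention training is conventionally a weighted combination of a CTC loss on the encoder outputs and an attention-based cross-entropy loss on the decoder outputs. Below is a minimal PyTorch sketch under that assumption; the attribute names (encode, ctc_head, decode), the weight ctc_weight, and the padding convention are hypothetical, and the hierarchical encoding and joint CTC/attention decoding mentioned in the abstract are not shown.

```python
# Minimal sketch of a joint CTC/Attention training objective for SLT.
# L = w * L_CTC(encoder) + (1 - w) * L_CE(decoder); all names are illustrative assumptions.
import torch.nn.functional as F

def joint_ctc_attention_loss(model, feats, feat_lens, text_in, text_out, text_lens,
                             ctc_weight=0.3, blank=0, pad_id=1):
    enc = model.encode(feats)                                    # (B, T, D), hypothetical encoder
    # CTC branch: per-frame spoken-text token log-probs from the encoder outputs.
    ctc_logp = F.log_softmax(model.ctc_head(enc), dim=-1).transpose(0, 1)  # (T, B, V)
    loss_ctc = F.ctc_loss(ctc_logp, text_out, feat_lens, text_lens,
                          blank=blank, zero_infinity=True)
    # Attention branch: autoregressive decoder trained with teacher forcing.
    dec_logits = model.decode(text_in, enc)                      # (B, U, V), hypothetical decoder
    loss_att = F.cross_entropy(dec_logits.transpose(1, 2), text_out, ignore_index=pad_id)
    # Interpolate the monotonic (CTC) and non-monotonic (attention) objectives.
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att
```

The same interpolation weight is typically reused at decoding time to rescore beam-search hypotheses with the CTC branch, which is how the CTC alignment constrains the attention decoder.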