
Yuval Pinter

Ben-Gurion University of the Negev

Inside the LLM Word Factory

Jun 07, 2026
Via arXiv

Tokenization with Split Trees

May 21, 2026
Via arXiv

Universal NER v2: Towards a Massively Multilingual Named Entity Recognition Benchmark

Apr 14, 2026
Via arXiv

Faster Superword Tokenization

Apr 06, 2026
Via arXiv

The Degree of Language Diacriticity and Its Effect on Tasks

Mar 29, 2026
Via arXiv

The Effect of Scripts and Formats on LLM Numeracy

Jan 21, 2026
Via arXiv

Which Pieces Does Unigram Tokenization Really Need?

Dec 14, 2025
Via arXiv

Hebrew Diacritics Restoration using Visual Representation

Oct 30, 2025
Via arXiv

Probing Subphonemes in Morphology Models

May 16, 2025
Via arXiv

Boundless Byte Pair Encoding: Breaking the Pre-tokenization Barrier

Mar 31, 2025
Via arXiv