Sander Land

MinGram: A Minimalist Unigram Tokenizer with High Compression and Competitive Morphological Alignment

Jun 25, 2026
Via arXiv

Every Eval Ever: A Unifying Schema and Community Repository for AI Evaluation Results

Jun 12, 2026
Via arXiv

Auditing LLM Benchmarks with Item Response Theory

May 28, 2026
Via arXiv

Which Pieces Does Unigram Tokenization Really Need?

Dec 14, 2025
Via arXiv

BPE Stays on SCRIPT: Structured Encoding for Robust Multilingual Pretokenization

May 30, 2025
Via arXiv

Command A: An Enterprise-Ready Large Language Model

Apr 01, 2025
Via arXiv

Understanding Likelihood Over-optimisation in Direct Alignment Algorithms

Oct 15, 2024
Via arXiv

Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models

May 08, 2024
Via arXiv