Chris Tanner

FrontierFinance: A Long-Horizon Computer-Use Benchmark of Real-World Financial Tasks

Apr 07, 2026
Via arXiv

Faster Superword Tokenization

Apr 06, 2026
Via arXiv

Cost-Efficient Estimation of General Abilities Across Benchmarks

Apr 01, 2026
Via arXiv

The Effect of Scripts and Formats on LLM Numeracy

Jan 21, 2026
Via arXiv

On Finding Inconsistencies in Documents

Dec 21, 2025
Via arXiv

BLEUBERI: BLEU is a surprisingly effective reward for instruction following

May 16, 2025
Via arXiv

Boundless Byte Pair Encoding: Breaking the Pre-tokenization Barrier

Mar 31, 2025
Via arXiv

No Free Labels: Limitations of LLM-as-a-Judge Without Human Grounding

Mar 07, 2025
Via arXiv

How Much is Enough? The Diminishing Returns of Tokenization Training Data

Feb 27, 2025
Via arXiv

Are Language Model Logits Calibrated?

Oct 21, 2024
Via arXiv