Kaleigh Mentzer

ÜberWeb: Insights from Multilingual Curation for a 20-Trillion-Token Dataset

Feb 16, 2026
Via arXiv

DatBench: Discriminative, Faithful, and Efficient VLM Evaluations

Jan 05, 2026
Via arXiv

Luxical: High-Speed Lexical-Dense Text Embeddings

Dec 11, 2025
Via arXiv