
Noah A. Smith

Paul G. Allen School of Computer Science & Engineering, University of Washington; Allen Institute for Artificial Intelligence

LEGATO: Large-scale End-to-end Generalizable Approach to Typeset OMR

Jun 23, 2025
Via arXiv

Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenizations

Jun 23, 2025
Via arXiv

Sampling from Your Language Model One Byte at a Time

Jun 17, 2025
Via arXiv

Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index

Jun 13, 2025
Via arXiv

MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation

May 23, 2025
Via arXiv

PointArena: Probing Multimodal Grounding Through Language-Guided Pointing

May 15, 2025
Via arXiv

BLAB: Brutally Long Audio Bench

May 05, 2025
Via arXiv

Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation

Apr 25, 2025
Via arXiv

On Linear Representations and Pretraining Data Frequency in Language Models

Apr 16, 2025
Via arXiv

DataDecide: How to Predict Best Pretraining Data with Small Experiments

Apr 15, 2025
Via arXiv