William Merrill

Michael Pokorny

Efficiently Representing Algorithms With Chain-of-Thought Transformers

Jun 18, 2026
Via arXiv

Revisiting Padded Transformer Expressivity: Which Architectural Choices Matter and Which Don't

May 28, 2026
Via arXiv

Olmo Hybrid: From Theory to Practice and Back

Apr 07, 2026
Via arXiv

Why Are Linear RNNs More Parallelizable?

Mar 05, 2026
Via arXiv

Discovering Interpretable Algorithms by Decompiling Transformers to RASP

Feb 09, 2026
Via arXiv

Context-Free Recognition with Transformers

Jan 05, 2026
Via arXiv

RELIC: Evaluating Compositional Instruction Following via Language Recognition

Jun 05, 2025
Via arXiv

Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training

May 29, 2025
Via arXiv

Exact Expressive Power of Transformers with Padding

May 25, 2025
Via arXiv

RWKV-7 "Goose" with Expressive Dynamic State Evolution

Mar 18, 2025
Via arXiv