
William Merrill

Why Are Linear RNNs More Parallelizable?

Mar 05, 2026

Discovering Interpretable Algorithms by Decompiling Transformers to RASP

Feb 09, 2026

Context-Free Recognition with Transformers

Jan 05, 2026

RELIC: Evaluating Compositional Instruction Following via Language Recognition

Jun 05, 2025

Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training

May 29, 2025

Exact Expressive Power of Transformers with Padding

May 25, 2025

RWKV-7 "Goose" with Expressive Dynamic State Evolution

Mar 18, 2025

A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers

Mar 05, 2025

Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases

Feb 26, 2025

Humanity's Last Exam

Jan 24, 2025