Sham Kakade

Compute Efficiency and Serial Runtime Tradeoffs for Stochastic Momentum Methods

Jun 17, 2026
Via arXiv

A Unifying View of Attention Sinks: Two Algorithms, Two Solutions

Jun 06, 2026
Via arXiv

RL Excursions during Pre-Training: Re-examining Policy Optimization for LLM training

Jun 02, 2026
Via arXiv

Economy of Minds: Emerging Multi-Agent Intelligence with Economic Interactions

Jun 01, 2026
Via arXiv

The Recurrent Transformer: Greater Effective Depth and Efficient Decoding

Apr 23, 2026
Via arXiv

Peer-Predictive Self-Training for Language Model Reasoning

Apr 14, 2026
Via arXiv

Prescriptive Scaling Reveals the Evolution of Language Model Capabilities

Feb 17, 2026
Via arXiv

Weight Decay Improves Language Model Plasticity

Feb 11, 2026
Via arXiv

Stop Training for the Worst: Progressive Unmasking Accelerates Masked Diffusion Training

Feb 10, 2026
Via arXiv

Anytime Pretraining: Horizon-Free Learning-Rate Schedules with Weight Averaging

Feb 03, 2026
Via arXiv