Sham Kakade

Prescriptive Scaling Reveals the Evolution of Language Model Capabilities

Feb 17, 2026
Via arXiv

Weight Decay Improves Language Model Plasticity

Feb 11, 2026
Via arXiv

Stop Training for the Worst: Progressive Unmasking Accelerates Masked Diffusion Training

Feb 10, 2026
Via arXiv

Anytime Pretraining: Horizon-Free Learning-Rate Schedules with Weight Averaging

Feb 03, 2026
Via arXiv

GQ-VAE: A gated quantized VAE for learning variable length tokens

Dec 26, 2025
Via arXiv

The Potential of Second-Order Optimization for LLMs: A Study with Full Gauss-Newton

Oct 10, 2025
Via arXiv

Selective Underfitting in Diffusion Models

Oct 01, 2025
Via arXiv

Fine-Tuning Masked Diffusion for Provable Self-Correction

Oct 01, 2025
Via arXiv

Characterization and Mitigation of Training Instabilities in Microscaling Formats

Jun 25, 2025
Via arXiv

Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs

Jun 25, 2025
Via arXiv