
Sham Kakade

Peer-Predictive Self-Training for Language Model Reasoning

Apr 14, 2026

Prescriptive Scaling Reveals the Evolution of Language Model Capabilities

Feb 17, 2026

Weight Decay Improves Language Model Plasticity

Feb 11, 2026

Stop Training for the Worst: Progressive Unmasking Accelerates Masked Diffusion Training

Feb 10, 2026

Anytime Pretraining: Horizon-Free Learning-Rate Schedules with Weight Averaging

Feb 03, 2026

GQ-VAE: A gated quantized VAE for learning variable length tokens

Dec 26, 2025

The Potential of Second-Order Optimization for LLMs: A Study with Full Gauss-Newton

Oct 10, 2025

Selective Underfitting in Diffusion Models

Oct 01, 2025

Fine-Tuning Masked Diffusion for Provable Self-Correction

Oct 01, 2025

Characterization and Mitigation of Training Instabilities in Microscaling Formats

Jun 25, 2025