Sham Kakade

A Simplified Analysis of SGD for Linear Regression with Weight Averaging

Jun 18, 2025
Via arXiv

Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning

Jun 12, 2025
Via arXiv

Interpreting the Linear Structure of Vision-language Model Embedding Spaces

Apr 16, 2025
Via arXiv

Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining

Apr 10, 2025
Via arXiv

Data-Efficient Multi-Agent Spatial Planning with LLMs

Feb 26, 2025
Via arXiv

Distributional Scaling Laws for Emergent Capabilities

Feb 24, 2025
Via arXiv

Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions

Feb 10, 2025
Via arXiv

Connections between Schedule-Free Optimizers, AdEMAMix, and Accelerated SGD Variants

Feb 04, 2025
Via arXiv

Soup to go: mitigating forgetting during continual learning with model averaging

Jan 09, 2025
Via arXiv

From an Image to a Scene: Learning to Imagine the World from a Million 360 Videos

Dec 10, 2024
Via arXiv