Sham M. Kakade

Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models

Mar 12, 2026
Via arXiv

The Role of Sparsity for Length Generalization in Transformers

Feb 24, 2025
Via arXiv

Mixture of Parrots: Experts improve memorization more than reasoning

Oct 24, 2024
Via arXiv

Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques

Sep 04, 2024
Via arXiv

Eliminating Position Bias of Language Models: A Mechanistic Approach

Jul 01, 2024
Via arXiv

Transcendence: Generative Models Can Outperform The Experts That Train Them

Jun 17, 2024
Via arXiv

Scaling Laws in Linear Regression: Compute, Parameters, and Data

Jun 12, 2024
Via arXiv

Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass

May 29, 2024
Via arXiv

Matching the Statistical Query Lower Bound for k-sparse Parity Problems with Stochastic Gradient Descent

Apr 18, 2024
Via arXiv

Repeat After Me: Transformers are Better than State Space Models at Copying

Feb 01, 2024
Via arXiv