Picture for Amrith Setlur

Amrith Setlur

RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems

Add code
Oct 02, 2025
Viaarxiv icon

Lower Bounds for Public-Private Learning under Distribution Shift

Add code
Jul 23, 2025
Viaarxiv icon

e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs

Add code
Jun 10, 2025
Viaarxiv icon

Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction

Add code
Jun 09, 2025
Viaarxiv icon

Exact Unlearning of Finetuning Data via Model Merging at Scale

Add code
Apr 06, 2025
Viaarxiv icon

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Add code
Mar 10, 2025
Figure 1 for Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
Figure 2 for Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
Figure 3 for Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
Figure 4 for Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
Viaarxiv icon

Scaling Test-Time Compute Without Verification or RL is Suboptimal

Add code
Feb 18, 2025
Figure 1 for Scaling Test-Time Compute Without Verification or RL is Suboptimal
Figure 2 for Scaling Test-Time Compute Without Verification or RL is Suboptimal
Figure 3 for Scaling Test-Time Compute Without Verification or RL is Suboptimal
Figure 4 for Scaling Test-Time Compute Without Verification or RL is Suboptimal
Viaarxiv icon

What Do Learning Dynamics Reveal About Generalization in LLM Reasoning?

Add code
Nov 12, 2024
Figure 1 for What Do Learning Dynamics Reveal About Generalization in LLM Reasoning?
Figure 2 for What Do Learning Dynamics Reveal About Generalization in LLM Reasoning?
Figure 3 for What Do Learning Dynamics Reveal About Generalization in LLM Reasoning?
Figure 4 for What Do Learning Dynamics Reveal About Generalization in LLM Reasoning?
Viaarxiv icon

Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning

Add code
Oct 10, 2024
Figure 1 for Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
Figure 2 for Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
Figure 3 for Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
Figure 4 for Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
Viaarxiv icon

RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold

Add code
Jun 20, 2024
Viaarxiv icon