
Amrith Setlur

Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL

Feb 03, 2026

POPE: Learning to Reason on Hard Problems via Privileged On-Policy Exploration

Jan 26, 2026

Reuse your FLOPs: Scaling RL on Hard Problems by Conditioning on Very Off-Policy Prefixes

Jan 26, 2026

InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning

Jan 20, 2026

RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems

Oct 02, 2025

Lower Bounds for Public-Private Learning under Distribution Shift

Jul 23, 2025

e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs

Jun 10, 2025

Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction

Jun 09, 2025

Exact Unlearning of Finetuning Data via Model Merging at Scale

Apr 06, 2025

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Mar 10, 2025