Aviral Kumar

Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction

Jun 09, 2025

Horizon Reduction Makes RL Scalable

Jun 08, 2025

Grounded Reinforcement Learning for Visual Reasoning

May 29, 2025

Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners

May 29, 2025

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Mar 10, 2025

Scaling Test-Time Compute Without Verification or RL is Suboptimal

Feb 18, 2025

Value-Based Deep RL Scales Predictably

Feb 06, 2025

Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models

Dec 18, 2024

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

Dec 10, 2024

Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone

Dec 09, 2024