Picture for Aviral Kumar

Aviral Kumar

Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction

Add code
Jun 09, 2025
Viaarxiv icon

Horizon Reduction Makes RL Scalable

Add code
Jun 08, 2025
Viaarxiv icon

Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners

Add code
May 29, 2025
Viaarxiv icon

Grounded Reinforcement Learning for Visual Reasoning

Add code
May 29, 2025
Viaarxiv icon

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Add code
Mar 10, 2025
Viaarxiv icon

Scaling Test-Time Compute Without Verification or RL is Suboptimal

Add code
Feb 18, 2025
Viaarxiv icon

Value-Based Deep RL Scales Predictably

Add code
Feb 06, 2025
Viaarxiv icon

Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models

Add code
Dec 18, 2024
Viaarxiv icon

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

Add code
Dec 10, 2024
Viaarxiv icon

Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone

Add code
Dec 09, 2024
Viaarxiv icon