Aviral Kumar

RaC: Robot Learning for Long-Horizon Tasks by Scaling Recovery and Correction

Sep 09, 2025
Via arXiv

Compute-Optimal Scaling for Value-Based Deep RL

Aug 20, 2025
Via arXiv

Reasoning as an Adaptive Defense for Safety

Jul 01, 2025
Via arXiv

e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs

Jun 10, 2025
Via arXiv

Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction

Jun 09, 2025
Via arXiv

Horizon Reduction Makes RL Scalable

Jun 08, 2025
Via arXiv

Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners

May 29, 2025
Via arXiv

Grounded Reinforcement Learning for Visual Reasoning

May 29, 2025
Via arXiv

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Mar 10, 2025
Via arXiv

Scaling Test-Time Compute Without Verification or RL is Suboptimal

Feb 18, 2025
Via arXiv