Aviral Kumar

Reasoning as an Adaptive Defense for Safety

Jul 01, 2025
Via arXiv

e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs

Jun 10, 2025
Via arXiv

Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction

Jun 09, 2025
Via arXiv

Horizon Reduction Makes RL Scalable

Jun 08, 2025
Via arXiv

Grounded Reinforcement Learning for Visual Reasoning

May 29, 2025
Via arXiv

Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners

May 29, 2025
Via arXiv

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Mar 10, 2025
Via arXiv

Scaling Test-Time Compute Without Verification or RL is Suboptimal

Feb 18, 2025
Via arXiv

Value-Based Deep RL Scales Predictably

Feb 06, 2025
Via arXiv

Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models

Dec 18, 2024
Via arXiv