Aviral Kumar

Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance

Oct 17, 2024

Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning

Oct 10, 2024

Generative Verifiers: Reward Modeling as Next-Token Prediction

Aug 27, 2024

D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning

Aug 15, 2024

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

Aug 06, 2024

Recursive Introspection: Teaching Language Model Agents How to Self-Improve

Jul 26, 2024

RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold

Jun 20, 2024

DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning

Jun 14, 2024

Is Value Learning Really the Main Bottleneck in Offline RL?

Jun 13, 2024

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

Apr 23, 2024