Picture for Aviral Kumar

Aviral Kumar

D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning

Add code
Aug 15, 2024
Viaarxiv icon

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

Add code
Aug 06, 2024
Figure 1 for Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Figure 2 for Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Figure 3 for Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Figure 4 for Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Viaarxiv icon

Recursive Introspection: Teaching Language Model Agents How to Self-Improve

Add code
Jul 26, 2024
Figure 1 for Recursive Introspection: Teaching Language Model Agents How to Self-Improve
Figure 2 for Recursive Introspection: Teaching Language Model Agents How to Self-Improve
Figure 3 for Recursive Introspection: Teaching Language Model Agents How to Self-Improve
Figure 4 for Recursive Introspection: Teaching Language Model Agents How to Self-Improve
Viaarxiv icon

RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold

Add code
Jun 20, 2024
Viaarxiv icon

DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning

Add code
Jun 14, 2024
Viaarxiv icon

Is Value Learning Really the Main Bottleneck in Offline RL?

Add code
Jun 13, 2024
Viaarxiv icon

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

Add code
Apr 23, 2024
Figure 1 for Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Figure 2 for Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Figure 3 for Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Figure 4 for Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Viaarxiv icon

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Add code
Mar 08, 2024
Viaarxiv icon

Unfamiliar Finetuning Examples Control How Language Models Hallucinate

Add code
Mar 08, 2024
Figure 1 for Unfamiliar Finetuning Examples Control How Language Models Hallucinate
Figure 2 for Unfamiliar Finetuning Examples Control How Language Models Hallucinate
Figure 3 for Unfamiliar Finetuning Examples Control How Language Models Hallucinate
Figure 4 for Unfamiliar Finetuning Examples Control How Language Models Hallucinate
Viaarxiv icon

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

Add code
Mar 06, 2024
Figure 1 for Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
Figure 2 for Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
Figure 3 for Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
Figure 4 for Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
Viaarxiv icon