Aviral Kumar

RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold

Jun 20, 2024
Via arXiv

DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning

Jun 14, 2024
Via arXiv

Is Value Learning Really the Main Bottleneck in Offline RL?

Jun 13, 2024
Via arXiv

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

Apr 23, 2024
Via arXiv

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Mar 08, 2024
Via arXiv

Unfamiliar Finetuning Examples Control How Language Models Hallucinate

Mar 08, 2024
Via arXiv

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

Mar 06, 2024
Via arXiv

ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL

Feb 29, 2024
Via arXiv

Vision-Language Models Provide Promptable Representations for Reinforcement Learning

Feb 13, 2024
Via arXiv

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023
Via arXiv