Aviral Kumar

DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning

Jun 14, 2024 (via arXiv)

Is Value Learning Really the Main Bottleneck in Offline RL?

Jun 13, 2024 (via arXiv)

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

Apr 23, 2024 (via arXiv)

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Mar 08, 2024 (via arXiv)

Unfamiliar Finetuning Examples Control How Language Models Hallucinate

Mar 08, 2024 (via arXiv)

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

Mar 06, 2024 (via arXiv)

ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL

Feb 29, 2024 (via arXiv)

Vision-Language Models Provide Promptable Representations for Reinforcement Learning

Feb 13, 2024 (via arXiv)

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023 (via arXiv)

Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning

Oct 18, 2023 (via arXiv)