Aviral Kumar

Is Value Learning Really the Main Bottleneck in Offline RL?

Jun 13, 2024

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

Apr 23, 2024

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Mar 08, 2024

Unfamiliar Finetuning Examples Control How Language Models Hallucinate

Mar 08, 2024

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

Mar 06, 2024

ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL

Feb 29, 2024

Vision-Language Models Provide Promptable Representations for Reinforcement Learning

Feb 13, 2024

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023

Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning

Oct 18, 2023

Latent Conservative Objective Models for Data-Driven Crystal Structure Prediction

Oct 16, 2023