Tengyang Xie

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

Apr 23, 2024

Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences

Apr 04, 2024

Towards Principled Representation Learning from Videos for Reinforcement Learning

Mar 20, 2024

CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples

Feb 20, 2024

Harnessing Density Ratios for Online Reinforcement Learning

Jan 18, 2024

Adversarial Model for Offline Reinforcement Learning

Feb 21, 2023

ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies with Offline Data

Nov 08, 2022

The Role of Coverage in Online Reinforcement Learning

Oct 09, 2022

Interaction-Grounded Learning with Action-inclusive Feedback

Jun 16, 2022

Adversarially Trained Actor Critic for Offline Reinforcement Learning

Feb 05, 2022