Jonathan D. Chang

Critique-out-Loud Reward Models

Aug 21, 2024

REBEL: Reinforcement Learning via Regressing Relative Rewards

Apr 25, 2024

Dataset Reset Policy Optimization for RLHF

Apr 15, 2024

Adversarial Imitation Learning via Boosting

Apr 12, 2024

RL for Consistency Models: Faster Reward Guided Text-to-Image Generation

Mar 25, 2024

Policy-Gradient Training of Language Models for Ranking

Oct 06, 2023

Learning to Generate Better Than Your LLM

Jun 20, 2023

Learning Bellman Complete Representations for Offline Policy Evaluation

Jul 12, 2022

Mitigating Covariate Shift in Imitation Learning via Offline Data Without Great Coverage

Jun 14, 2021