Yunhao Tang

On scalable oversight with weak LLMs judging strong LLMs

Jul 05, 2024
Via arXiv

A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning

Jun 04, 2024
Via arXiv

Offline Regularised Reinforcement Learning for Large Language Models Alignment

May 29, 2024
Via arXiv

Understanding the performance gap between online and offline alignment algorithms

May 14, 2024
Via arXiv

Human Alignment of Large Language Models through Online Preference Optimisation

Mar 13, 2024
Via arXiv

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Mar 08, 2024
Via arXiv

A Distributional Analogue to the Successor Representation

Feb 13, 2024
Via arXiv

Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model

Feb 12, 2024
Via arXiv

Off-policy Distributional Q($λ$): Distributional RL without Importance Sampling

Feb 08, 2024
Via arXiv

Generalized Preference Optimization: A Unified Approach to Offline Alignment

Feb 08, 2024
Via arXiv