Zhihan Liu

Toward Optimal LLM Alignments Using Two-Player Games
Jun 16, 2024

Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
May 26, 2024

Can Large Language Models Play Games? A Case Study of A Self-Play Approach
Mar 08, 2024

How Can LLM Guide RL? A Value-Based Approach
Feb 25, 2024

A Principled Framework for Knowledge-enhanced Large Language Model
Nov 18, 2023

Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency
Oct 11, 2023

Sample-Efficient Multi-Agent RL: An Optimization Perspective
Oct 10, 2023

One Objective to Rule Them All: A Maximization Objective Fusing Estimation and Planning for Exploration
May 29, 2023

Guarded Policy Optimization with Imperfect Online Demonstrations
Mar 03, 2023

Provably Efficient Generative Adversarial Imitation Learning for Online and Offline Setting with Linear Function Approximation
Aug 19, 2021