
Shenao Zhang

Self-Exploring Language Models: Active Preference Elicitation for Online Alignment

May 29, 2024

Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer

May 26, 2024

How Can LLM Guide RL? A Value-Based Approach

Feb 25, 2024

Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms

Oct 30, 2023

Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency

Oct 11, 2023

One Objective to Rule Them All: A Maximization Objective Fusing Estimation and Planning for Exploration

May 29, 2023

Asking Before Action: Gather Information in Embodied Decision Making with Language Models

May 25, 2023

Conservative Dual Policy Optimization for Efficient Model-Based Reinforcement Learning

Sep 16, 2022

Learning Meta Representations for Agents in Multi-Agent Reinforcement Learning

Aug 30, 2021

Structure-Regularized Attention for Deformable Object Representation

Jun 12, 2021