Kaiqing Zhang

Principled RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation

Apr 30, 2024
Via arXiv

Do LLM Agents Have Regret? A Case Study in Online Learning and Games

Mar 25, 2024
Via arXiv

Two-Timescale Q-Learning with Function Approximation in Zero-Sum Stochastic Games

Dec 08, 2023
Via arXiv

Fleet Policy Learning via Weight Merging and An Application to Robotic Tool-Use

Oct 02, 2023
Via arXiv

Partially Observable Multi-agent RL with (Quasi-)Efficiency: The Blessing of Information Sharing

Aug 16, 2023
Via arXiv

Tackling Combinatorial Distribution Shift: A Matrix Completion Perspective

Jul 28, 2023
Via arXiv

Multi-Player Zero-Sum Markov Games with Networked Separable Interactions

Jul 13, 2023
Via arXiv

Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs

Jun 20, 2023
Via arXiv

Self-Supervised Reinforcement Learning that Transfers using Random Features

May 26, 2023
Via arXiv

Learning to Extrapolate: A Transductive Approach

Apr 27, 2023
Via arXiv