Kaiqing Zhang

Online Learning and Equilibrium Computation with Ranking Feedback

Mar 19, 2026

Principled Learning-to-Communicate with Quasi-Classical Information Structures

Mar 04, 2026

Post-Training LLMs as Better Decision-Making Agents: A Regret-Minimization Approach

Nov 06, 2025

MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning

Feb 25, 2025

Provable Partially Observable Reinforcement Learning with Privileged Information

Dec 01, 2024

Last-Iterate Convergence of Payoff-Based Independent Learning in Zero-Sum Stochastic Games

Sep 02, 2024

Principled RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation

Apr 30, 2024

Do LLM Agents Have Regret? A Case Study in Online Learning and Games

Mar 25, 2024

Two-Timescale Q-Learning with Function Approximation in Zero-Sum Stochastic Games

Dec 08, 2023

Fleet Policy Learning via Weight Merging and An Application to Robotic Tool-Use

Oct 02, 2023