Kaiqing Zhang

Regret Minimization with Adaptive Opponents in Repeated Games

Jun 04, 2026

Online Learning and Equilibrium Computation with Ranking Feedback

Mar 19, 2026

Principled Learning-to-Communicate with Quasi-Classical Information Structures

Mar 04, 2026

Post-Training LLMs as Better Decision-Making Agents: A Regret-Minimization Approach

Nov 06, 2025

MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning

Feb 25, 2025

Provable Partially Observable Reinforcement Learning with Privileged Information

Dec 01, 2024

Last-Iterate Convergence of Payoff-Based Independent Learning in Zero-Sum Stochastic Games

Sep 02, 2024

Principled RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation

Apr 30, 2024

Do LLM Agents Have Regret? A Case Study in Online Learning and Games

Mar 25, 2024

Two-Timescale Q-Learning with Function Approximation in Zero-Sum Stochastic Games

Dec 08, 2023