Haipeng Luo

Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena

Jul 15, 2024

Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms

Jun 15, 2024

Provably Efficient Interactive-Grounded Learning with Personalized Reward

May 31, 2024

No-Regret Learning for Fair Multi-Agent Social Welfare Optimization

May 31, 2024

Optimal Multiclass U-Calibration Error and Beyond

May 28, 2024

Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback

May 14, 2024

Tractable Local Equilibria in Non-Concave Games

Mar 13, 2024

Contextual Multinomial Logit Bandits with General Value Functions

Feb 18, 2024

Efficient Contextual Bandits with Uninformed Feedback Graphs

Feb 12, 2024

Near-Optimal Policy Optimization for Correlated Equilibrium in General-Sum Markov Games

Jan 26, 2024