Zhihan Xiong

LoRe: Personalizing LLMs via Low-Rank Reward Modeling

Apr 20, 2025

Hybrid Preference Optimization for Alignment: Provably Faster Convergence Rates by Combining Offline Preferences with Online Exploration

Dec 13, 2024

Language Model Preference Evaluation with Multiple Weak Evaluators

Oct 14, 2024

Dual Approximation Policy Optimization

Oct 02, 2024

A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity

Jul 27, 2023

A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning

Jun 12, 2023

Offline congestion games: How feedback type affects data coverage requirement

Oct 24, 2022

Learning in Congestion Games with Bandit Feedback

Jun 04, 2022

Selective Sampling for Online Best-arm Identification

Nov 02, 2021

Randomized Exploration is Near-Optimal for Tabular MDP

Feb 19, 2021