Alert button
Picture for Han Zhong

Han Zhong

Alert button

Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment

Feb 25, 2024
Rui Yang, Xiaoman Pan, Feng Luo, Shuang Qiu, Han Zhong, Dong Yu, Jianshu Chen

Viaarxiv icon

Rethinking Model-based, Policy-based, and Value-based Reinforcement Learning via the Lens of Representation Complexity

Dec 28, 2023
Guhao Feng, Han Zhong

Viaarxiv icon

Gibbs Sampling from Human Feedback: A Provable KL- constrained Framework for RLHF

Dec 18, 2023
Wei Xiong, Hanze Dong, Chenlu Ye, Han Zhong, Nan Jiang, Tong Zhang

Viaarxiv icon

Horizon-Free and Instance-Dependent Regret Bounds for Reinforcement Learning with General Function Approximation

Dec 07, 2023
Jiayi Huang, Han Zhong, Liwei Wang, Lin F. Yang

Viaarxiv icon

Posterior Sampling for Competitive RL: Function Approximation and Partial Observation

Oct 30, 2023
Shuang Qiu, Ziyu Dai, Han Zhong, Zhaoran Wang, Zhuoran Yang, Tong Zhang

Viaarxiv icon

Towards Robust Offline Reinforcement Learning under Diverse Data Corruption

Oct 19, 2023
Rui Yang, Han Zhong, Jiawei Xu, Amy Zhang, Chongjie Zhang, Lei Han, Tong Zhang

Viaarxiv icon

Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds

Jun 12, 2023
Jiayi Huang, Han Zhong, Liwei Wang, Lin F. Yang

Figure 1 for Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds
Figure 2 for Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds
Viaarxiv icon

One Objective to Rule Them All: A Maximization Objective Fusing Estimation and Planning for Exploration

May 29, 2023
Zhihan Liu, Miao Lu, Wei Xiong, Han Zhong, Hao Hu, Shenao Zhang, Sirui Zheng, Zhuoran Yang, Zhaoran Wang

Figure 1 for One Objective to Rule Them All: A Maximization Objective Fusing Estimation and Planning for Exploration
Figure 2 for One Objective to Rule Them All: A Maximization Objective Fusing Estimation and Planning for Exploration
Figure 3 for One Objective to Rule Them All: A Maximization Objective Fusing Estimation and Planning for Exploration
Figure 4 for One Objective to Rule Them All: A Maximization Objective Fusing Estimation and Planning for Exploration
Viaarxiv icon

Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage

May 16, 2023
Jose Blanchet, Miao Lu, Tong Zhang, Han Zhong

Figure 1 for Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage
Viaarxiv icon