Runlong Zhou

Reflect-RL: Two-Player Online RL Fine-Tuning for LMs

Feb 20, 2024
Runlong Zhou, Simon S. Du, Beibin Li

Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning

Oct 30, 2023
Zhaoyi Zhou, Chuning Zhu, Runlong Zhou, Qiwen Cui, Abhishek Gupta, Simon Shaolei Du

Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments

Jan 31, 2023
Runlong Zhou, Zihan Zhang, Simon S. Du

Understanding Curriculum Learning in Policy Optimization for Solving Combinatorial Optimization Problems

Feb 11, 2022
Runlong Zhou, Yuandong Tian, Yi Wu, Simon S. Du

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Apr 22, 2021
Jean Tarbouriech, Runlong Zhou, Simon S. Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric
