Xiong-Hui Chen

Off-Policy Value-Based Reinforcement Learning for Large Language Models

Mar 24, 2026

HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning

Mar 19, 2026

Group Sequence Policy Optimization

Jul 24, 2025

NeoRL-2: Near Real-World Benchmarks for Offline Reinforcement Learning with Extended Realistic Scenarios

Mar 25, 2025

Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts

Apr 14, 2024

Imitator Learning: Achieve Out-of-the-Box Imitation Ability in Variable Environments

Oct 09, 2023

Language Model Self-improvement by Reinforcement Learning Contemplation

May 23, 2023

Sim2Rec: A Simulator-based Decision-making Approach to Optimize Real-World Long-term User Engagement in Sequential Recommender Systems

May 03, 2023

A Survey on Model-based Reinforcement Learning

Jun 19, 2022

Adversarial Counterfactual Environment Model Learning

Jun 10, 2022