Shaopan Xiong

Complementary Reinforcement Learning

Mar 18, 2026
Via arXiv

ShopSimulator: Evaluating and Exploring RL-Driven LLM Agent for Shopping Assistants

Jan 26, 2026
Via arXiv

Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem

Dec 31, 2025
Via arXiv

AMAP Agentic Planning Technical Report

Dec 31, 2025
Via arXiv

RollArt: Scaling Agentic RL Training via Disaggregated Infrastructure

Dec 27, 2025
Via arXiv

LiveThinking: Enabling Real-Time Efficient Reasoning for AI-Powered Livestreaming via Reinforcement Learning

Oct 09, 2025
Via arXiv

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Aug 11, 2025
Via arXiv

Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library

Jun 06, 2025
Via arXiv

Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment

Oct 23, 2024
Via arXiv