Picture for Guojun Yin

Guojun Yin

Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards

Add code
Feb 09, 2026
Viaarxiv icon

Your Group-Relative Advantage Is Biased

Add code
Jan 13, 2026
Viaarxiv icon

Beyond Dialogue Time: Temporal Semantic Memory for Personalized LLM Agents

Add code
Jan 12, 2026
Viaarxiv icon

AWPO: Enhancing Tool-Use of Large Language Models through Explicit Integration of Reasoning Rewards

Add code
Dec 23, 2025
Viaarxiv icon

ToolForge: A Data Synthesis Pipeline for Multi-Hop Search without Real-World APIs

Add code
Dec 18, 2025
Viaarxiv icon

LocalSearchBench: Benchmarking Agentic Search in Real-World Local Life Services

Add code
Dec 08, 2025
Viaarxiv icon

From Experience to Strategy: Empowering LLM Agents with Trainable Graph Memory

Add code
Nov 11, 2025
Viaarxiv icon

Promoting Efficient Reasoning with Verifiable Stepwise Reward

Add code
Aug 14, 2025
Viaarxiv icon

SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning

Add code
Jun 24, 2025
Viaarxiv icon

Beyond Static Testbeds: An Interaction-Centric Agent Simulation Platform for Dynamic Recommender Systems

Add code
May 22, 2025
Viaarxiv icon