Picture for Xuesong Yao

Xuesong Yao

AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning

Add code
Sep 10, 2025
Viaarxiv icon

Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments

Add code
Aug 12, 2025
Viaarxiv icon

Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?

Add code
Apr 01, 2025
Viaarxiv icon

ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use

Add code
Jan 07, 2025
Viaarxiv icon