Tao Gui

Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses

Apr 28, 2026
Via arXiv

EVPO: Explained Variance Policy Optimization for Adaptive Critic Utilization in LLM Post-Training

Apr 21, 2026
Via arXiv

MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning

Apr 15, 2026
Via arXiv

Enhancing LLM-based Search Agents via Contribution Weighted Group Relative Policy Optimization

Apr 15, 2026
Via arXiv

Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

Apr 15, 2026
Via arXiv

FinToolSyn: A forward synthesis Framework for Financial Tool-Use Dialogue Data with Dynamic Tool Retrieval

Mar 25, 2026
Via arXiv

JFTA-Bench: Evaluate LLM's Ability of Tracking and Analyzing Malfunctions Using Fault Trees

Mar 24, 2026
Via arXiv

CCTU: A Benchmark for Tool Use under Complex Constraints

Mar 16, 2026
Via arXiv

Can RL Improve Generalization of LLM Agents? An Empirical Study

Mar 12, 2026
Via arXiv

MagicAgent: Towards Generalized Agent Planning

Feb 22, 2026
Via arXiv