Houhan Lu

T1-Bench: Benchmarking Multi-Scenario Agents in Real-World Domains

Jun 09, 2026
Via arXiv