Xueyuan Hao

AJ-Bench: Benchmarking Agent-as-a-Judge for Environment-Aware Evaluation

Apr 20, 2026
Via arXiv

VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications

Sep 30, 2025
Via arXiv