Jiani Hou

Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields

Jun 09, 2026
Via arXiv

LPFQA: A Long-Tail Professional Forum-based Benchmark for LLM Evaluation

Nov 09, 2025
Via arXiv