Haotong Tian

AgentIF-OneDay: A Task-level Instruction-Following Benchmark for General AI Agents in Daily Scenarios

Jan 28, 2026
Via arXiv

xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations

Jun 16, 2025
Via arXiv