Shenghan Zheng

ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces

Apr 06, 2026
Via arXiv

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

Feb 13, 2026
Via arXiv