Tommaso Cerruti

Every Eval Ever: A Unifying Schema and Community Repository for AI Evaluation Results

Jun 12, 2026
Via arXiv

CocoaBench: Evaluating Unified Digital Agents in the Wild

Apr 14, 2026
Via arXiv