Felix George

IBM

Detecting Silent Failures in Multi-Agentic AI Trajectories

Nov 06, 2025

ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks

Feb 07, 2025