Waseem Alshikh

The Price of Agreement: Measuring LLM Sycophancy in Agentic Financial Applications

Apr 27, 2026

Accurate Failure Prediction in Agents Does Not Imply Effective Failure Prevention

Feb 03, 2026

OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web

Feb 28, 2024