Graham Neubig

Carnegie Mellon University

How Do AI Agents Do Human Work? Comparing AI and Human Workflows Across Diverse Occupations

Oct 26, 2025

MERLIN: A Testbed for Multilingual Multimodal Entity Recognition and Linking

Oct 16, 2025

Grounding Multilingual Multimodal LLMs With Cultural Knowledge

Aug 12, 2025

SimuRA: Towards General Goal-Oriented Agent via Simulative Reasoning Architecture with LLM-Based World Model

Jul 31, 2025

Checklists Are Better Than Reward Models For Aligning Language Models

Jul 24, 2025

OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety

Jul 08, 2025

ZINA: Multimodal Fine-grained Hallucination Detection and Editing

Jun 16, 2025

CAIRe: Cultural Attribution of Images by Retrieval-Augmented Evaluation

Jun 10, 2025

FieldWorkArena: Agentic AI Benchmark for Real Field Work Tasks

May 26, 2025

The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think

May 15, 2025