Sanmi Koyejo

Stanford University

DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents

May 06, 2026
Via arXiv

Stop Automating Peer Review Without Rigorous Evaluation

May 04, 2026
Via arXiv

SWE-chat: Coding Agent Interactions From Real Users in the Wild

Apr 22, 2026
Via arXiv

HealthAdminBench: Evaluating Computer-Use Agents on Healthcare Administration Tasks

Apr 10, 2026
Via arXiv

SCENEBench: An Audio Understanding Benchmark Grounded in Assistive and Industrial Use Cases

Mar 10, 2026
Via arXiv

When AI Benchmarks Plateau: A Systematic Study of Benchmark Saturation

Feb 18, 2026
Via arXiv

Discovering Implicit Large Language Model Alignment Objectives

Feb 17, 2026
Via arXiv

ALMo: Interactive Aim-Limit-Defined, Multi-Objective System for Personalized High-Dose-Rate Brachytherapy Treatment Planning and Visualization for Cervical Cancer

Feb 14, 2026
Via arXiv

Attention Head Entropy of LLMs Predicts Answer Correctness

Feb 14, 2026
Via arXiv

Latent Adversarial Regularization for Offline Preference Optimization

Jan 29, 2026
Via arXiv