Eric Horvitz

Clinician input steers frontier AI models toward both accurate and harmful decisions

Mar 14, 2026

Do LLMs Act Like Rational Agents? Measuring Belief Coherence in Probabilistic Decision Making

Feb 06, 2026

CancerGUIDE: Cancer Guideline Understanding via Internal Disagreement Estimation

Sep 09, 2025

MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks

May 26, 2025

Creating General User Models from Computer Use

May 19, 2025

Navigating Rifts in Human-LLM Grounding: Study and Benchmark

Mar 18, 2025

Generating Structured Outputs from Language Models: Benchmark and Studies

Jan 18, 2025

Superhuman performance of a large language model on the reasoning tasks of a physician

Dec 14, 2024

Steering Language Model Refusal with Sparse Autoencoders

Nov 18, 2024

From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond

Nov 06, 2024