Eric Horvitz

CancerGUIDE: Cancer Guideline Understanding via Internal Disagreement Estimation

Sep 09, 2025

MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks

May 26, 2025

Creating General User Models from Computer Use

May 19, 2025

Navigating Rifts in Human-LLM Grounding: Study and Benchmark

Mar 18, 2025

Generating Structured Outputs from Language Models: Benchmark and Studies

Jan 18, 2025

Superhuman performance of a large language model on the reasoning tasks of a physician

Dec 14, 2024

Steering Language Model Refusal with Sparse Autoencoders

Nov 18, 2024

From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond

Nov 06, 2024

Decision-Focused Uncertainty Quantification

Oct 02, 2024

How to Build the Virtual Cell with Artificial Intelligence: Priorities and Opportunities

Sep 18, 2024