Eric Horvitz

MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks

May 26, 2025
Via arXiv

Creating General User Models from Computer Use

May 19, 2025
Via arXiv

Navigating Rifts in Human-LLM Grounding: Study and Benchmark

Mar 18, 2025
Via arXiv

Generating Structured Outputs from Language Models: Benchmark and Studies

Jan 18, 2025
Via arXiv

Superhuman performance of a large language model on the reasoning tasks of a physician

Dec 14, 2024
Via arXiv

Steering Language Model Refusal with Sparse Autoencoders

Nov 18, 2024
Via arXiv

From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond

Nov 06, 2024
Via arXiv

Decision-Focused Uncertainty Quantification

Oct 02, 2024
Via arXiv

How to Build the Virtual Cell with Artificial Intelligence: Priorities and Opportunities

Sep 18, 2024
Via arXiv

MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering

Jun 03, 2024
Via arXiv