Kevin Wei

RCTs & Human Uplift Studies: Methodological Challenges and Practical Solutions for Frontier AI Evaluation

Mar 11, 2026
Via arXiv

From Human-Level AI Tales to AI Leveling Human Scales

Feb 21, 2026
Via arXiv

Who Evaluates AI's Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations

Nov 06, 2025
Via arXiv

MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports

May 16, 2025
Via arXiv

The AI Agent Index

Feb 03, 2025
Via arXiv

Infrastructure for AI Agents

Jan 17, 2025
Via arXiv

Visibility into AI Agents

Feb 04, 2024
Via arXiv

How well do LLMs cite relevant medical references? An evaluation framework and analyses

Feb 03, 2024
Via arXiv

Black-Box Access is Insufficient for Rigorous AI Audits

Jan 25, 2024
Via arXiv