Dan Hendrycks

UC Berkeley

TextQuests: How Good are LLMs at Text-Based Video Games?

Jul 31, 2025

Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition

Jul 28, 2025

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

Jul 15, 2025

The Singapore Consensus on Global AI Safety Research Priorities

Jun 25, 2025

Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark

Apr 21, 2025

MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models

Mar 19, 2025

Superintelligence Strategy: Expert Version

Mar 07, 2025

The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems

Mar 05, 2025

Beyond Release: Access Considerations for Generative AI Systems

Feb 23, 2025

EnigmaEval: A Benchmark of Long Multimodal Reasoning Challenges

Feb 13, 2025