Picture for Mantas Mazeika

Mantas Mazeika

Michael Pokorny

Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models

Add code
Nov 03, 2025
Viaarxiv icon

Remote Labor Index: Measuring AI Automation of Remote Work

Add code
Oct 30, 2025
Viaarxiv icon

TextQuests: How Good are LLMs at Text-Based Video Games?

Add code
Jul 31, 2025
Viaarxiv icon

The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems

Add code
Mar 05, 2025
Viaarxiv icon

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Add code
Feb 12, 2025
Viaarxiv icon

International AI Safety Report

Add code
Jan 29, 2025
Viaarxiv icon

Humanity's Last Exam

Add code
Jan 24, 2025
Viaarxiv icon

Tamper-Resistant Safeguards for Open-Weight LLMs

Add code
Aug 01, 2024
Figure 1 for Tamper-Resistant Safeguards for Open-Weight LLMs
Figure 2 for Tamper-Resistant Safeguards for Open-Weight LLMs
Figure 3 for Tamper-Resistant Safeguards for Open-Weight LLMs
Figure 4 for Tamper-Resistant Safeguards for Open-Weight LLMs
Viaarxiv icon

Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?

Add code
Jul 31, 2024
Viaarxiv icon

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

Add code
Mar 06, 2024
Figure 1 for The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Figure 2 for The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Figure 3 for The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Figure 4 for The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Viaarxiv icon