Mantas Mazeika

Michael Pokorny

Aggressive Compression Enables LLM Weight Theft

Jan 03, 2026
Via arXiv

Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models

Nov 03, 2025
Via arXiv

Remote Labor Index: Measuring AI Automation of Remote Work

Oct 30, 2025
Via arXiv

TextQuests: How Good are LLMs at Text-Based Video Games?

Jul 31, 2025
Via arXiv

The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems

Mar 05, 2025
Via arXiv

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Feb 12, 2025
Via arXiv

International AI Safety Report

Jan 29, 2025
Via arXiv

Humanity's Last Exam

Jan 24, 2025
Via arXiv

Tamper-Resistant Safeguards for Open-Weight LLMs

Aug 01, 2024
Via arXiv

Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?

Jul 31, 2024
Via arXiv