Julian Michael

Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models

Nov 03, 2025
Via arXiv

Remote Labor Index: Measuring AI Automation of Remote Work

Oct 30, 2025
Via arXiv

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

Jul 15, 2025
Via arXiv

The Singapore Consensus on Global AI Safety Research Priorities

Jun 25, 2025
Via arXiv

FORTRESS: Frontier Risk Evaluation for National Security and Public Safety

Jun 17, 2025
Via arXiv

International AI Safety Report

Jan 29, 2025
Via arXiv

Alignment faking in large language models

Dec 18, 2024
Via arXiv

Rapid Response: Mitigating LLM Jailbreaks with a Few Examples

Nov 12, 2024
Via arXiv

Training Language Models to Win Debates with Self-Play Improves Judge Accuracy

Sep 25, 2024
Via arXiv

Analyzing the Role of Semantic Representations in the Era of Large Language Models

May 02, 2024
Via arXiv