Francis Rhys Ward

Think Fast: Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models

Jun 05, 2026
Via arXiv

How does information access affect LLM monitors' ability to detect sabotage?

Jan 28, 2026
Via arXiv

CTRL-ALT-DECEIT: Sabotage Evaluations for Automated AI R&D

Nov 18, 2025
Via arXiv

Higher-Order Belief in Incomplete Information MAIDs

Mar 08, 2025
Via arXiv

The Elicitation Game: Evaluating Capability Elicitation Techniques

Feb 04, 2025
Via arXiv

Evaluating Language Model Character Traits

Oct 05, 2024
Via arXiv

AI Sandbagging: Language Models can Strategically Underperform on Evaluations

Jun 12, 2024
Via arXiv

The Reasons that Agents Act: Intention and Instrumental Goals

Feb 15, 2024
Via arXiv

Honesty Is the Best Policy: Defining and Mitigating AI Deception

Dec 03, 2023
Via arXiv

Experiments with Detecting and Mitigating AI Deception

Jun 26, 2023
Via arXiv