Amelia Glaese

GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks

Oct 05, 2025

Stress Testing Deliberative Alignment for Anti-Scheming Training

Sep 19, 2025

BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents

Apr 16, 2025

PaperBench: Evaluating AI's Ability to Replicate AI Research

Apr 02, 2025

Trading Inference-Time Compute for Adversarial Robustness

Jan 31, 2025

Deliberative Alignment: Reasoning Enables Safer Language Models

Dec 20, 2024

Measuring short-form factuality in large language models

Nov 07, 2024

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023

Fine-tuning language models to find agreement among humans with diverse preferences

Nov 28, 2022

Improving alignment of dialogue agents via targeted human judgements

Sep 28, 2022