Arthur Conmy

How do LLMs Compute Verbal Confidence

Mar 18, 2026, via arXiv

Automatically Finding Reward Model Biases

Feb 16, 2026, via arXiv

Simple LLM Baselines are Competitive for Model Diffing

Feb 10, 2026, via arXiv

Fluid Representations in Reasoning Models

Feb 04, 2026, via arXiv

Building Production-Ready Probes For Gemini

Jan 16, 2026, via arXiv

Thought Anchors: Which LLM Reasoning Steps Matter?

Jun 23, 2025, via arXiv

Line of Sight: On Linear Representations in VLLMs

Jun 05, 2025, via arXiv

Interpreting Large Text-to-Image Diffusion Models with Dictionary Learning

May 30, 2025, via arXiv

Scaling sparse feature circuit finding for in-context learning

Apr 18, 2025, via arXiv

An Approach to Technical AGI Safety and Security

Apr 02, 2025, via arXiv