Arthur Conmy

Building Production-Ready Probes For Gemini

Jan 16, 2026

Thought Anchors: Which LLM Reasoning Steps Matter?

Jun 23, 2025

Line of Sight: On Linear Representations in VLLMs

Jun 05, 2025

Interpreting Large Text-to-Image Diffusion Models with Dictionary Learning

May 30, 2025

Scaling sparse feature circuit finding for in-context learning

Apr 18, 2025

An Approach to Technical AGI Safety and Security

Apr 02, 2025

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability

Mar 13, 2025

Chain-of-Thought Reasoning In The Wild Is Not Always Faithful

Mar 13, 2025

Open Problems in Mechanistic Interpretability

Jan 27, 2025

Improving Steering Vectors by Targeting Sparse Autoencoder Features

Nov 04, 2024