Arthur Conmy

Thought Anchors: Which LLM Reasoning Steps Matter?
Jun 23, 2025

Line of Sight: On Linear Representations in VLLMs
Jun 05, 2025

Interpreting Large Text-to-Image Diffusion Models with Dictionary Learning
May 30, 2025

Scaling sparse feature circuit finding for in-context learning
Apr 18, 2025

An Approach to Technical AGI Safety and Security
Apr 02, 2025

Chain-of-Thought Reasoning In The Wild Is Not Always Faithful
Mar 13, 2025

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability
Mar 13, 2025

Open Problems in Mechanistic Interpretability
Jan 27, 2025

Improving Steering Vectors by Targeting Sparse Autoencoder Features
Nov 04, 2024

Applying sparse autoencoders to unlearn knowledge in language models
Oct 25, 2024