Tal Haklay

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior

May 06, 2026

Pitfalls in Evaluating Interpretability Agents

Mar 20, 2026

MIB: A Mechanistic Interpretability Benchmark

Apr 17, 2025

Position-aware Automatic Circuit Discovery

Feb 07, 2025

Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking

Feb 22, 2024

Linearity of Relation Decoding in Transformer Language Models

Aug 17, 2023