Thomas Icard

Transcoder Adapters for Reasoning-Model Diffing

Feb 24, 2026

Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors

May 17, 2025

Modeling Discrimination with Causal Abstraction

Jan 14, 2025

Belief in the Machine: Investigating Epistemological Blind Spots of Language Models

Oct 28, 2024

A Reply to Makelov et al.'s "Interpretability Illusion" Arguments

Jan 23, 2024

Comparing Causal Frameworks: Potential Outcomes, Structural Models, Graphs, and Abstractions

Jun 25, 2023

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations

Mar 05, 2023

Causal Abstraction for Faithful Model Interpretation

Jan 11, 2023

Causal Abstraction with Soft Interventions

Nov 22, 2022

Holistic Evaluation of Language Models

Nov 16, 2022