
Thomas Icard

Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors

May 17, 2025

Modeling Discrimination with Causal Abstraction

Jan 14, 2025

Belief in the Machine: Investigating Epistemological Blind Spots of Language Models

Oct 28, 2024

A Reply to Makelov et al.'s "Interpretability Illusion" Arguments

Jan 23, 2024

Comparing Causal Frameworks: Potential Outcomes, Structural Models, Graphs, and Abstractions

Jun 25, 2023

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations

Mar 05, 2023

Causal Abstraction for Faithful Model Interpretation

Jan 11, 2023

Causal Abstraction with Soft Interventions

Nov 22, 2022

Holistic Evaluation of Language Models

Nov 16, 2022

Causal Distillation for Language Models

Dec 05, 2021