Sarah-Jane Leslie

Causal Head Gating: A Framework for Interpreting Roles of Attention Heads in Transformers

May 19, 2025
Via arXiv

Understanding Task Representations in Neural Networks via Bayesian Ablation

May 19, 2025
Via arXiv

Beyond Denouncing Hate: Strategies for Countering Implied Biases and Stereotypes in Language

Oct 31, 2023
Via arXiv

Towards Countering Essentialism through Social Bias Reasoning

Mar 28, 2023
Via arXiv