Asma Ghandeharioun

Who's asking? User personas and the mechanics of latent misalignment

Jun 17, 2024

Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models

Jan 12, 2024

Interpretability Illusions in the Generalization of Simplified Models

Dec 06, 2023

Post Hoc Explanations of Language Models Can Improve Language Models

May 19, 2023

Mixed Effects Random Forests for Personalised Predictions of Clinical Depression Severity

Jan 24, 2023

Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models

Jan 10, 2023

DISSECT: Disentangled Simultaneous Explanations via Concept Traversals

May 31, 2021

Human-centric Dialog Training via Offline Reinforcement Learning

Oct 12, 2020

Characterizing Sources of Uncertainty to Proxy Calibration and Disambiguate Annotator and Data Bias

Oct 05, 2019

Hierarchical Reinforcement Learning for Open-Domain Dialog

Sep 18, 2019