Hadas Orgad

Unified Concept Editing in Diffusion Models

Aug 25, 2023
Rohit Gandikota, Hadas Orgad, Yonatan Belinkov, Joanna Materzyńska, David Bau

Text-to-image models suffer from various safety issues that may limit their suitability for deployment. Previous methods have separately addressed individual issues of bias, copyright, and offensive content in text-to-image models. However, in the real world, all of these issues appear simultaneously in the same model. We present a single approach that tackles all of them. Our method, Unified Concept Editing (UCE), edits the model without training, using a closed-form solution, and scales seamlessly to concurrent edits on text-conditional diffusion models. We demonstrate scalable simultaneous debiasing, style erasure, and content moderation by editing text-to-image projections, and we present extensive experiments demonstrating improved efficacy and scalability over prior work. Our code is available at https://unified.baulab.info
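
As a rough illustration of the kind of closed-form projection edit described above, the sketch below solves a ridge-regularized least-squares problem: edited concept embeddings are mapped to new target outputs while the outputs of other concepts are preserved. The function name, tensor layout, and the `reg` term are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def closed_form_projection_edit(W_old, edit_in, edit_out, keep_in, reg=1e-2):
    """Minimal sketch of a closed-form edit to a text-to-image projection matrix.

    W_old:    (d_out, d_in) projection, e.g. a cross-attention key/value matrix
    edit_in:  (n_edit, d_in) embeddings of concepts to edit
    edit_out: (n_edit, d_out) desired outputs for those concepts
    keep_in:  (n_keep, d_in) embeddings of concepts whose outputs must not change
    reg:      small ridge term that keeps the linear system well-conditioned
    """
    d_in = W_old.shape[1]
    keep_out = keep_in @ W_old.T                       # current (preserved) behaviour
    lhs = edit_out.T @ edit_in + keep_out.T @ keep_in  # (d_out, d_in)
    rhs = edit_in.T @ edit_in + keep_in.T @ keep_in \
          + reg * torch.eye(d_in, dtype=W_old.dtype, device=W_old.device)
    return lhs @ torch.linalg.inv(rhs)                 # edited projection matrix
```

Because edits are just rows stacked into `edit_in` / `edit_out`, concurrent edits (debiasing, erasure, moderation) amount to adding rows to the same linear system, which is what lets this style of update scale.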


ReFACT: Updating Text-to-Image Models by Editing the Text Encoder

Jun 01, 2023
Dana Arad, Hadas Orgad, Yonatan Belinkov

Text-to-image models are trained on extensive amounts of data, leading them to implicitly encode factual knowledge within their parameters. While some facts are useful, others may be incorrect or become outdated (e.g., the current President of the United States). We introduce ReFACT, a novel approach for editing factual knowledge in text-to-image generative models. ReFACT updates the weights of a specific layer in the text encoder, only modifying a tiny portion of the model's parameters, and leaving the rest of the model unaffected. We empirically evaluate ReFACT on an existing benchmark, alongside RoAD, a newly curated dataset. ReFACT achieves superior performance in terms of generalization to related concepts while preserving unrelated concepts. Furthermore, ReFACT maintains image generation quality, making it a valuable tool for updating and correcting factual information in text-to-image models.
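
To give a flavor of what "updating the weights of a specific layer" can look like, here is a hypothetical ROME-style rank-one edit of a single linear layer. ReFACT's actual objective and its choice of keys and values are more involved, so treat this strictly as a sketch of the general idea that one small, localized weight change can rewrite a fact.

```python
import torch

@torch.no_grad()
def rank_one_layer_edit(layer: torch.nn.Linear, key: torch.Tensor,
                        new_value: torch.Tensor, cov_inv: torch.Tensor):
    """Hypothetical rank-one edit of one text-encoder layer (not ReFACT's exact method).

    layer:     the linear layer being edited
    key:       (d_in,) input representation of the concept being updated
    new_value: (d_out,) output we want the layer to produce for `key`
    cov_inv:   (d_in, d_in) inverse covariance of typical layer inputs, used to
               localize the change so other inputs are barely affected
    """
    residual = new_value - layer(key)   # what the layer currently gets wrong
    direction = cov_inv @ key           # update direction in input space
    layer.weight.add_(torch.outer(residual, direction) / (key @ direction))
```

Only this one weight matrix changes, matching the abstract's point that a tiny portion of the model's parameters is modified while the rest stays untouched.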


Editing Implicit Assumptions in Text-to-Image Diffusion Models

Mar 14, 2023
Hadas Orgad, Bahjat Kawar, Yonatan Belinkov


Text-to-image diffusion models often make implicit assumptions about the world when generating images. While some assumptions are useful (e.g., the sky is blue), they can also be outdated, incorrect, or reflective of social biases present in the training data. Thus, there is a need to control these assumptions without requiring explicit user input or costly re-training. In this work, we aim to edit a given implicit assumption in a pre-trained diffusion model. Our Text-to-Image Model Editing method, TIME for short, receives a pair of inputs: a "source" under-specified prompt for which the model makes an implicit assumption (e.g., "a pack of roses"), and a "destination" prompt that describes the same setting, but with a specified desired attribute (e.g., "a pack of blue roses"). TIME then updates the model's cross-attention layers, as these layers assign visual meaning to textual tokens. We edit the projection matrices in these layers such that the source prompt is projected close to the destination prompt. Our method is highly efficient, as it modifies a mere 2.2% of the model's parameters in under one second. To evaluate model editing approaches, we introduce TIMED (TIME Dataset), containing 147 source and destination prompt pairs from various domains. Our experiments (using Stable Diffusion) show that TIME is successful in model editing, generalizes well for related prompts unseen during editing, and imposes minimal effect on unrelated generations.

* Project page: https://time-diffusion.github.io/ 
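
The update implied by the abstract can be sketched as a regularized least-squares fit: the edited projection should map the source prompt's token embeddings close to where the original projection maps the destination prompt's tokens, while staying near the original weights. Variable names and the token-by-token alignment of source and destination embeddings below are assumptions for illustration.

```python
import torch

def edit_cross_attention_projection(W_old, src_emb, dst_emb, lam=0.1):
    """Sketch: re-fit one cross-attention key/value projection.

    W_old:   (d_out, d_in) original projection matrix
    src_emb: (n_tok, d_in) token embeddings of the under-specified source prompt
    dst_emb: (n_tok, d_in) token embeddings of the destination prompt
             (assumed aligned token-by-token with the source)
    lam:     regularization pulling the edited matrix back toward W_old
    """
    d_in = W_old.shape[1]
    targets = dst_emb @ W_old.T                  # where destination tokens currently land
    lhs = lam * W_old + targets.T @ src_emb
    rhs = lam * torch.eye(d_in, dtype=W_old.dtype) + src_emb.T @ src_emb
    return lhs @ torch.linalg.inv(rhs)           # closed-form minimizer
```

Applying such a closed-form update to each cross-attention key and value projection touches only a small fraction of the model's parameters, consistent with the 2.2% figure quoted above.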

Debiasing NLP Models Without Demographic Information

Dec 20, 2022
Hadas Orgad, Yonatan Belinkov


Models trained on real-world data tend to imitate and amplify social biases. Although many methods have been suggested to mitigate biases, they require prior information about the types of biases that should be mitigated (e.g., gender or racial bias) and about the social groups associated with each data sample. In this work, we propose a debiasing method that operates without any prior knowledge of the demographics in the dataset: it detects biased examples using an auxiliary model that predicts the main model's success and down-weights them during training. Results on racial and gender bias demonstrate that it is possible to mitigate social biases without a costly demographic annotation process.
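
The core mechanism, down-weighting examples on which an auxiliary model predicts the main model will succeed, can be sketched as a weighted training loss. The exact weighting scheme and the way the auxiliary model is trained differ in the paper; the function below only illustrates the idea.

```python
import torch
import torch.nn.functional as F

def reweighted_loss(main_logits, labels, aux_success_prob):
    """Illustrative down-weighting loss for demographics-free debiasing.

    main_logits:      (batch, n_classes) outputs of the main model
    labels:           (batch,) gold labels
    aux_success_prob: (batch,) auxiliary model's predicted probability that the
                      main model answers each example correctly (a proxy for
                      'easy', potentially bias-aligned examples)
    """
    per_example = F.cross_entropy(main_logits, labels, reduction="none")
    weights = (1.0 - aux_success_prob).detach()  # easy/biased examples get low weight
    return (weights * per_example).mean()
```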


Choose Your Lenses: Flaws in Gender Bias Evaluation

Oct 20, 2022
Hadas Orgad, Yonatan Belinkov


Considerable efforts to measure and mitigate gender bias in recent years have led to the introduction of an abundance of tasks, datasets, and metrics. In this position paper, we assess the current paradigm of gender bias evaluation and identify several flaws in it. First, we highlight the importance of extrinsic bias metrics, which measure how a model's performance on some task is affected by gender, as opposed to intrinsic evaluations of model representations, which are less strongly connected to specific harms to people interacting with systems. We find that most studies measure only a few extrinsic metrics, although more could be measured. Second, we find that datasets and metrics are often coupled; we discuss how this coupling hinders the ability to obtain reliable conclusions and how the two may be decoupled. We then investigate how the choice of dataset and its composition, as well as the choice of metric, affect bias measurement, finding significant variation across each of them. Finally, we propose several guidelines for more reliable gender bias evaluation.

* Accepted to the 4th Workshop on Gender Bias in Natural Language Processing 

How Gender Debiasing Affects Internal Model Representations, and Why It Matters

Apr 14, 2022
Hadas Orgad, Seraphina Goldfarb-Tarrant, Yonatan Belinkov


Common studies of gender bias in NLP focus either on extrinsic bias, measured by model performance on a downstream task, or on intrinsic bias, found in models' internal representations. However, the relationship between extrinsic and intrinsic bias is not well understood. In this work, we illuminate this relationship by measuring both quantities together: we debias a model during downstream fine-tuning, which reduces extrinsic bias, and measure the effect on intrinsic bias, which is operationalized as bias extractability with information-theoretic probing. Through experiments on two tasks and multiple bias metrics, we show that our intrinsic bias metric is a better indicator of debiasing than (a contextual adaptation of) the standard WEAT metric, and can also expose cases of superficial debiasing. Our framework provides a comprehensive perspective on bias in NLP models, which can be applied to deploy NLP systems in a more informed manner. Our code will be made publicly available.

* Accepted to NAACL 2022 
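
As a rough illustration of what "bias extractability" means in practice, the sketch below trains a small linear probe to recover gender from frozen representations. The paper operationalizes extractability with information-theoretic (MDL) probing, so plain probe accuracy here is only a simplified stand-in.

```python
import torch
import torch.nn.functional as F

def gender_extractability(reps, gender_labels, epochs=200, lr=1e-2):
    """Simplified proxy for bias extractability: linear-probe accuracy.

    reps:          (n, d) frozen representations from the (de)biased model
    gender_labels: (n,) protected-attribute labels, used only for measurement
    """
    probe = torch.nn.Linear(reps.shape[1], int(gender_labels.max()) + 1)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(probe(reps), gender_labels).backward()
        opt.step()
    with torch.no_grad():
        return (probe(reps).argmax(-1) == gender_labels).float().mean().item()
```

Lower extractability after debiasing would indicate that the intervention changed the representations themselves, not just downstream behaviour.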