Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sharon Levy

Making FETCH! Happen: Finding Emergent Dog Whistles Through Common Habitats

Dec 16, 2024

Kuleen Sasse, Carlos Aguirre, Isabel Cachola, Sharon Levy, Mark Dredze

Abstract:WARNING: This paper contains content that maybe upsetting or offensive to some readers. Dog whistles are coded expressions with dual meanings: one intended for the general public (outgroup) and another that conveys a specific message to an intended audience (ingroup). Often, these expressions are used to convey controversial political opinions while maintaining plausible deniability and slip by content moderation filters. Identification of dog whistles relies on curated lexicons, which have trouble keeping up to date. We introduce \textbf{FETCH!}, a task for finding novel dog whistles in massive social media corpora. We find that state-of-the-art systems fail to achieve meaningful results across three distinct social media case studies. We present \textbf{EarShot}, a novel system that combines the strengths of vector databases and Large Language Models (LLMs) to efficiently and effectively identify new dog whistles.

Via

Access Paper or Ask Questions

LLMs are Biased Teachers: Evaluating LLM Bias in Personalized Education

Oct 17, 2024

Iain Weissburg, Sathvika Anand, Sharon Levy, Haewon Jeong

Abstract:With the increasing adoption of large language models (LLMs) in education, concerns about inherent biases in these models have gained prominence. We evaluate LLMs for bias in the personalized educational setting, specifically focusing on the models' roles as "teachers". We reveal significant biases in how models generate and select educational content tailored to different demographic groups, including race, ethnicity, sex, gender, disability status, income, and national origin. We introduce and apply two bias score metrics--Mean Absolute Bias (MAB) and Maximum Difference Bias (MDB)--to analyze 9 open and closed state-of-the-art LLMs. Our experiments, which utilize over 17,000 educational explanations across multiple difficulty levels and topics, uncover that models perpetuate both typical and inverted harmful stereotypes.

* 46 Pages, 55 Figures, dataset release pending publication

Via

Access Paper or Ask Questions

Gender Bias in Decision-Making with Large Language Models: A Study of Relationship Conflicts

Oct 14, 2024

Sharon Levy, William D. Adler, Tahilin Sanchez Karver, Mark Dredze, Michelle R. Kaufman

Figure 1 for Gender Bias in Decision-Making with Large Language Models: A Study of Relationship Conflicts

Figure 2 for Gender Bias in Decision-Making with Large Language Models: A Study of Relationship Conflicts

Figure 3 for Gender Bias in Decision-Making with Large Language Models: A Study of Relationship Conflicts

Figure 4 for Gender Bias in Decision-Making with Large Language Models: A Study of Relationship Conflicts

Abstract:Large language models (LLMs) acquire beliefs about gender from training data and can therefore generate text with stereotypical gender attitudes. Prior studies have demonstrated model generations favor one gender or exhibit stereotypes about gender, but have not investigated the complex dynamics that can influence model reasoning and decision-making involving gender. We study gender equity within LLMs through a decision-making lens with a new dataset, DeMET Prompts, containing scenarios related to intimate, romantic relationships. We explore nine relationship configurations through name pairs across three name lists (men, women, neutral). We investigate equity in the context of gender roles through numerous lenses: typical and gender-neutral names, with and without model safety enhancements, same and mixed-gender relationships, and egalitarian versus traditional scenarios across various topics. While all models exhibit the same biases (women favored, then those with gender-neutral names, and lastly men), safety guardrails reduce bias. In addition, models tend to circumvent traditional male dominance stereotypes and side with 'traditionally female' individuals more often, suggesting relationships are viewed as a female domain by the models.

* EMNLP Findings 2024

Via

Access Paper or Ask Questions

Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts

Mar 17, 2024

Michael Saxon, Yiran Luo, Sharon Levy, Chitta Baral, Yezhou Yang, William Yang Wang

Figure 1 for Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts

Figure 2 for Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts

Figure 3 for Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts

Figure 4 for Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts

Abstract:Benchmarks of the multilingual capabilities of text-to-image (T2I) models compare generated images prompted in a test language to an expected image distribution over a concept set. One such benchmark, "Conceptual Coverage Across Languages" (CoCo-CroLa), assesses the tangible noun inventory of T2I models by prompting them to generate pictures from a concept list translated to seven languages and comparing the output image populations. Unfortunately, we find that this benchmark contains translation errors of varying severity in Spanish, Japanese, and Chinese. We provide corrections for these errors and analyze how impactful they are on the utility and validity of CoCo-CroLa as a benchmark. We reassess multiple baseline T2I models with the revisions, compare the outputs elicited under the new translations to those conditioned on the old, and show that a correction's impactfulness on the image-domain benchmark results can be predicted in the text domain with similarity scores. Our findings will guide the future development of T2I multilinguality metrics by providing analytical tools for practical translation decisions.

* NAACL 2024 Main Conference

Via

Access Paper or Ask Questions

Evaluating Biases in Context-Dependent Health Questions

Mar 07, 2024

Sharon Levy, Tahilin Sanchez Karver, William D. Adler, Michelle R. Kaufman, Mark Dredze

Abstract:Chat-based large language models have the opportunity to empower individuals lacking high-quality healthcare access to receive personalized information across a variety of topics. However, users may ask underspecified questions that require additional context for a model to correctly answer. We study how large language model biases are exhibited through these contextual questions in the healthcare domain. To accomplish this, we curate a dataset of sexual and reproductive healthcare questions that are dependent on age, sex, and location attributes. We compare models' outputs with and without demographic context to determine group alignment among our contextual questions. Our experiments reveal biases in each of these attributes, where young adult female users are favored.

Via

Access Paper or Ask Questions

ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models

Oct 14, 2023

Alex Mei, Sharon Levy, William Yang Wang

Figure 1 for ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models

Figure 2 for ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models

Figure 3 for ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models

Figure 4 for ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models

Abstract:As large language models are integrated into society, robustness toward a suite of prompts is increasingly important to maintain reliability in a high-variance environment.Robustness evaluations must comprehensively encapsulate the various settings in which a user may invoke an intelligent system. This paper proposes ASSERT, Automated Safety Scenario Red Teaming, consisting of three methods -- semantically aligned augmentation, target bootstrapping, and adversarial knowledge injection. For robust safety evaluation, we apply these methods in the critical domain of AI safety to algorithmically generate a test suite of prompts covering diverse robustness settings -- semantic equivalence, related scenarios, and adversarial. We partition our prompts into four safety domains for a fine-grained analysis of how the domain affects model performance. Despite dedicated safeguards in existing state-of-the-art models, we find statistically significant performance differences of up to 11% in absolute classification accuracy among semantically related scenarios and error rates of up to 19% absolute error in zero-shot adversarial settings, raising concerns for users' physical safety.

* In Findings of the 2023 Conference on Empirical Methods in Natural Language Processing

Via

Access Paper or Ask Questions

Comparing Biases and the Impact of Multilingual Training across Multiple Languages

May 18, 2023

Sharon Levy, Neha Anna John, Ling Liu, Yogarshi Vyas, Jie Ma, Yoshinari Fujinuma, Miguel Ballesteros, Vittorio Castelli, Dan Roth

Figure 1 for Comparing Biases and the Impact of Multilingual Training across Multiple Languages

Figure 2 for Comparing Biases and the Impact of Multilingual Training across Multiple Languages

Figure 3 for Comparing Biases and the Impact of Multilingual Training across Multiple Languages

Figure 4 for Comparing Biases and the Impact of Multilingual Training across Multiple Languages

Abstract:Studies in bias and fairness in natural language processing have primarily examined social biases within a single language and/or across few attributes (e.g. gender, race). However, biases can manifest differently across various languages for individual attributes. As a result, it is critical to examine biases within each language and attribute. Of equal importance is to study how these biases compare across languages and how the biases are affected when training a model on multilingual data versus monolingual data. We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task to observe whether specific demographics are viewed more positively. We study bias similarities and differences across these languages and investigate the impact of multilingual vs. monolingual training data. We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender. Our results reveal similarities in bias expression such as favoritism of groups that are dominant in each language's culture (e.g. majority religions and nationalities). Additionally, we find an increased variation in predictions across protected groups, indicating bias amplification, after multilingual finetuning in comparison to multilingual pretraining.

Via

Access Paper or Ask Questions

Foveate, Attribute, and Rationalize: Towards Safe and Trustworthy AI

Dec 19, 2022

Alex Mei, Sharon Levy, William Yang Wang

Abstract:Users' physical safety is an increasing concern as the market for intelligent systems continues to grow, where unconstrained systems may recommend users dangerous actions that can lead to serious injury. Covertly unsafe text, language that contains actionable physical harm, but requires further reasoning to identify such harm, is an area of particular interest, as such texts may arise from everyday scenarios and are challenging to detect as harmful. Qualifying the knowledge required to reason about the safety of various texts and providing human-interpretable rationales can shed light on the risk of systems to specific user groups, helping both stakeholders manage the risks of their systems and policymakers to provide concrete safeguards for consumer safety. We propose FARM, a novel framework that leverages external knowledge for trustworthy rationale generation in the context of safety. In particular, FARM foveates on missing knowledge in specific scenarios, retrieves this knowledge with attribution to trustworthy sources, and uses this to both classify the safety of the original text and generate human-interpretable rationales, combining critically important qualities for sensitive domains such as user safety. Furthermore, FARM obtains state-of-the-art results on the SafeText dataset, improving safety classification accuracy by 5.29 points.

* 9 pages, 3 figures, 5 tables

Via

Access Paper or Ask Questions

SafeText: A Benchmark for Exploring Physical Safety in Language Models

Oct 18, 2022

Sharon Levy, Emily Allaway, Melanie Subbiah, Lydia Chilton, Desmond Patton, Kathleen McKeown, William Yang Wang

Figure 1 for SafeText: A Benchmark for Exploring Physical Safety in Language Models

Figure 2 for SafeText: A Benchmark for Exploring Physical Safety in Language Models

Figure 3 for SafeText: A Benchmark for Exploring Physical Safety in Language Models

Figure 4 for SafeText: A Benchmark for Exploring Physical Safety in Language Models

Abstract:Understanding what constitutes safe text is an important issue in natural language processing and can often prevent the deployment of models deemed harmful and unsafe. One such type of safety that has been scarcely studied is commonsense physical safety, i.e. text that is not explicitly violent and requires additional commonsense knowledge to comprehend that it leads to physical harm. We create the first benchmark dataset, SafeText, comprising real-life scenarios with paired safe and physically unsafe pieces of advice. We utilize SafeText to empirically study commonsense physical safety across various models designed for text generation and commonsense reasoning tasks. We find that state-of-the-art large language models are susceptible to the generation of unsafe text and have difficulty rejecting unsafe advice. As a result, we argue for further studies of safety and the assessment of commonsense physical safety in models before release.

* Accepted to EMNLP 2022

Via

Access Paper or Ask Questions

Mitigating Covertly Unsafe Text within Natural Language Systems

Oct 17, 2022

Alex Mei, Anisha Kabir, Sharon Levy, Melanie Subbiah, Emily Allaway, John Judge, Desmond Patton, Bruce Bimber, Kathleen McKeown, William Yang Wang

Figure 1 for Mitigating Covertly Unsafe Text within Natural Language Systems

Figure 2 for Mitigating Covertly Unsafe Text within Natural Language Systems

Figure 3 for Mitigating Covertly Unsafe Text within Natural Language Systems

Figure 4 for Mitigating Covertly Unsafe Text within Natural Language Systems

Abstract:An increasingly prevalent problem for intelligent technologies is text safety, as uncontrolled systems may generate recommendations to their users that lead to injury or life-threatening consequences. However, the degree of explicitness of a generated statement that can cause physical harm varies. In this paper, we distinguish types of text that can lead to physical harm and establish one particularly underexplored category: covertly unsafe text. Then, we further break down this category with respect to the system's information and discuss solutions to mitigate the generation of text in each of these subcategories. Ultimately, our work defines the problem of covertly unsafe language that causes physical harm and argues that this subtle yet dangerous issue needs to be prioritized by stakeholders and regulators. We highlight mitigation strategies to inspire future researchers to tackle this challenging problem and help improve safety within smart systems.

* To Appear In Findings of the 2022 Conference on Empirical Methods in Natural Language Processing

Via

Access Paper or Ask Questions