Sharon Levy

ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models

Oct 14, 2023
Alex Mei, Sharon Levy, William Yang Wang

As large language models are integrated into society, robustness toward a suite of prompts is increasingly important to maintain reliability in a high-variance environment. Robustness evaluations must comprehensively encapsulate the various settings in which a user may invoke an intelligent system. This paper proposes ASSERT, Automated Safety Scenario Red Teaming, consisting of three methods -- semantically aligned augmentation, target bootstrapping, and adversarial knowledge injection. For robust safety evaluation, we apply these methods in the critical domain of AI safety to algorithmically generate a test suite of prompts covering diverse robustness settings -- semantic equivalence, related scenarios, and adversarial. We partition our prompts into four safety domains for a fine-grained analysis of how the domain affects model performance. Despite dedicated safeguards in existing state-of-the-art models, we find statistically significant performance differences of up to 11% in absolute classification accuracy among semantically related scenarios and absolute error rates of up to 19% in zero-shot adversarial settings, raising concerns for users' physical safety.

* In Findings of the 2023 Conference on Empirical Methods in Natural Language Processing 
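
To make the semantic-equivalence robustness setting concrete, below is a minimal sketch of a consistency check over paraphrased safety prompts. The paraphrases and the classify() stub are illustrative placeholders, not ASSERT's actual prompt-generation methods, prompts, or models.

```python
# Minimal sketch: measure how consistently a model labels semantically
# equivalent prompt variants. The variants and the stand-in classifier below
# are illustrative, not items or models from the paper.
from collections import Counter
from typing import Callable, List

def label_consistency(prompt_variants: List[str],
                      classify: Callable[[str], str]) -> float:
    """Fraction of variants that receive the majority safe/unsafe label;
    1.0 means the model behaves identically across paraphrases."""
    labels = [classify(p) for p in prompt_variants]
    majority_count = Counter(labels).most_common(1)[0][1]
    return majority_count / len(labels)

# Hypothetical usage with a dummy classifier (replace with a real model call).
variants = [
    "Is it safe to mix bleach and ammonia when cleaning?",
    "When cleaning, can I combine ammonia with bleach without risk?",
    "Would mixing ammonia and bleach be harmless for household cleaning?",
]
dummy_classify = lambda prompt: "unsafe"  # stand-in for an LLM safety judgment
print(f"label consistency: {label_consistency(variants, dummy_classify):.2f}")
```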

Comparing Biases and the Impact of Multilingual Training across Multiple Languages

May 18, 2023
Sharon Levy, Neha Anna John, Ling Liu, Yogarshi Vyas, Jie Ma, Yoshinari Fujinuma, Miguel Ballesteros, Vittorio Castelli, Dan Roth

Studies in bias and fairness in natural language processing have primarily examined social biases within a single language or across a few attributes (e.g., gender, race). However, biases can manifest differently across languages for individual attributes. As a result, it is critical to examine biases within each language and attribute. It is equally important to study how these biases compare across languages and how they are affected when a model is trained on multilingual data versus monolingual data. We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task to observe whether specific demographics are viewed more positively. We study bias similarities and differences across these languages and investigate the impact of multilingual vs. monolingual training data. We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender. Our results reveal similarities in bias expression, such as favoritism of groups that are dominant in each language's culture (e.g., majority religions and nationalities). Additionally, we find increased variation in predictions across protected groups, indicating bias amplification, after multilingual finetuning in comparison to multilingual pretraining.
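
As a rough illustration of the template-based measurement described above, the sketch below fills a single English template with different group terms and compares sentiment scores; the template, the group list, and the default classifier are illustrative stand-ins, not the study's templates, languages, or models.

```python
# Minimal sketch of template-based sentiment bias measurement. The template,
# group terms, and default English classifier are illustrative placeholders.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")  # stand-in classifier

template = "The {group} person went to see a doctor."
groups = ["American", "Mexican", "Chinese", "Israeli", "Italian"]  # example nationality terms

scores = {}
for group in groups:
    result = sentiment(template.format(group=group))[0]
    # Signed score: positive predictions count as +score, negative as -score.
    scores[group] = result["score"] if result["label"] == "POSITIVE" else -result["score"]

# A larger spread across groups suggests the model treats the groups differently.
spread = max(scores.values()) - min(scores.values())
print(scores, f"spread={spread:.3f}")
```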


Foveate, Attribute, and Rationalize: Towards Safe and Trustworthy AI

Dec 19, 2022
Alex Mei, Sharon Levy, William Yang Wang

Users' physical safety is an increasing concern as the market for intelligent systems continues to grow, where unconstrained systems may recommend to users dangerous actions that can lead to serious injury. Covertly unsafe text -- language that contains actionable physical harm but requires further reasoning to identify that harm -- is an area of particular interest, as such text may arise from everyday scenarios and is challenging to detect as harmful. Qualifying the knowledge required to reason about the safety of various texts and providing human-interpretable rationales can shed light on a system's risk to specific user groups, helping stakeholders manage the risks of their systems and policymakers provide concrete safeguards for consumer safety. We propose FARM, a novel framework that leverages external knowledge for trustworthy rationale generation in the context of safety. In particular, FARM foveates on missing knowledge in specific scenarios, retrieves this knowledge with attribution to trustworthy sources, and uses it to both classify the safety of the original text and generate human-interpretable rationales, combining critically important qualities for sensitive domains such as user safety. Furthermore, FARM obtains state-of-the-art results on the SafeText dataset, improving safety classification accuracy by 5.29 points.

* 9 pages, 3 figures, 5 tables 
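
The sketch below shows one way to structure a foveate-retrieve-rationalize pipeline like the one named above; the prompts and the llm() and retrieve() callables are hypothetical placeholders, not the paper's prompts, knowledge sources, or models.

```python
# Minimal structural sketch of a foveate -> attribute -> rationalize pipeline.
# The llm() and retrieve() callables and the prompt wording are hypothetical.
from typing import Callable, Dict, List

def farm_style_pipeline(text: str,
                        llm: Callable[[str], str],
                        retrieve: Callable[[str], List[str]]) -> Dict[str, object]:
    # 1) Foveate: ask what external knowledge is needed to judge safety.
    query = llm(f"What physical-safety knowledge is needed to judge: '{text}'?")
    # 2) Attribute: retrieve that knowledge from a trustworthy, citable source.
    evidence = retrieve(query)
    # 3) Rationalize: classify safe/unsafe and explain, grounded in the evidence.
    verdict = llm(
        "Given the evidence below, is the advice safe or unsafe? "
        "Answer with a label and a short rationale.\n"
        f"Advice: {text}\nEvidence: {' '.join(evidence)}"
    )
    return {"knowledge_query": query, "evidence": evidence, "verdict": verdict}
```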

SafeText: A Benchmark for Exploring Physical Safety in Language Models

Oct 18, 2022
Sharon Levy, Emily Allaway, Melanie Subbiah, Lydia Chilton, Desmond Patton, Kathleen McKeown, William Yang Wang

Understanding what constitutes safe text is an important issue in natural language processing and can often prevent the deployment of models deemed harmful and unsafe. One type of safety that has scarcely been studied is commonsense physical safety, i.e., text that is not explicitly violent but requires additional commonsense knowledge to comprehend that it leads to physical harm. We create the first benchmark dataset, SafeText, comprising real-life scenarios with paired safe and physically unsafe pieces of advice. We utilize SafeText to empirically study commonsense physical safety across various models designed for text generation and commonsense reasoning tasks. We find that state-of-the-art large language models are susceptible to the generation of unsafe text and have difficulty rejecting unsafe advice. As a result, we argue for further studies of safety and the assessment of commonsense physical safety in models before release.

* Accepted to EMNLP 2022 
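
One simple probe in the spirit of this evaluation is to compare a language model's likelihood of paired safe and unsafe advice for the same scenario. The scenario and advice strings below are illustrative, not actual SafeText items, and GPT-2 is only a small stand-in model.

```python
# Minimal sketch: compare average negative log-likelihood (NLL) of safe vs.
# unsafe advice under a small causal LM. Lower NLL on the unsafe continuation
# would suggest the model is more prone to endorsing unsafe advice.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def avg_nll(text: str) -> float:
    """Average per-token negative log-likelihood of the text under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

scenario = "If you get something stuck in your eye,"
safe_advice = " rinse it out with clean water."
unsafe_advice = " rub it hard until it comes out."

print("safe NLL:  ", avg_nll(scenario + safe_advice))
print("unsafe NLL:", avg_nll(scenario + unsafe_advice))
```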

Mitigating Covertly Unsafe Text within Natural Language Systems

Oct 17, 2022
Alex Mei, Anisha Kabir, Sharon Levy, Melanie Subbiah, Emily Allaway, John Judge, Desmond Patton, Bruce Bimber, Kathleen McKeown, William Yang Wang

An increasingly prevalent problem for intelligent technologies is text safety, as uncontrolled systems may generate recommendations to their users that lead to injury or life-threatening consequences. However, the degree of explicitness of a generated statement that can cause physical harm varies. In this paper, we distinguish types of text that can lead to physical harm and establish one particularly underexplored category: covertly unsafe text. Then, we further break down this category with respect to the system's information and discuss solutions to mitigate the generation of text in each of these subcategories. Ultimately, our work defines the problem of covertly unsafe language that causes physical harm and argues that this subtle yet dangerous issue needs to be prioritized by stakeholders and regulators. We highlight mitigation strategies to inspire future researchers to tackle this challenging problem and help improve safety within smart systems.

* To appear in Findings of the 2022 Conference on Empirical Methods in Natural Language Processing 

Towards Understanding Gender-Seniority Compound Bias in Natural Language Generation

May 19, 2022
Samhita Honnavalli, Aesha Parekh, Lily Ou, Sophie Groenwold, Sharon Levy, Vicente Ordonez, William Yang Wang

Women are often perceived as junior to their male counterparts, even within the same job titles. While there has been significant progress in the evaluation of gender bias in natural language processing (NLP), existing studies seldom investigate how biases toward gender groups change when compounded with other societal biases. In this work, we investigate how seniority impacts the degree of gender bias exhibited in pretrained neural generation models by introducing a novel framework for probing compound bias. We contribute a benchmark robustness-testing dataset spanning two domains, U.S. senatorship and professorship, created using a distant-supervision method. Our dataset includes human-written text with underlying ground truth and paired counterfactuals. We then examine GPT-2 perplexity and the frequency of gendered language in generated text. Our results show that GPT-2 amplifies bias by considering women as junior and men as senior more often than the ground truth in both domains. These results suggest that NLP applications built using GPT-2 may harm women in professional capacities.

* 6 pages, LREC 2022 
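
As a rough illustration of measuring gendered language in generated text, the sketch below samples continuations for a seniority-marked prompt and counts gendered words; the prompt, word lists, and sampling settings are illustrative, not the paper's dataset or exact measurement.

```python
# Minimal sketch: sample GPT-2 continuations and count gendered words.
# The prompt and word lists are illustrative placeholders.
from collections import Counter
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "The senior professor walked into the lab and"
samples = generator(prompt, max_new_tokens=30, num_return_sequences=20,
                    do_sample=True, pad_token_id=50256)

female_words = {"she", "her", "hers", "woman", "women"}
male_words = {"he", "him", "his", "man", "men"}

counts = Counter()
for sample in samples:
    for token in sample["generated_text"].lower().split():
        word = token.strip(".,!?;:\"'")
        if word in female_words:
            counts["female"] += 1
        elif word in male_words:
            counts["male"] += 1

# A large skew between the counts hints at a gender-seniority association.
print(counts)
```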

HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data

Apr 28, 2022
Kai Nakamura, Sharon Levy, Yi-Lin Tuan, Wenhu Chen, William Yang Wang

A pressing challenge in current dialogue systems is to successfully converse with users on topics with information distributed across different modalities. Previous work in multiturn dialogue systems has primarily focused on either text or table information. In more realistic scenarios, having a joint understanding of both is critical as knowledge is typically distributed over both unstructured and structured forms. We present a new dialogue dataset, HybriDialogue, which consists of crowdsourced natural conversations grounded on both Wikipedia text and tables. The conversations are created through the decomposition of complex multihop questions into simple, realistic multiturn dialogue interactions. We propose retrieval, system state tracking, and dialogue response generation tasks for our dataset and conduct baseline experiments for each. Our results show that there is still ample opportunity for improvement, demonstrating the importance of building stronger dialogue systems that can reason over the complex setting of information-seeking dialogue grounded on tables and text.

* Findings of ACL 2022 
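
As a rough illustration of retrieval over mixed evidence, the sketch below linearizes table rows into text so they can share one index with prose passages; the passages and query are illustrative examples, not items from HybriDialogue.

```python
# Minimal sketch of a TF-IDF retrieval baseline over text passages and
# linearized table rows. All passages and the query are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

passages = [
    # A free-text passage.
    "The 1994 FIFA World Cup was hosted by the United States.",
    # Table rows linearized into text so they share one retrieval index.
    "Year: 1994 | Host: United States | Winner: Brazil | Runner-up: Italy",
    "Year: 1998 | Host: France | Winner: France | Runner-up: Brazil",
]
query = "Who won the World Cup the year the United States hosted it?"

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(passages)
query_vec = vectorizer.transform([query])
scores = cosine_similarity(query_vec, doc_matrix).ravel()

best = scores.argmax()
print(f"retrieved passage {best}: {passages[best]}")
```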

Addressing Issues of Cross-Linguality in Open-Retrieval Question Answering Systems For Emergent Domains

Jan 26, 2022
Alon Albalak, Sharon Levy, William Yang Wang

Open-retrieval question answering systems are generally trained and tested on large datasets in well-established domains. However, low-resource settings such as new and emerging domains would especially benefit from reliable question answering systems. Furthermore, multilingual and cross-lingual resources in emergent domains are scarce, leading to few or no such systems. In this paper, we demonstrate a cross-lingual open-retrieval question answering system for the emergent domain of COVID-19. Our system adopts a corpus of scientific articles to ensure that retrieved documents are reliable. To address the scarcity of cross-lingual training data in emergent domains, we present a method utilizing automatic translation, alignment, and filtering to produce English-to-all datasets. We show that a deep semantic retriever greatly benefits from training on our English-to-all data and significantly outperforms a BM25 baseline in the cross-lingual setting. We illustrate the capabilities of our system with examples and release all code necessary to train and deploy such a system.

* 6 pages, 8 figures 
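
The sketch below shows a lexical BM25 baseline of the kind used for comparison above, and why lexical matching struggles when the query and documents are in different languages; the documents, the query, and the rank_bm25 package choice are illustrative, not the paper's corpus or retriever.

```python
# Minimal sketch of a BM25 lexical retrieval baseline in a cross-lingual
# setting. The documents and query are illustrative placeholders.
from rank_bm25 import BM25Okapi

documents = [
    "COVID-19 vaccines reduce the risk of severe illness and hospitalization.",
    "Common symptoms of COVID-19 include fever, cough, and fatigue.",
    "Masks reduce transmission of respiratory droplets in indoor settings.",
]
bm25 = BM25Okapi([doc.lower().split() for doc in documents])

# A Spanish query against an English index: apart from "covid-19" there is
# almost no lexical overlap, which is one reason a trained dense semantic
# retriever can outperform BM25 in the cross-lingual setting.
query = "cuáles son los síntomas de covid-19".lower().split()
print(bm25.get_scores(query))
```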

Open-Domain Question-Answering for COVID-19 and Other Emergent Domains

Oct 13, 2021
Sharon Levy, Kevin Mo, Wenhan Xiong, William Yang Wang

Since late 2019, COVID-19 has quickly emerged as the newest biomedical domain, resulting in a surge of new information. As with other emergent domains, the discussion surrounding the topic has been rapidly changing, leading to the spread of misinformation. This has created the need for a public space for users to ask questions and receive credible, scientific answers. To fulfill this need, we turn to the task of open-domain question-answering, which we can use to efficiently find answers to free-text questions from a large set of documents. In this work, we present such a system for the emergent domain of COVID-19. Despite the small data size available, we are able to successfully train the system to retrieve answers from a large-scale corpus of published COVID-19 scientific papers. Furthermore, we incorporate effective re-ranking and question-answering techniques, such as document diversity and multiple answer spans. Our open-domain question-answering system can further act as a model for the quick development of similar systems that can be adapted and modified for other developing emergent domains.

* EMNLP 2021 Demo 
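
As a rough illustration of the multiple-answer-span idea mentioned above, the sketch below extracts several candidate spans from a single retrieved passage; the model choice and the passage are illustrative stand-ins, not the demo system's actual components.

```python
# Minimal sketch: return several candidate answer spans from one passage
# using an extractive QA model. Model and passage are illustrative stand-ins.
from transformers import pipeline

qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

context = (
    "COVID-19 spreads primarily through respiratory droplets produced when an "
    "infected person coughs, sneezes, or talks. Vaccination and masking both "
    "reduce the risk of transmission."
)
# top_k > 1 returns several candidate spans instead of a single best answer,
# which supports showing users multiple possible answers.
answers = qa(question="How does COVID-19 spread?", context=context, top_k=2)
for ans in answers:
    print(f"{ans['score']:.3f}  {ans['answer']}")
```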

Dimensions of Transparency in NLP Applications

Jan 02, 2021
Michael Saxon, Sharon Levy, Xinyi Wang, Alon Albalak, William Yang Wang

Broader transparency in descriptions of and communication regarding AI systems is widely considered desirable. This is particularly the case in discussions of fairness and accountability in systems exposed to the general public. However, previous work has suggested that a trade-off exists between greater system transparency and user confusion, where "too much information" clouds a reader's understanding of what a system description means. Unfortunately, transparency is a nebulous concept that is difficult to both define and quantify. In this work, we address these two issues by proposing a framework for quantifying transparency in system descriptions and applying it to analyze the trade-off between transparency and end-user confusion using NLP conference abstracts.

* 9 pages, 8 figures, 7 tables 