Natalie Parde

Context-Aware Counterfactual Data Augmentation for Gender Bias Mitigation in Language Models

Feb 10, 2026

Shweta Parihar, Liu Guangliang, Natalie Parde, Lu Cheng

Abstract: A challenge in mitigating social bias in fine-tuned language models (LMs) is the potential reduction in language modeling capability, which can harm downstream performance. Counterfactual data augmentation (CDA), a widely used method for fine-tuning, highlights this issue by generating synthetic data that may align poorly with real-world distributions or creating overly simplistic counterfactuals that ignore the social context of altered sensitive attributes (e.g., gender) in the pretraining corpus. To address these limitations, we propose a simple yet effective context-augmented CDA method, Context-CDA, which uses large LMs to enhance the diversity and contextual relevance of the debiasing corpus. By minimizing discrepancies between the debiasing corpus and pretraining data through augmented context, this approach ensures better alignment, enhancing language modeling capability. We then employ uncertainty-based filtering to exclude generated counterfactuals considered low-quality by the target smaller LMs (i.e., LMs to be debiased), further improving the fine-tuning corpus quality. Experimental results on gender bias benchmarks demonstrate that Context-CDA effectively mitigates bias without sacrificing language modeling performance while offering insights into social biases by analyzing distribution shifts in next-token generation probabilities.

Using LLMs to Aid Annotation and Collection of Clinically-Enriched Data in Bipolar Disorder and Schizophrenia

Jun 18, 2024

Ankit Aich, Avery Quynh, Pamela Osseyi, Amy Pinkham, Philip Harvey, Brenda Curtis, Colin Depp, Natalie Parde

Abstract: NLP in mental health has primarily focused on social media. Real-world practitioners also have high caseloads and often work with domain-specific variables for which modern LLMs lack context. We take a dataset made by recruiting 644 participants, including individuals diagnosed with Bipolar Disorder (BD), Schizophrenia (SZ), and Healthy Controls (HC). Participants undertook tasks derived from a standardized mental health instrument, and the resulting data were transcribed and annotated by experts across five clinical variables. This paper demonstrates the application of contemporary language models in sequence-to-sequence tasks to enhance mental health research. Specifically, we illustrate how these models can facilitate the deployment of mental health instruments, data collection, and data annotation with high accuracy and scalability. We show that small models are capable of annotation for domain-specific clinical variables and data collection for mental health instruments, and that they perform better than commercial large models.

CORI: CJKV Benchmark with Romanization Integration -- A step towards Cross-lingual Transfer Beyond Textual Scripts

Apr 19, 2024

Hoang H. Nguyen, Chenwei Zhang, Ye Liu, Natalie Parde, Eugene Rohrbaugh, Philip S. Yu

Abstract: Naively assuming English as a source language may hinder cross-lingual transfer for many languages by failing to consider the importance of language contact. Some languages are more well-connected than others, and target languages can benefit from transferring from closely related languages; for many languages, the set of closely related languages does not include English. In this work, we study the impact of source language for cross-lingual transfer, demonstrating the importance of selecting source languages that have high contact with the target language. We also construct a novel benchmark dataset for close contact Chinese-Japanese-Korean-Vietnamese (CJKV) languages to further encourage in-depth studies of language contact. To comprehensively capture contact between these languages, we propose to integrate Romanized transcription beyond textual scripts via Contrastive Learning objectives, leading to enhanced cross-lingual representations and effective zero-shot cross-lingual transfer.

* Accepted at LREC-COLING 2024

Dialogue with Robots: Proposals for Broadening Participation and Research in the SLIVAR Community

Apr 01, 2024

Casey Kennington, Malihe Alikhani, Heather Pon-Barry, Katherine Atwell, Yonatan Bisk, Daniel Fried, Felix Gervits, Zhao Han, Mert Inan, Michael Johnston (+13 more)

Abstract: The ability to interact with machines using natural human language is becoming not just commonplace, but expected. The next step is not just text interfaces, but speech interfaces and not just with computers, but with all machines including robots. In this paper, we chronicle the recent history of this growing field of spoken dialogue with robots and offer the community three proposals, the first focused on education, the second on benchmarks, and the third on the modeling of language when it comes to spoken interaction with robots. The three proposals should act as white papers for any researcher to take and build upon.

* NSF Report on the "Dialogue with Robots" Workshop held in Pittsburgh, PA, April 2023

Investigating Reproducibility at Interspeech Conferences: A Longitudinal and Comparative Perspective

Jun 07, 2023

Mohammad Arvan, A. Seza Doğruöz, Natalie Parde

Abstract: Reproducibility is a key aspect of scientific advancement across disciplines, and reducing barriers to open science is a focus area for the theme of Interspeech 2023. Availability of source code is one of the indicators that facilitates reproducibility. However, less is known about the rates of reproducibility at Interspeech conferences in comparison to other conferences in the field. In order to fill this gap, we have surveyed 27,717 papers at seven conferences across speech and language processing disciplines. We find that despite having a number of accepted papers close to that of the other conferences, Interspeech has up to 40% less source code availability. In addition to reporting the difficulties we have encountered during our research, we also provide recommendations and possible directions to increase reproducibility for further studies.

Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP

May 02, 2023

Anya Belz, Craig Thomson, Ehud Reiter, Gavin Abercrombie, Jose M. Alonso-Moral, Mohammad Arvan, Jackie Cheung, Mark Cieliebak, Elizabeth Clark, Kees van Deemter (+29 more)

Abstract: We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible. We present our results and findings, which include that just 13% of papers had (i) sufficiently low barriers to reproduction, and (ii) enough obtainable information, to be considered for reproduction, and that all but one of the experiments we selected for reproduction was discovered to have flaws that made the meaningfulness of conducting a reproduction questionable. As a result, we had to change our coordinated study design from a reproduce approach to a standardise-then-reproduce-twice approach. Our overall (negative) finding that the great majority of human evaluations in NLP is not repeatable and/or not reproducible and/or too flawed to justify reproduction, paints a dire picture, but presents an opportunity for a rethink about how to design and report human evaluations in NLP.

* 5 pages plus appendix, 4 tables, 1 figure. To appear at "Workshop on Insights from Negative Results in NLP" (co-located with EACL2023)

Tracking Turbulence Through Financial News During COVID-19

Sep 09, 2021

Philip Hossu, Natalie Parde

Abstract: Grave human toll notwithstanding, the COVID-19 pandemic created uniquely unstable conditions in financial markets. In this work we uncover and discuss relationships involving sentiment in financial publications during the 2020 pandemic-motivated U.S. financial crash. First, we introduce a set of expert annotations of financial sentiment for articles from major American financial news publishers. After an exploratory data analysis, we then describe a CNN-based architecture to address the task of predicting financial sentiment in this anomalous, tumultuous setting. Our best performing model achieves a maximum weighted F1 score of 0.746, establishing a strong performance benchmark. Using predictions from our top performing model, we close by conducting a statistical correlation study with real stock market data, finding interesting and strong relationships between financial news and the S&P 500 index, trading volume, market volatility, and different single-factor ETFs.

Latent Neural Differential Equations for Video Generation

Nov 07, 2020

Cade Gordon, Natalie Parde

Abstract: Generative Adversarial Networks have recently shown promise for video generation, building off of the success of image generation while also addressing a new challenge: time. Although time was analyzed in some early work, the literature has not adequately grown with temporal modeling developments. We propose studying the effects of Neural Differential Equations to model the temporal dynamics of video generation. The paradigm of Neural Differential Equations presents many theoretical strengths including the first continuous representation of time within video generation. In order to address the effects of Neural Differential Equations, we will investigate how changes in temporal models affect generated video quality.

Enriching Neural Models with Targeted Features for Dementia Detection

Jun 13, 2019

Flavio Di Palo, Natalie Parde

Abstract: Alzheimer's disease (AD) is an irreversible brain disease that can dramatically reduce quality of life, most commonly manifesting in older adults and eventually leading to the need for full-time care. Early detection is fundamental to slowing its progression; however, diagnosis can be expensive, time-consuming, and invasive. In this work we develop a neural model based on a CNN-LSTM architecture that learns to detect AD and related dementias using targeted and implicitly-learned features from conversational transcripts. Our approach establishes the new state of the art on the DementiaBank dataset, achieving an F1 score of 0.929 when classifying participants into AD and control groups.

* Accepted to the ACL 2019 Student Research Workshop (ACL SRW)

AI Meets Austen: Towards Human-Robot Discussions of Literary Metaphor

Apr 07, 2019

Natalie Parde, Rodney D. Nielsen

Abstract: Artificial intelligence is revolutionizing formal education, fueled by innovations in learning assessment, content generation, and instructional delivery. Informal, lifelong learning settings have been the subject of less attention. We provide a proof-of-concept for an embodied book discussion companion, designed to stimulate conversations with readers about particularly creative metaphors in fiction literature. We collect ratings from 26 participants, each of whom discusses Jane Austen's "Pride and Prejudice" with the robot across one or more sessions, and find that participants rate their interactions highly. This suggests that companion robots could be an interesting entryway for the promotion of lifelong learning and cognitive exercise in future applications.

* Accepted to the 20th International Conference on Artificial Intelligence in Education (AIED 2019)
