Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Joan Nwatu

Wait, am I Being Fair? Characterizing Deductive Stereotyping and Mitigating It with Fair-GCG

Jun 30, 2026

Naihao Deng, Yilun Zhu, Joan Nwatu, Clayton Scott, Rada Mihalcea

Abstract:Warning: This paper contains several toxic and offensive statements. While reasoning generally improves fairness in recent large language models (LLMs), failures persist. In this work, we identify a failure mode, deductive stereotyping, in which models apply population-level statistical regularities to individual cases, producing logically coherent yet socially biased inferences. We provide a statistical interpretation of this phenomenon. To steer models toward fairness-aware reasoning, we propose a reasoning-time injection framework. We further introduce Fair-GCG to systematically discover effective injection phrases. Injection phrases discovered by Fair-GCG improve performance across multiple fairness benchmarks, generalize from smaller to larger LLMs, improves reasoning-level fairness, reduces bias in open-ended generation, and transfer to real-world fairness-sensitive tasks.

Via

Access Paper or Ask Questions

The Language-Energy Divide: Measuring Energy Costs of Multilingual LLM Inference

Jun 20, 2026

Naihao Deng, Alissa Shen, Yiming Feng, Joan Nwatu, Jae-Won Chung, Mosharaf Chowdhury, Yulong Chen, Rada Mihalcea

Abstract:Large language models (LLMs) are increasingly deployed in multilingual settings, yet the energy costs of serving these models across different languages remain poorly understood. We present a systematic study of inference energy consumption across languages with ML.Energy framework (Chung et al., 2026). We find striking disparities: energy consumption per output token varies by up to 8.3 times across languages, while total energy for a fixed set of requests varies by up to 179 times between the cheapest (English, 17.6 kJ) and the most expensive (Pashto, 3,147 kJ) languages. Our analysis shows that this disparity is driven by two compounding factors: (1) higher per-token energy costs for languages using complex or rare scripts, and (2) more tokens generated for low-resource languages. Moreover, we find a double cost + performance penalty: languages with the highest energy footprints also tend to achieve the lowest task accuracy. We reveal that the energy divide persists across models, hardware, and tasks, suggesting a systemic energy inequity in multilingual LLM deployment. Finally, we recommend that the community treat energy as a first-class evaluation axis, extend reporting checklists and model cards to include it, and adopt deployment-side mitigations for better energy efficiency.

Via

Access Paper or Ask Questions

Uplifting Lower-Income Data: Strategies for Socioeconomic Perspective Shifts in Vision-Language Models

Jul 02, 2024

Joan Nwatu, Oana Ignat, Rada Mihalcea

Figure 1 for Uplifting Lower-Income Data: Strategies for Socioeconomic Perspective Shifts in Vision-Language Models

Figure 2 for Uplifting Lower-Income Data: Strategies for Socioeconomic Perspective Shifts in Vision-Language Models

Figure 3 for Uplifting Lower-Income Data: Strategies for Socioeconomic Perspective Shifts in Vision-Language Models

Figure 4 for Uplifting Lower-Income Data: Strategies for Socioeconomic Perspective Shifts in Vision-Language Models

Abstract:To address this issue, we formulate translated non-English, geographic, and socioeconomic integrated prompts and evaluate their impact on VL model performance for data from different countries and income groups. Our findings show that geographic and socioeconomic integrated prompts improve VL performance on lower-income data and favor the retrieval of topic appearances commonly found in data from low-income households. From our analyses, we identify and highlight contexts where these strategies yield the most improvements. Our model analysis code is publicly available at https://github.com/Anniejoan/Uplifting-Lower-income-data .

Via

Access Paper or Ask Questions

CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

Jun 10, 2024

David Romero, Chenyang Lyu, Haryo Akbarianto Wibowo, Teresa Lynn, Injy Hamed, Aditya Nanda Kishore, Aishik Mandal, Alina Dragonetti, Artem Abzaliev, Atnafu Lambebo Tonja(+65 more)

Figure 1 for CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

Figure 2 for CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

Figure 3 for CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

Figure 4 for CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

Abstract:Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data. However, most of the current VQA models use datasets that are primarily focused on English and a few major world languages, with images that are typically Western-centric. While recent efforts have tried to increase the number of languages covered on VQA datasets, they still lack diversity in low-resource languages. More importantly, although these datasets often extend their linguistic range via translation or some other approaches, they usually keep images the same, resulting in narrow cultural representation. To address these limitations, we construct CVQA, a new Culturally-diverse multilingual Visual Question Answering benchmark, designed to cover a rich set of languages and cultures, where we engage native speakers and cultural experts in the data collection process. As a result, CVQA includes culturally-driven images and questions from across 28 countries on four continents, covering 26 languages with 11 scripts, providing a total of 9k questions. We then benchmark several Multimodal Large Language Models (MLLMs) on CVQA, and show that the dataset is challenging for the current state-of-the-art models. This benchmark can serve as a probing evaluation suite for assessing the cultural capability and bias of multimodal models and hopefully encourage more research efforts toward increasing cultural awareness and linguistic diversity in this field.

Via

Access Paper or Ask Questions

Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost

Mar 12, 2024

Oana Ignat, Longju Bai, Joan Nwatu, Rada Mihalcea

Figure 1 for Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost

Figure 2 for Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost

Figure 3 for Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost

Figure 4 for Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost

Abstract:Current foundation models have shown impressive performance across various tasks. However, several studies have revealed that these models are not effective for everyone due to the imbalanced geographical and economic representation of the data used in the training process. Most of this data comes from Western countries, leading to poor results for underrepresented countries. To address this issue, more data needs to be collected from these countries, but the cost of annotation can be a significant bottleneck. In this paper, we propose methods to identify the data to be annotated to balance model performance and annotation costs. Our approach first involves finding the countries with images of topics (objects and actions) most visually distinct from those already in the training datasets used by current large vision-language foundation models. Next, we identify countries with higher visual similarity for these topics and show that using data from these countries to supplement the training data improves model performance and reduces annotation costs. The resulting lists of countries and corresponding topics are made available at https://github.com/MichiganNLP/visual_diversity_budget.

* accepted at COLING 2024

Via

Access Paper or Ask Questions

Bridging the Digital Divide: Performance Variation across Socio-Economic Factors in Vision-Language Models

Nov 09, 2023

Joan Nwatu, Oana Ignat, Rada Mihalcea

Abstract:Despite the impressive performance of current AI models reported across various tasks, performance reports often do not include evaluations of how these models perform on the specific groups that will be impacted by these technologies. Among the minority groups under-represented in AI, data from low-income households are often overlooked in data collection and model evaluation. We evaluate the performance of a state-of-the-art vision-language model (CLIP) on a geo-diverse dataset containing household images associated with different income values (Dollar Street) and show that performance inequality exists among households of different income levels. Our results indicate that performance for the poorer groups is consistently lower than the wealthier groups across various topics and countries. We highlight insights that can help mitigate these issues and propose actionable steps for economic-level inclusive AI development. Code is available at https://github.com/MichiganNLP/Bridging_the_Digital_Divide.

* EMNLP 2023

Via

Access Paper or Ask Questions

A PhD Student's Perspective on Research in NLP in the Era of Very Large Language Models

May 21, 2023

Oana Ignat, Zhijing Jin, Artem Abzaliev, Laura Biester, Santiago Castro, Naihao Deng, Xinyi Gao, Aylin Gunal, Jacky He, Ashkan Kazemi(+12 more)

Abstract:Recent progress in large language models has enabled the deployment of many generative NLP applications. At the same time, it has also led to a misleading public discourse that ``it's all been solved.'' Not surprisingly, this has in turn made many NLP researchers -- especially those at the beginning of their career -- wonder about what NLP research area they should focus on. This document is a compilation of NLP research directions that are rich for exploration, reflecting the views of a diverse group of PhD students in an academic research lab. While we identify many research areas, many others exist; we do not cover those areas that are currently addressed by LLMs but where LLMs lag behind in performance, or those focused on LLM development. We welcome suggestions for other research directions to include: https://bit.ly/nlp-era-llm

Via

Access Paper or Ask Questions