Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sarah Masud

Scenario-based Probing and Steering Cultural Values in Large Language Models--Extended Version

Jun 09, 2026

Trung Duc Anh Dang, Tung Kieu, Sarah Masud

Abstract:Large Language Models (LLMs) are deployed across cultural contexts but often reflect homogenized values inherited from training data. Evaluations of cultural alignment typically rely on direct prompting with survey-style questions, which frequently elicit neutral or safety-aligned responses and fail to capture underlying model preferences. We propose a framework for probing and steering latent cultural representations in LLMs along the two Inglehart--Welzel axes of the World Values Survey (WVS). By translating social value questions into scenario-based behavioral dilemmas, we extract token-level probabilities to measure implicit values and apply activation steering, optionally combined with country-conditioned prompting, to shift model behavior without retraining. Across three open-source LLMs and four target cultures, we find substantial variation in steerability and identify latent entanglement, where interventions along one cultural dimension induce shifts along another. This coupling mirrors correlations in human WVS data and persists across activation, prompt, and hybrid steering. It constrains axis-independent alignment, though general task performance is largely preserved.

* 18 pages

Via

Access Paper or Ask Questions

Not What, But How: A Framework for Auditing LLM Responses across Positioning, Generalization, Anthromorphism, and Maxims

Jun 03, 2026

Siddhesh Milind Pawar, Sarah Masud, Haneul Yoo, Alice Oh, Isabelle Augenstein

Abstract:Large language models (LLMs) are being increasingly used to answer subjective, information-seeking questions, where users are sensitive to how responses are communicated, not just whether the answers are correct. Existing LLM evaluations for subjective cultural queries largely focus on factual correctness, ignoring how the response is framed. To this end, we introduce FRANZ, an automated FRAmework for respoNse characteriZation to conduct communicative audit of LLM responses along four dimensions: cultural positioning, use of generalizing language, anthropomorphic cues, and adherence to conversational maxims. To enable this evaluation, we contribute SQUARE - a corpus of 376k subjective questions sourced from 57 subreddits, and mapped to 7 countries and 19 question categories. We demonstrate FRANZ's applicability by scoring responses from three open-weight LLMs. We observe that LLMs show statistically significant differences in the frequency with which they employ each response characteristic. Unlike single-dimensional audits, FRANZ reveals that insider positioning and anthropomorphism are positively coupled, with the degree of coupling varying by country, providing a diagnostic lens for identifying framing divergences.

* 34 pages, 19 Figures, 4 Tables

Via

Access Paper or Ask Questions

Not What, But How: A Communicative Audit of LLM Response Framing

Jun 01, 2026

Siddhesh Milind Pawar, Sarah Masud, Haneul Yoo, Alice Oh, Isabelle Augenstein

* 34 main pages, 19 Figures, 4 Tables

Via

Access Paper or Ask Questions

Cultural Value Alignment Via Latent Activation Steering in Large Language Models

May 25, 2026

Trung Duc Anh Dang, Sarah Masud

Abstract:Large Language Models (LLMs) often exhibit homogenized cultural perspectives. While the World Values Survey (WVS) provides a gold standard for mapping human values, traditional direct prompting of LLMs on WVS often fails to access the model's latent cultural depth, leading to safety-aligned refusals or neutral responses. Here, we propose a generalizable framework for cultural evaluation and intervention that transitions from abstract queries to scenario-based behavioral probing. By extracting implicit token probabilities across 300 situational dilemmas, we bypass surface-level alignment to map the latent coordinates of LLMs cultural value. We further introduce activation steering to shift these internal alignments during the forward pass without retraining. Across multiple LLMs, we find substantial variation in adaptability and uncover a consistent phenomenon of latent entanglement, where interventions along one cultural dimension induce shifts along another. These results suggest that cultural values are encoded as coupled structures, limiting precise alignment. This work establishes a computationally efficient framework for cultural steering, highlighting the structural complexities when navigating global value with LLMs.

* ACL 2026 Student Research Workshop (Non-Archival Track)

Via

Access Paper or Ask Questions

Listening Between the Lines: Decoding Podcast Narratives with Language Modeling

Nov 07, 2025

Shreya Gupta, Ojasva Saxena, Arghodeep Nandi, Sarah Masud, Kiran Garimella, Tanmoy Chakraborty

Figure 1 for Listening Between the Lines: Decoding Podcast Narratives with Language Modeling

Figure 2 for Listening Between the Lines: Decoding Podcast Narratives with Language Modeling

Figure 3 for Listening Between the Lines: Decoding Podcast Narratives with Language Modeling

Figure 4 for Listening Between the Lines: Decoding Podcast Narratives with Language Modeling

Abstract:Podcasts have become a central arena for shaping public opinion, making them a vital source for understanding contemporary discourse. Their typically unscripted, multi-themed, and conversational style offers a rich but complex form of data. To analyze how podcasts persuade and inform, we must examine their narrative structures -- specifically, the narrative frames they employ. The fluid and conversational nature of podcasts presents a significant challenge for automated analysis. We show that existing large language models, typically trained on more structured text such as news articles, struggle to capture the subtle cues that human listeners rely on to identify narrative frames. As a result, current approaches fall short of accurately analyzing podcast narratives at scale. To solve this, we develop and evaluate a fine-tuned BERT model that explicitly links narrative frames to specific entities mentioned in the conversation, effectively grounding the abstract frame in concrete details. Our approach then uses these granular frame labels and correlates them with high-level topics to reveal broader discourse trends. The primary contributions of this paper are: (i) a novel frame-labeling methodology that more closely aligns with human judgment for messy, conversational data, and (ii) a new analysis that uncovers the systematic relationship between what is being discussed (the topic) and how it is being presented (the frame), offering a more robust framework for studying influence in digital media.

* 10 pages, 6 Figures, 5 Tables. Under review at IEEE TCSS

Via

Access Paper or Ask Questions

QUENCH: Measuring the gap between Indic and Non-Indic Contextual General Reasoning in LLMs

Dec 16, 2024

Mohammad Aflah Khan, Neemesh Yadav, Sarah Masud, Md. Shad Akhtar

Figure 1 for QUENCH: Measuring the gap between Indic and Non-Indic Contextual General Reasoning in LLMs

Figure 2 for QUENCH: Measuring the gap between Indic and Non-Indic Contextual General Reasoning in LLMs

Figure 3 for QUENCH: Measuring the gap between Indic and Non-Indic Contextual General Reasoning in LLMs

Figure 4 for QUENCH: Measuring the gap between Indic and Non-Indic Contextual General Reasoning in LLMs

Abstract:The rise of large language models (LLMs) has created a need for advanced benchmarking systems beyond traditional setups. To this end, we introduce QUENCH, a novel text-based English Quizzing Benchmark manually curated and transcribed from YouTube quiz videos. QUENCH possesses masked entities and rationales for the LLMs to predict via generation. At the intersection of geographical context and common sense reasoning, QUENCH helps assess world knowledge and deduction capabilities of LLMs via a zero-shot, open-domain quizzing setup. We perform an extensive evaluation on 7 LLMs and 4 metrics, investigating the influence of model size, prompting style, geographical context, and gold-labeled rationale generation. The benchmarking concludes with an error analysis to which the LLMs are prone.

* 17 Pages, 6 Figures, 8 Tables, COLING 2025

Via

Access Paper or Ask Questions

Information Anxiety in Large Language Models

Nov 16, 2024

Prasoon Bajpai, Sarah Masud, Tanmoy Chakraborty

Figure 1 for Information Anxiety in Large Language Models

Figure 2 for Information Anxiety in Large Language Models

Figure 3 for Information Anxiety in Large Language Models

Figure 4 for Information Anxiety in Large Language Models

Abstract:Large Language Models (LLMs) have demonstrated strong performance as knowledge repositories, enabling models to understand user queries and generate accurate and context-aware responses. Extensive evaluation setups have corroborated the positive correlation between the retrieval capability of LLMs and the frequency of entities in their pretraining corpus. We take the investigation further by conducting a comprehensive analysis of the internal reasoning and retrieval mechanisms of LLMs. Our work focuses on three critical dimensions - the impact of entity popularity, the models' sensitivity to lexical variations in query formulation, and the progression of hidden state representations across LLM layers. Our preliminary findings reveal that popular questions facilitate early convergence of internal states toward the correct answer. However, as the popularity of a query increases, retrieved attributes across lexical variations become increasingly dissimilar and less accurate. Interestingly, we find that LLMs struggle to disentangle facts, grounded in distinct relations, from their parametric memory when dealing with highly popular subjects. Through a case study, we explore these latent strains within LLMs when processing highly popular queries, a phenomenon we term information anxiety. The emergence of information anxiety in LLMs underscores the adversarial injection in the form of linguistic variations and calls for a more holistic evaluation of frequently occurring entities.

Via

Access Paper or Ask Questions

Hate Personified: Investigating the role of LLMs in content moderation

Oct 03, 2024

Sarah Masud, Sahajpreet Singh, Viktor Hangya, Alexander Fraser, Tanmoy Chakraborty

Figure 1 for Hate Personified: Investigating the role of LLMs in content moderation

Figure 2 for Hate Personified: Investigating the role of LLMs in content moderation

Figure 3 for Hate Personified: Investigating the role of LLMs in content moderation

Figure 4 for Hate Personified: Investigating the role of LLMs in content moderation

Abstract:For subjective tasks such as hate detection, where people perceive hate differently, the Large Language Model's (LLM) ability to represent diverse groups is unclear. By including additional context in prompts, we comprehensively analyze LLM's sensitivity to geographical priming, persona attributes, and numerical information to assess how well the needs of various groups are reflected. Our findings on two LLMs, five languages, and six datasets reveal that mimicking persona-based attributes leads to annotation variability. Meanwhile, incorporating geographical signals leads to better regional alignment. We also find that the LLMs are sensitive to numerical anchors, indicating the ability to leverage community-based flagging efforts and exposure to adversaries. Our work provides preliminary guidelines and highlights the nuances of applying LLMs in culturally sensitive cases.

* 17 pages, 6 Figures, 13 Tables, EMNLP'24 Mains

Via

Access Paper or Ask Questions

Tox-BART: Leveraging Toxicity Attributes for Explanation Generation of Implicit Hate Speech

Jun 06, 2024

Neemesh Yadav, Sarah Masud, Vikram Goyal, Md Shad Akhtar, Tanmoy Chakraborty

Figure 1 for Tox-BART: Leveraging Toxicity Attributes for Explanation Generation of Implicit Hate Speech

Figure 2 for Tox-BART: Leveraging Toxicity Attributes for Explanation Generation of Implicit Hate Speech

Figure 3 for Tox-BART: Leveraging Toxicity Attributes for Explanation Generation of Implicit Hate Speech

Figure 4 for Tox-BART: Leveraging Toxicity Attributes for Explanation Generation of Implicit Hate Speech

Abstract:Employing language models to generate explanations for an incoming implicit hate post is an active area of research. The explanation is intended to make explicit the underlying stereotype and aid content moderators. The training often combines top-k relevant knowledge graph (KG) tuples to provide world knowledge and improve performance on standard metrics. Interestingly, our study presents conflicting evidence for the role of the quality of KG tuples in generating implicit explanations. Consequently, simpler models incorporating external toxicity signals outperform KG-infused models. Compared to the KG-based setup, we observe a comparable performance for SBIC (LatentHatred) datasets with a performance variation of +0.44 (+0.49), +1.83 (-1.56), and -4.59 (+0.77) in BLEU, ROUGE-L, and BERTScore. Further human evaluation and error analysis reveal that our proposed setup produces more precise explanations than zero-shot GPT-3.5, highlighting the intricate nature of the task.

* 17 Pages, 5 Figures, 13 Tables, ACL Findings 2024

Via

Access Paper or Ask Questions

Probing Critical Learning Dynamics of PLMs for Hate Speech Detection

Feb 03, 2024

Sarah Masud, Mohammad Aflah Khan, Vikram Goyal, Md Shad Akhtar, Tanmoy Chakraborty

Abstract:Despite the widespread adoption, there is a lack of research into how various critical aspects of pretrained language models (PLMs) affect their performance in hate speech detection. Through five research questions, our findings and recommendations lay the groundwork for empirically investigating different aspects of PLMs' use in hate speech detection. We deep dive into comparing different pretrained models, evaluating their seed robustness, finetuning settings, and the impact of pretraining data collection time. Our analysis reveals early peaks for downstream tasks during pretraining, the limited benefit of employing a more recent pretraining corpus, and the significance of specific layers during finetuning. We further call into question the use of domain-specific models and highlight the need for dynamic datasets for benchmarking hate speech detection.

* 20 pages, 9 figures, 14 tables. Accepted at EACL'24

Via

Access Paper or Ask Questions