Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

David Jurgens

Shammie

Global News Synchrony and Diversity During the Start of the COVID-19 Pandemic

May 01, 2024

Xi Chen, Scott A. Hale, David Jurgens, Mattia Samory, Ethan Zuckerman, Przemyslaw A. Grabowicz

Figure 1 for Global News Synchrony and Diversity During the Start of the COVID-19 Pandemic

Figure 2 for Global News Synchrony and Diversity During the Start of the COVID-19 Pandemic

Figure 3 for Global News Synchrony and Diversity During the Start of the COVID-19 Pandemic

Figure 4 for Global News Synchrony and Diversity During the Start of the COVID-19 Pandemic

Abstract:News coverage profoundly affects how countries and individuals behave in international relations. Yet, we have little empirical evidence of how news coverage varies across countries. To enable studies of global news coverage, we develop an efficient computational methodology that comprises three components: (i) a transformer model to estimate multilingual news similarity; (ii) a global event identification system that clusters news based on a similarity network of news articles; and (iii) measures of news synchrony across countries and news diversity within a country, based on country-specific distributions of news coverage of the global events. Each component achieves state-of-the art performance, scaling seamlessly to massive datasets of millions of news articles. We apply the methodology to 60 million news articles published globally between January 1 and June 30, 2020, across 124 countries and 10 languages, detecting 4357 news events. We identify the factors explaining diversity and synchrony of news coverage across countries. Our study reveals that news media tend to cover a more diverse set of events in countries with larger Internet penetration, more official languages, larger religious diversity, higher economic inequality, and larger populations. Coverage of news events is more synchronized between countries that not only actively participate in commercial and political relations -- such as, pairs of countries with high bilateral trade volume, and countries that belong to the NATO military alliance or BRICS group of major emerging economies -- but also countries that share certain traits: an official language, high GDP, and high democracy indices.

Via

Access Paper or Ask Questions

When it Rains, it Pours: Modeling Media Storms and the News Ecosystem

Dec 04, 2023

Benjamin Litterer, David Jurgens, Dallas Card

Figure 1 for When it Rains, it Pours: Modeling Media Storms and the News Ecosystem

Figure 2 for When it Rains, it Pours: Modeling Media Storms and the News Ecosystem

Figure 3 for When it Rains, it Pours: Modeling Media Storms and the News Ecosystem

Figure 4 for When it Rains, it Pours: Modeling Media Storms and the News Ecosystem

Abstract:Most events in the world receive at most brief coverage by the news media. Occasionally, however, an event will trigger a media storm, with voluminous and widespread coverage lasting for weeks instead of days. In this work, we develop and apply a pairwise article similarity model, allowing us to identify story clusters in corpora covering local and national online news, and thereby create a comprehensive corpus of media storms over a nearly two year period. Using this corpus, we investigate media storms at a new level of granularity, allowing us to validate claims about storm evolution and topical distribution, and provide empirical support for previously hypothesized patterns of influence of storms on media coverage and intermedia agenda setting.

* Findings of EMNLP 2023; 16 pages; 12 figures; 4 tables

Via

Access Paper or Ask Questions

Is "A Helpful Assistant" the Best Role for Large Language Models? A Systematic Evaluation of Social Roles in System Prompts

Nov 16, 2023

Mingqian Zheng, Jiaxin Pei, David Jurgens

Figure 1 for Is "A Helpful Assistant" the Best Role for Large Language Models? A Systematic Evaluation of Social Roles in System Prompts

Figure 2 for Is "A Helpful Assistant" the Best Role for Large Language Models? A Systematic Evaluation of Social Roles in System Prompts

Figure 3 for Is "A Helpful Assistant" the Best Role for Large Language Models? A Systematic Evaluation of Social Roles in System Prompts

Figure 4 for Is "A Helpful Assistant" the Best Role for Large Language Models? A Systematic Evaluation of Social Roles in System Prompts

Abstract:Prompting serves as the major way humans interact with Large Language Models (LLM). Commercial AI systems commonly define the role of the LLM in system prompts. For example, ChatGPT uses "You are a helpful assistant" as part of the default system prompt. But is "a helpful assistant" the best role for LLMs? In this study, we present a systematic evaluation of how social roles in system prompts affect model performance. We curate a list of 162 roles covering 6 types of interpersonal relationships and 8 types of occupations. Through extensive analysis of 3 popular LLMs and 2457 questions, we show that adding interpersonal roles in prompts consistently improves the models' performance over a range of questions. Moreover, while we find that using gender-neutral roles and specifying the role as the audience leads to better performances, predicting which role leads to the best performance remains a challenging task, and that frequency, similarity, and perplexity do not fully explain the effect of social roles on model performances. Our results can help inform the design of system prompts for AI systems. Code and data are available at https://github.com/Jiaxin-Pei/Prompting-with-Social-Roles.

Via

Access Paper or Ask Questions

You don't need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments

Nov 16, 2023

Bangzhao Shu, Lechen Zhang, Minje Choi, Lavinia Dunagan, Dallas Card, David Jurgens

Abstract:The versatility of Large Language Models (LLMs) on natural language understanding tasks has made them popular for research in social sciences. In particular, to properly understand the properties and innate personas of LLMs, researchers have performed studies that involve using prompts in the form of questions that ask LLMs of particular opinions. In this study, we take a cautionary step back and examine whether the current format of prompting enables LLMs to provide responses in a consistent and robust manner. We first construct a dataset that contains 693 questions encompassing 39 different instruments of persona measurement on 115 persona axes. Additionally, we design a set of prompts containing minor variations and examine LLM's capabilities to generate accurate answers, as well as consistency variations to examine their consistency towards simple perturbations such as switching the option order. Our experiments on 15 different open-source LLMs reveal that even simple perturbations are sufficient to significantly downgrade a model's question-answering ability, and that most LLMs have low negation consistency. Our results suggest that the currently widespread practice of prompting is insufficient to accurately capture model perceptions, and we discuss potential alternatives to improve such issues.

* 15 pages, 5 figures, 5 tables. First two authors contributed equally

Via

Access Paper or Ask Questions

Aligning with Whom? Large Language Models Have Gender and Racial Biases in Subjective NLP Tasks

Nov 16, 2023

Huaman Sun, Jiaxin Pei, Minje Choi, David Jurgens

Figure 1 for Aligning with Whom? Large Language Models Have Gender and Racial Biases in Subjective NLP Tasks

Figure 2 for Aligning with Whom? Large Language Models Have Gender and Racial Biases in Subjective NLP Tasks

Figure 3 for Aligning with Whom? Large Language Models Have Gender and Racial Biases in Subjective NLP Tasks

Figure 4 for Aligning with Whom? Large Language Models Have Gender and Racial Biases in Subjective NLP Tasks

Abstract:Human perception of language depends on personal backgrounds like gender and ethnicity. While existing studies have shown that large language models (LLMs) hold values that are closer to certain societal groups, it is unclear whether their prediction behaviors on subjective NLP tasks also exhibit a similar bias. In this study, leveraging the POPQUORN dataset which contains annotations of diverse demographic backgrounds, we conduct a series of experiments on four popular LLMs to investigate their capability to understand group differences and potential biases in their predictions for politeness and offensiveness. We find that for both tasks, model predictions are closer to the labels from White and female participants. We further explore prompting with the target demographic labels and show that including the target demographic in the prompt actually worsens the model's performance. More specifically, when being prompted to respond from the perspective of "Black" and "Asian" individuals, models show lower performance in predicting both overall scores as well as the scores from corresponding groups. Our results suggest that LLMs hold gender and racial biases for subjective NLP tasks and that demographic-infused prompts alone may be insufficient to mitigate such effects. Code and data are available at https://github.com/Jiaxin-Pei/LLM-Group-Bias.

Via

Access Paper or Ask Questions

Social Meme-ing: Measuring Linguistic Variation in Memes

Nov 15, 2023

Naitian Zhou, David Jurgens, David Bamman

Abstract:Much work in the space of NLP has used computational methods to explore sociolinguistic variation in text. In this paper, we argue that memes, as multimodal forms of language comprised of visual templates and text, also exhibit meaningful social variation. We construct a computational pipeline to cluster individual instances of memes into templates and semantic variables, taking advantage of their multimodal structure in doing so. We apply this method to a large collection of meme images from Reddit and make available the resulting \textsc{SemanticMemes} dataset of 3.8M images clustered by their semantic function. We use these clusters to analyze linguistic variation in memes, discovering not only that socially meaningful variation in meme usage exists between subreddits, but that patterns of meme innovation and acculturation within these communities align with previous findings on written language.

Via

Access Paper or Ask Questions

RCT Rejection Sampling for Causal Estimation Evaluation

Jul 27, 2023

Katherine A. Keith, Sergey Feldman, David Jurgens, Jonathan Bragg, Rohit Bhattacharya

Figure 1 for RCT Rejection Sampling for Causal Estimation Evaluation

Figure 2 for RCT Rejection Sampling for Causal Estimation Evaluation

Figure 3 for RCT Rejection Sampling for Causal Estimation Evaluation

Figure 4 for RCT Rejection Sampling for Causal Estimation Evaluation

Abstract:Confounding is a significant obstacle to unbiased estimation of causal effects from observational data. For settings with high-dimensional covariates -- such as text data, genomics, or the behavioral social sciences -- researchers have proposed methods to adjust for confounding by adapting machine learning methods to the goal of causal estimation. However, empirical evaluation of these adjustment methods has been challenging and limited. In this work, we build on a promising empirical evaluation strategy that simplifies evaluation design and uses real data: subsampling randomized controlled trials (RCTs) to create confounded observational datasets while using the average causal effects from the RCTs as ground-truth. We contribute a new sampling algorithm, which we call RCT rejection sampling, and provide theoretical guarantees that causal identification holds in the observational data to allow for valid comparisons to the ground-truth RCT. Using synthetic data, we show our algorithm indeed results in low bias when oracle estimators are evaluated on the confounded samples, which is not always the case for a previously proposed algorithm. In addition to this identification result, we highlight several finite data considerations for evaluation designers who plan to use RCT rejection sampling on their own datasets. As a proof of concept, we implement an example evaluation pipeline and walk through these finite data considerations with a novel, real-world RCT -- which we release publicly -- consisting of approximately 70k observations and text data as high-dimensional covariates. Together, these contributions build towards a broader agenda of improved empirical evaluation for causal estimation.

* Code and data at https://github.com/kakeith/rct_rejection_sampling

Via

Access Paper or Ask Questions

Your spouse needs professional help: Determining the Contextual Appropriateness of Messages through Modeling Social Relationships

Jul 06, 2023

David Jurgens, Agrima Seth, Jackson Sargent, Athena Aghighi, Michael Geraci

Figure 1 for Your spouse needs professional help: Determining the Contextual Appropriateness of Messages through Modeling Social Relationships

Figure 2 for Your spouse needs professional help: Determining the Contextual Appropriateness of Messages through Modeling Social Relationships

Figure 3 for Your spouse needs professional help: Determining the Contextual Appropriateness of Messages through Modeling Social Relationships

Figure 4 for Your spouse needs professional help: Determining the Contextual Appropriateness of Messages through Modeling Social Relationships

Abstract:Understanding interpersonal communication requires, in part, understanding the social context and norms in which a message is said. However, current methods for identifying offensive content in such communication largely operate independent of context, with only a few approaches considering community norms or prior conversation as context. Here, we introduce a new approach to identifying inappropriate communication by explicitly modeling the social relationship between the individuals. We introduce a new dataset of contextually-situated judgments of appropriateness and show that large language models can readily incorporate relationship information to accurately identify appropriateness in a given context. Using data from online conversations and movie dialogues, we provide insight into how the relationships themselves function as implicit norms and quantify the degree to which context-sensitivity is needed in different conversation settings. Further, we also demonstrate that contextual-appropriateness judgments are predictive of other social factors expressed in language such as condescension and politeness.

* ACL 2023, 18 pages, 8 figures, 11 tables

Via

Access Paper or Ask Questions

Exploring Linguistic Style Matching in Online Communities: The Role of Social Context and Conversation Dynamics

Jul 06, 2023

Aparna Ananthasubramaniam, Hong Chen, Jason Yan, Kenan Alkiek, Jiaxin Pei, Agrima Seth, Lavinia Dunagan, Minje Choi, Benjamin Litterer, David Jurgens

Figure 1 for Exploring Linguistic Style Matching in Online Communities: The Role of Social Context and Conversation Dynamics

Figure 2 for Exploring Linguistic Style Matching in Online Communities: The Role of Social Context and Conversation Dynamics

Figure 3 for Exploring Linguistic Style Matching in Online Communities: The Role of Social Context and Conversation Dynamics

Figure 4 for Exploring Linguistic Style Matching in Online Communities: The Role of Social Context and Conversation Dynamics

Abstract:Linguistic style matching (LSM) in conversations can be reflective of several aspects of social influence such as power or persuasion. However, how LSM relates to the outcomes of online communication on platforms such as Reddit is an unknown question. In this study, we analyze a large corpus of two-party conversation threads in Reddit where we identify all occurrences of LSM using two types of style: the use of function words and formality. Using this framework, we examine how levels of LSM differ in conversations depending on several social factors within Reddit: post and subreddit features, conversation depth, user tenure, and the controversiality of a comment. Finally, we measure the change of LSM following loss of status after community banning. Our findings reveal the interplay of LSM in Reddit conversations with several community metrics, suggesting the importance of understanding conversation engagement when understanding community dynamics.

* Equal contributions from authors 1-9 (AA, HC, JY, KA, JP, AS, LD, MC, BL)

Via

Access Paper or Ask Questions

When Do Annotator Demographics Matter? Measuring the Influence of Annotator Demographics with the POPQUORN Dataset

Jun 12, 2023

Jiaxin Pei, David Jurgens

Figure 1 for When Do Annotator Demographics Matter? Measuring the Influence of Annotator Demographics with the POPQUORN Dataset

Figure 2 for When Do Annotator Demographics Matter? Measuring the Influence of Annotator Demographics with the POPQUORN Dataset

Figure 3 for When Do Annotator Demographics Matter? Measuring the Influence of Annotator Demographics with the POPQUORN Dataset

Figure 4 for When Do Annotator Demographics Matter? Measuring the Influence of Annotator Demographics with the POPQUORN Dataset

Abstract:Annotators are not fungible. Their demographics, life experiences, and backgrounds all contribute to how they label data. However, NLP has only recently considered how annotator identity might influence their decisions. Here, we present POPQUORN (the POtato-Prolific dataset for QUestion-Answering, Offensiveness, text Rewriting, and politeness rating with demographic Nuance). POPQUORN contains 45,000 annotations from 1,484 annotators, drawn from a representative sample regarding sex, age, and race as the US population. Through a series of analyses, we show that annotators' background plays a significant role in their judgments. Further, our work shows that backgrounds not previously considered in NLP (e.g., education), are meaningful and should be considered. Our study suggests that understanding the background of annotators and collecting labels from a demographically balanced pool of crowd workers is important to reduce the bias of datasets. The dataset, annotator background, and annotation interface are available at https://github.com/Jiaxin-Pei/potato-prolific-dataset .

Via

Access Paper or Ask Questions