Recent developments in natural language processing have demonstrated the potential of large language models (LLMs) to improve a range of educational and learning outcomes. Of recent chatbots based on LLMs, ChatGPT and Bard have made it clear that artificial intelligence (AI) technology will have significant implications on the way we obtain and search for information. However, these tools sometimes produce text that is convincing, but often incorrect, known as hallucinations. As such, their use can distort scientific facts and spread misinformation. To counter polarizing responses on these tools, it is critical to provide an overview of such responses so stakeholders can determine which topics tend to produce more contentious responses -- key to developing targeted regulatory policy and interventions. In addition, there currently exists no annotated dataset of ChatGPT and Bard responses around possibly polarizing topics, central to the above aims. We address the indicated issues through the following contribution: Focusing on highly polarizing topics in the US, we created and described a dataset of ChatGPT and Bard responses. Broadly, our results indicated a left-leaning bias for both ChatGPT and Bard, with Bard more likely to provide responses around polarizing topics. Bard seemed to have fewer guardrails around controversial topics, and appeared more willing to provide comprehensive, and somewhat human-like responses. Bard may thus be more likely abused by malicious actors. Stakeholders may utilize our findings to mitigate misinformative and/or polarizing responses from LLMs
Explainable AI (XAI) systems are sociotechnical in nature; thus, they are subject to the sociotechnical gap--divide between the technical affordances and the social needs. However, charting this gap is challenging. In the context of XAI, we argue that charting the gap improves our problem understanding, which can reflexively provide actionable insights to improve explainability. Utilizing two case studies in distinct domains, we empirically derive a framework that facilitates systematic charting of the sociotechnical gap by connecting AI guidelines in the context of XAI and elucidating how to use them to address the gap. We apply the framework to a third case in a new domain, showcasing its affordances. Finally, we discuss conceptual implications of the framework, share practical considerations in its operationalization, and offer guidance on transferring it to new contexts. By making conceptual and practical contributions to understanding the sociotechnical gap in XAI, the framework expands the XAI design space.
Advances in Natural Language Processing (NLP) have revolutionized the way researchers and practitioners address crucial societal problems. Large language models are now the standard to develop state-of-the-art solutions for text detection and classification tasks. However, the development of advanced computational techniques and resources is disproportionately focused on the English language, sidelining a majority of the languages spoken globally. While existing research has developed better multilingual and monolingual language models to bridge this language disparity between English and non-English languages, we explore the promise of incorporating the information contained in images via multimodal machine learning. Our comparative analyses on three detection tasks focusing on crisis information, fake news, and emotion recognition, as well as five high-resource non-English languages, demonstrate that: (a) detection frameworks based on pre-trained large language models like BERT and multilingual-BERT systematically perform better on the English language compared against non-English languages, and (b) including images via multimodal learning bridges this performance gap. We situate our findings with respect to existing work on the pitfalls of large language models, and discuss their theoretical and practical implications. Resources for this paper are available at https://multimodality-language-disparity.github.io/.
Hate speech has grown significantly on social media, causing serious consequences for victims of all demographics. Despite much attention being paid to characterize and detect discriminatory speech, most work has focused on explicit or overt hate speech, failing to address a more pervasive form based on coded or indirect language. To fill this gap, this work introduces a theoretically-justified taxonomy of implicit hate speech and a benchmark corpus with fine-grained labels for each message and its implication. We present systematic analyses of our dataset using contemporary baselines to detect and explain implicit hate speech, and we discuss key features that challenge existing models. This dataset will continue to serve as a useful benchmark for understanding this multifaceted issue.
Assessment of job performance, personalized health and psychometric measures are domains where data-driven and ubiquitous computing exhibits the potential of a profound impact in the future. Existing techniques use data extracted from questionnaires, sensors (wearable, computer, etc.), or other traits, to assess well-being and cognitive attributes of individuals. However, these techniques can neither predict individual's well-being and psychological traits in a global manner nor consider the challenges associated to processing the data available, that is incomplete and noisy. In this paper, we create a benchmark for predictive analysis of individuals from a perspective that integrates: physical and physiological behavior, psychological states and traits, and job performance. We design data mining techniques as benchmark and uses real noisy and incomplete data derived from wearable sensors to predict 19 constructs based on 12 standardized well-validated tests. The study included 757 participants who were knowledge workers in organizations across the USA with varied work roles. We developed a data mining framework to extract the meaningful predictors for each of the 19 variables under consideration. Our model is the first benchmark that combines these various instrument-derived variables in a single framework to understand people's behavior by leveraging real uncurated data from wearable, mobile, and social media sources. We verify our approach experimentally using the data obtained from our longitudinal study. The results show that our framework is consistently reliable and capable of predicting the variables under study better than the baselines when prediction is restricted to the noisy, incomplete data.
Distinctive linguistic practices help communities build solidarity and differentiate themselves from outsiders. In an online community, one such practice is variation in orthography, which includes spelling, punctuation, and capitalization. Using a dataset of over two million Instagram posts, we investigate orthographic variation in a community that shares pro-eating disorder (pro-ED) content. We find that not only does orthographic variation grow more frequent over time, it also becomes more profound or deep, with variants becoming increasingly distant from the original: as, for example, #anarexyia is more distant than #anarexia from the original spelling #anorexia. These changes are driven by newcomers, who adopt the most extreme linguistic practices as they enter the community. Moreover, this behavior correlates with engagement: the newcomers who adopt deeper orthographic variants tend to remain active for longer in the community, and the posts that contain deeper variation receive more positive feedback in the form of "likes." Previous work has linked community membership change with language change, and our work casts this connection in a new light, with newcomers driving an evolving practice, rather than adapting to it. We also demonstrate the utility of orthographic variation as a new lens to study sociolinguistic change in online communities, particularly when the change results from an exogenous force such as a content ban.