Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Manas Gaur

University of Maryland, Baltimore County

Towards Robust Evaluation of Unlearning in LLMs via Data Transformations

Nov 23, 2024

Abhinav Joshi, Shaswati Saha, Divyaksh Shukla, Sriram Vema, Harsh Jhamtani, Manas Gaur, Ashutosh Modi

Figure 1 for Towards Robust Evaluation of Unlearning in LLMs via Data Transformations

Figure 2 for Towards Robust Evaluation of Unlearning in LLMs via Data Transformations

Figure 3 for Towards Robust Evaluation of Unlearning in LLMs via Data Transformations

Figure 4 for Towards Robust Evaluation of Unlearning in LLMs via Data Transformations

Abstract:Large Language Models (LLMs) have shown to be a great success in a wide range of applications ranging from regular NLP-based use cases to AI agents. LLMs have been trained on a vast corpus of texts from various sources; despite the best efforts during the data pre-processing stage while training the LLMs, they may pick some undesirable information such as personally identifiable information (PII). Consequently, in recent times research in the area of Machine Unlearning (MUL) has become active, the main idea is to force LLMs to forget (unlearn) certain information (e.g., PII) without suffering from performance loss on regular tasks. In this work, we examine the robustness of the existing MUL techniques for their ability to enable leakage-proof forgetting in LLMs. In particular, we examine the effect of data transformation on forgetting, i.e., is an unlearned LLM able to recall forgotten information if there is a change in the format of the input? Our findings on the TOFU dataset highlight the necessity of using diverse data formats to quantify unlearning in LLMs more reliably.

* Accepted at EMNLP 2024 Findings; 21 pages (5 page main content + references + appendix)

Via

Access Paper or Ask Questions

A Domain-Agnostic Neurosymbolic Approach for Big Social Data Analysis: Evaluating Mental Health Sentiment on Social Media during COVID-19

Nov 11, 2024

Vedant Khandelwal, Manas Gaur, Ugur Kursuncu, Valerie Shalin, Amit Sheth

Figure 1 for A Domain-Agnostic Neurosymbolic Approach for Big Social Data Analysis: Evaluating Mental Health Sentiment on Social Media during COVID-19

Figure 2 for A Domain-Agnostic Neurosymbolic Approach for Big Social Data Analysis: Evaluating Mental Health Sentiment on Social Media during COVID-19

Figure 3 for A Domain-Agnostic Neurosymbolic Approach for Big Social Data Analysis: Evaluating Mental Health Sentiment on Social Media during COVID-19

Figure 4 for A Domain-Agnostic Neurosymbolic Approach for Big Social Data Analysis: Evaluating Mental Health Sentiment on Social Media during COVID-19

Abstract:Monitoring public sentiment via social media is potentially helpful during health crises such as the COVID-19 pandemic. However, traditional frequency-based, data-driven neural network-based approaches can miss newly relevant content due to the evolving nature of language in a dynamically evolving environment. Human-curated symbolic knowledge sources, such as lexicons for standard language and slang terms, can potentially elevate social media signals in evolving language. We introduce a neurosymbolic method that integrates neural networks with symbolic knowledge sources, enhancing the detection and interpretation of mental health-related tweets relevant to COVID-19. Our method was evaluated using a corpus of large datasets (approximately 12 billion tweets, 2.5 million subreddit data, and 700k news articles) and multiple knowledge graphs. This method dynamically adapts to evolving language, outperforming purely data-driven models with an F1 score exceeding 92\%. This approach also showed faster adaptation to new data and lower computational demands than fine-tuning pre-trained large language models (LLMs). This study demonstrates the benefit of neurosymbolic methods in interpreting text in a dynamic environment for tasks such as health surveillance.

* 13 Pages, 5 Figures, 5 Tables, 2024 IEEE International Conference on Big Data, Regular Paper

Via

Access Paper or Ask Questions

Unboxing Occupational Bias: Grounded Debiasing LLMs with U.S. Labor Data

Aug 20, 2024

Atmika Gorti, Manas Gaur, Aman Chadha

Figure 1 for Unboxing Occupational Bias: Grounded Debiasing LLMs with U.S. Labor Data

Figure 2 for Unboxing Occupational Bias: Grounded Debiasing LLMs with U.S. Labor Data

Figure 3 for Unboxing Occupational Bias: Grounded Debiasing LLMs with U.S. Labor Data

Figure 4 for Unboxing Occupational Bias: Grounded Debiasing LLMs with U.S. Labor Data

Abstract:Large Language Models (LLMs) are prone to inheriting and amplifying societal biases embedded within their training data, potentially reinforcing harmful stereotypes related to gender, occupation, and other sensitive categories. This issue becomes particularly problematic as biased LLMs can have far-reaching consequences, leading to unfair practices and exacerbating social inequalities across various domains, such as recruitment, online content moderation, or even the criminal justice system. Although prior research has focused on detecting bias in LLMs using specialized datasets designed to highlight intrinsic biases, there has been a notable lack of investigation into how these findings correlate with authoritative datasets, such as those from the U.S. National Bureau of Labor Statistics (NBLS). To address this gap, we conduct empirical research that evaluates LLMs in a ``bias-out-of-the-box" setting, analyzing how the generated outputs compare with the distributions found in NBLS data. Furthermore, we propose a straightforward yet effective debiasing mechanism that directly incorporates NBLS instances to mitigate bias within LLMs. Our study spans seven different LLMs, including instructable, base, and mixture-of-expert models, and reveals significant levels of bias that are often overlooked by existing bias detection techniques. Importantly, our debiasing method, which does not rely on external datasets, demonstrates a substantial reduction in bias scores, highlighting the efficacy of our approach in creating fairer and more reliable LLMs.

* Accepted in AAAI Spring Symposium 2024

Via

Access Paper or Ask Questions

Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context

Jul 25, 2024

Nilanjana Das, Edward Raff, Manas Gaur

Figure 1 for Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context

Figure 2 for Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context

Figure 3 for Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context

Figure 4 for Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context

Abstract:Previous research on testing the vulnerabilities in Large Language Models (LLMs) using adversarial attacks has primarily focused on nonsensical prompt injections, which are easily detected upon manual or automated review (e.g., via byte entropy). However, the exploration of innocuous human-understandable malicious prompts augmented with adversarial injections remains limited. In this research, we explore converting a nonsensical suffix attack into a sensible prompt via a situation-driven contextual re-writing. This allows us to show suffix conversion without any gradients, using only LLMs to perform the attacks, and thus better understand the scope of possible risks. We combine an independent, meaningful adversarial insertion and situations derived from movies to check if this can trick an LLM. The situations are extracted from the IMDB dataset, and prompts are defined following a few-shot chain-of-thought prompting. Our approach demonstrates that a successful situation-driven attack can be executed on both open-source and proprietary LLMs. We find that across many LLMs, as few as 1 attempt produces an attack and that these attacks transfer between LLMs.

Via

Access Paper or Ask Questions

IoT-Based Preventive Mental Health Using Knowledge Graphs and Standards for Better Well-Being

Jun 19, 2024

Amelie Gyrard, Seyedali Mohammadi, Manas Gaur, Antonio Kung

Figure 1 for IoT-Based Preventive Mental Health Using Knowledge Graphs and Standards for Better Well-Being

Figure 2 for IoT-Based Preventive Mental Health Using Knowledge Graphs and Standards for Better Well-Being

Figure 3 for IoT-Based Preventive Mental Health Using Knowledge Graphs and Standards for Better Well-Being

Figure 4 for IoT-Based Preventive Mental Health Using Knowledge Graphs and Standards for Better Well-Being

Abstract:Sustainable Development Goals (SDGs) give the UN a road map for development with Agenda 2030 as a target. SDG3 "Good Health and Well-Being" ensures healthy lives and promotes well-being for all ages. Digital technologies can support SDG3. Burnout and even depression could be reduced by encouraging better preventive health. Due to the lack of patient knowledge and focus to take care of their health, it is necessary to help patients before it is too late. New trends such as positive psychology and mindfulness are highly encouraged in the USA. Digital Twin (DT) can help with the continuous monitoring of emotion using physiological signals (e.g., collected via wearables). Digital twins facilitate monitoring and provide constant health insight to improve quality of life and well-being with better personalization. Healthcare DT challenges are standardizing data formats, communication protocols, and data exchange mechanisms. To achieve those data integration and knowledge challenges, we designed the Mental Health Knowledge Graph (ontology and dataset) to boost mental health. The Knowledge Graph (KG) acquires knowledge from ontology-based mental health projects classified within the LOV4IoT ontology catalog (Emotion, Depression, and Mental Health). Furthermore, the KG is mapped to standards (e.g., ontologies) when possible. Standards from ETSI SmartM2M, ITU/WHO, ISO, W3C, NIST, and IEEE are relevant to mental health.

* 20 pages

Via

Access Paper or Ask Questions

WellDunn: On the Robustness and Explainability of Language Models and Large Language Models in Identifying Wellness Dimensions

Jun 17, 2024

Seyedali Mohammadi, Edward Raff, Jinendra Malekar, Vedant Palit, Francis Ferraro, Manas Gaur

Figure 1 for WellDunn: On the Robustness and Explainability of Language Models and Large Language Models in Identifying Wellness Dimensions

Figure 2 for WellDunn: On the Robustness and Explainability of Language Models and Large Language Models in Identifying Wellness Dimensions

Figure 3 for WellDunn: On the Robustness and Explainability of Language Models and Large Language Models in Identifying Wellness Dimensions

Figure 4 for WellDunn: On the Robustness and Explainability of Language Models and Large Language Models in Identifying Wellness Dimensions

Abstract:Language Models (LMs) are being proposed for mental health applications where the heightened risk of adverse outcomes means predictive performance may not be a sufficient litmus test of a model's utility in clinical practice. A model that can be trusted for practice should have a correspondence between explanation and clinical determination, yet no prior research has examined the attention fidelity of these models and their effect on ground truth explanations. We introduce an evaluation design that focuses on the robustness and explainability of LMs in identifying Wellness Dimensions (WD). We focus on two mental health and well-being datasets: (a) Multi-label Classification-based MultiWD, and (b) WellXplain for evaluating attention mechanism veracity against expert-labeled explanations. The labels are based on Halbert Dunn's theory of wellness, which gives grounding to our evaluation. We reveal four surprising results about LMs/LLMs: (1) Despite their human-like capabilities, GPT-3.5/4 lag behind RoBERTa, and MedAlpaca, a fine-tuned LLM fails to deliver any remarkable improvements in performance or explanations. (2) Re-examining LMs' predictions based on a confidence-oriented loss function reveals a significant performance drop. (3) Across all LMs/LLMs, the alignment between attention and explanations remains low, with LLMs scoring a dismal 0.0. (4) Most mental health-specific LMs/LLMs overlook domain-specific knowledge and undervalue explanations, causing these discrepancies. This study highlights the need for further research into their consistency and explanations in mental health and well-being.

* 26 pages including reference section and appendix section, 8 figures, 16 tables

Via

Access Paper or Ask Questions

REASONS: A benchmark for REtrieval and Automated citationS Of scieNtific Sentences using Public and Proprietary LLMs

May 03, 2024

Deepa Tilwani, Yash Saxena, Ali Mohammadi, Edward Raff, Amit Sheth, Srinivasan Parthasarathy, Manas Gaur

Figure 1 for REASONS: A benchmark for REtrieval and Automated citationS Of scieNtific Sentences using Public and Proprietary LLMs

Figure 2 for REASONS: A benchmark for REtrieval and Automated citationS Of scieNtific Sentences using Public and Proprietary LLMs

Figure 3 for REASONS: A benchmark for REtrieval and Automated citationS Of scieNtific Sentences using Public and Proprietary LLMs

Figure 4 for REASONS: A benchmark for REtrieval and Automated citationS Of scieNtific Sentences using Public and Proprietary LLMs

Abstract:Automatic citation generation for sentences in a document or report is paramount for intelligence analysts, cybersecurity, news agencies, and education personnel. In this research, we investigate whether large language models (LLMs) are capable of generating references based on two forms of sentence queries: (a) Direct Queries, LLMs are asked to provide author names of the given research article, and (b) Indirect Queries, LLMs are asked to provide the title of a mentioned article when given a sentence from a different article. To demonstrate where LLM stands in this task, we introduce a large dataset called REASONS comprising abstracts of the 12 most popular domains of scientific research on arXiv. From around 20K research articles, we make the following deductions on public and proprietary LLMs: (a) State-of-the-art, often called anthropomorphic GPT-4 and GPT-3.5, suffers from high pass percentage (PP) to minimize the hallucination rate (HR). When tested with Perplexity.ai (7B), they unexpectedly made more errors; (b) Augmenting relevant metadata lowered the PP and gave the lowest HR; (c) Advance retrieval-augmented generation (RAG) using Mistral demonstrates consistent and robust citation support on indirect queries and matched performance to GPT-3.5 and GPT-4. The HR across all domains and models decreased by an average of 41.93% and the PP was reduced to 0% in most cases. In terms of generation quality, the average F1 Score and BLEU were 68.09% and 57.51%, respectively; (d) Testing with adversarial samples showed that LLMs, including the Advance RAG Mistral, struggle to understand context, but the extent of this issue was small in Mistral and GPT-4-Preview. Our study con tributes valuable insights into the reliability of RAG for automated citation generation tasks.

* Submitted to ACL ARR April 2024

Via

Access Paper or Ask Questions

COBIAS: Contextual Reliability in Bias Assessment

Feb 22, 2024

Priyanshul Govil, Vamshi Krishna Bonagiri, Manas Gaur, Ponnurangam Kumaraguru, Sanorita Dey

Figure 1 for COBIAS: Contextual Reliability in Bias Assessment

Figure 2 for COBIAS: Contextual Reliability in Bias Assessment

Figure 3 for COBIAS: Contextual Reliability in Bias Assessment

Figure 4 for COBIAS: Contextual Reliability in Bias Assessment

Abstract:Large Language Models (LLMs) are trained on inherently biased data. Previous works on debiasing models rely on benchmark datasets to measure model performance. However, these datasets suffer from several pitfalls due to the extremely subjective understanding of bias, highlighting a critical need for contextual exploration. We propose understanding the context of user inputs with consideration of the diverse situations in which input statements are possible. This approach would allow for frameworks that foster bias awareness rather than guardrails that hurt user engagement. Our contribution is twofold: (i) we create a dataset of 2287 stereotyped statements augmented with points for adding context; (ii) we develop the Context-Oriented Bias Indicator and Assessment Score (COBIAS) to assess statements' contextual reliability in measuring bias. Our metric is a significant predictor of the contextual reliability of bias-benchmark datasets ($\chi^2=71.02, p<2.2 \cdot 10^{-16})$. COBIAS can be used to create reliable datasets, resulting in an improvement in bias mitigation works.

Via

Access Paper or Ask Questions

SaGE: Evaluating Moral Consistency in Large Language Models

Feb 21, 2024

Vamshi Krishna Bonagiri, Sreeram Vennam, Priyanshul Govil, Ponnurangam Kumaraguru, Manas Gaur

Figure 1 for SaGE: Evaluating Moral Consistency in Large Language Models

Figure 2 for SaGE: Evaluating Moral Consistency in Large Language Models

Figure 3 for SaGE: Evaluating Moral Consistency in Large Language Models

Figure 4 for SaGE: Evaluating Moral Consistency in Large Language Models

Abstract:Despite recent advancements showcasing the impressive capabilities of Large Language Models (LLMs) in conversational systems, we show that even state-of-the-art LLMs are morally inconsistent in their generations, questioning their reliability (and trustworthiness in general). Prior works in LLM evaluation focus on developing ground-truth data to measure accuracy on specific tasks. However, for moral scenarios that often lack universally agreed-upon answers, consistency in model responses becomes crucial for their reliability. To address this issue, we propose an information-theoretic measure called Semantic Graph Entropy (SaGE), grounded in the concept of "Rules of Thumb" (RoTs) to measure a model's moral consistency. RoTs are abstract principles learned by a model and can help explain their decision-making strategies effectively. To this extent, we construct the Moral Consistency Corpus (MCC), containing 50K moral questions, responses to them by LLMs, and the RoTs that these models followed. Furthermore, to illustrate the generalizability of SaGE, we use it to investigate LLM consistency on two popular datasets -- TruthfulQA and HellaSwag. Our results reveal that task-accuracy and consistency are independent problems, and there is a dire need to investigate these issues further.

Via

Access Paper or Ask Questions

Measuring Moral Inconsistencies in Large Language Models

Jan 26, 2024

Vamshi Krishna Bonagiri, Sreeram Vennam, Manas Gaur, Ponnurangam Kumaraguru

Figure 1 for Measuring Moral Inconsistencies in Large Language Models

Figure 2 for Measuring Moral Inconsistencies in Large Language Models

Abstract:A Large Language Model~(LLM) is considered consistent if semantically equivalent prompts produce semantically equivalent responses. Despite recent advancements showcasing the impressive capabilities of LLMs in conversational systems, we show that even state-of-the-art LLMs are highly inconsistent in their generations, questioning their reliability. Prior research has tried to measure this with task-specific accuracies. However, this approach is unsuitable for moral scenarios, such as the trolley problem, with no ``correct'' answer. To address this issue, we propose a novel information-theoretic measure called Semantic Graph Entropy~(SGE) to measure the consistency of an LLM in moral scenarios. We leverage ``Rules of Thumb''~(RoTs) to explain a model's decision-making strategies and further enhance our metric. Compared to existing consistency metrics, SGE correlates better with human judgments across five LLMs. In the future, we aim to investigate the root causes of LLM inconsistencies and propose improvements.

* BBNLP 2023

Via

Access Paper or Ask Questions