Srijan Kumar

Better to Ask in English: Cross-Lingual Evaluation of Large Language Models for Healthcare Queries

Oct 23, 2023
Yiqiao Jin, Mohit Chandra, Gaurav Verma, Yibo Hu, Munmun De Choudhury, Srijan Kumar

Large language models (LLMs) are transforming the ways the general public accesses and consumes information. Their influence is particularly pronounced in pivotal sectors like healthcare, where lay individuals are increasingly adopting LLMs as conversational agents for everyday queries. While LLMs demonstrate impressive language understanding and generation proficiencies, concerns regarding their safety remain paramount in these high-stakes domains. Moreover, the development of LLMs is disproportionately focused on English. It remains unclear how these LLMs perform in the context of non-English languages, a gap that is critical for ensuring equity in the real-world use of these systems. This paper provides a framework to investigate the effectiveness of LLMs as multilingual dialogue systems for healthcare queries. Our empirically derived framework XlingEval focuses on three fundamental criteria for evaluating LLM responses to naturalistic human-authored health-related questions: correctness, consistency, and verifiability. Through extensive experiments on four major global languages (English, Spanish, Chinese, and Hindi), spanning three expert-annotated large health Q&A datasets, and through a combination of algorithmic and human-evaluation strategies, we found a pronounced disparity in LLM responses across these languages, indicating a need for enhanced cross-lingual capabilities. We further propose XlingHealth, a cross-lingual benchmark for examining the multilingual capabilities of LLMs in the healthcare context. Our findings underscore the pressing need to bolster the cross-lingual capacities of these models and to provide an equitable information ecosystem accessible to all.

* 18 pages, 7 figures 
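
The consistency criterion, for instance, can be approximated by sampling several responses to the same question and measuring their mutual semantic similarity. Below is a minimal sketch of that idea, not the paper's official code: `ask_llm` is a hypothetical stand-in for any chat-completion call, and the multilingual sentence-embedding checkpoint is an assumption chosen for illustration.

```python
# Minimal sketch of an XlingEval-style consistency check (assumptions noted above).
from itertools import combinations

from sentence_transformers import SentenceTransformer, util

# Multilingual encoder so English/Spanish/Chinese/Hindi responses are comparable.
embedder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def ask_llm(question: str) -> str:
    """Hypothetical placeholder: plug in any LLM chat-completion call here."""
    raise NotImplementedError

def consistency_score(question: str, n_samples: int = 5) -> float:
    """Mean pairwise cosine similarity across sampled responses (higher = more consistent)."""
    responses = [ask_llm(question) for _ in range(n_samples)]
    embeddings = embedder.encode(responses, convert_to_tensor=True)
    sims = [util.cos_sim(embeddings[i], embeddings[j]).item()
            for i, j in combinations(range(n_samples), 2)]
    return sum(sims) / len(sims)
```

Asking the same health question in each of the four languages and comparing the resulting scores surfaces the kind of cross-lingual disparity the paper reports.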

Overview of Memotion 3: Sentiment and Emotion Analysis of Codemixed Hinglish Memes

Sep 12, 2023
Shreyash Mishra, S Suryavardan, Megha Chakraborty, Parth Patwa, Anku Rani, Aman Chadha, Aishwarya Reganti, Amitava Das, Amit Sheth, Manoj Chinnakotla, Asif Ekbal, Srijan Kumar

Analyzing memes on the internet has emerged as a crucial endeavor due to the impact this multimodal form of content wields in shaping online discourse. Memes have become a powerful tool for expressing emotions and sentiments, and possibly even for spreading hate and misinformation, through humor and sarcasm. In this paper, we present an overview of the Memotion 3 shared task, held as part of the DeFactify 2 workshop at AAAI-23. The task released an annotated dataset of Hindi-English code-mixed memes labeled for sentiment (Task A), emotion (Task B), and emotion intensity (Task C). Each of these is defined as an individual task, and participants are ranked separately for each. Over 50 teams registered for the shared task and 5 made final submissions on the test set of the Memotion 3 dataset. CLIP, BERT variants, and ViT were the most popular models among the participants, along with approaches such as student-teacher models, fusion, and ensembling. The best final F1 scores are 34.41 for Task A, 79.77 for Task B, and 59.82 for Task C.

* Defactify2 @AAAI 2023 
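
As a rough illustration of how such shared-task submissions are typically scored, the sketch below computes a per-task weighted F1 with scikit-learn; the label names are illustrative assumptions, not the official Memotion 3 schema.

```python
# Hedged sketch: one weighted F1 per task, since Tasks A, B, and C
# have different (and imbalanced) label spaces.
from sklearn.metrics import f1_score

def score_task(y_true: list[str], y_pred: list[str]) -> float:
    """Weighted F1 in percent, suited to imbalanced multi-class labels."""
    return f1_score(y_true, y_pred, average="weighted") * 100

# Task A example with assumed sentiment labels:
gold = ["positive", "neutral", "negative", "positive"]
pred = ["positive", "negative", "negative", "neutral"]
print(f"Task A weighted F1: {score_task(gold, pred):.2f}")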

Findings of Factify 2: Multimodal Fake News Detection

Jul 19, 2023
S Suryavardan, Shreyash Mishra, Megha Chakraborty, Parth Patwa, Anku Rani, Aman Chadha, Aishwarya Reganti, Amitava Das, Amit Sheth, Manoj Chinnakotla, Asif Ekbal, Srijan Kumar

With social media usage growing exponentially in the past few years, fake news has also become extremely prevalent. The detrimental impact of fake news emphasizes the need for research focused on automating the detection of false information and verifying its accuracy. In this work, we present the outcome of the Factify 2 shared task, which provides a multimodal fact verification and satire news dataset, as part of the DeFactify 2 workshop at AAAI'23. The data calls for a comparison-based approach to the task, pairing social media claims with supporting documents, each comprising both text and image, divided into 5 classes based on multimodal relations. In the second iteration of this task, we had over 60 participants and 9 final test-set submissions. The best performances came from the use of DeBERTa for text and Swinv2 and CLIP for images. The highest F1 score, averaged across all five classes, was 81.82%.

* Defactify2 @AAAI 2023 
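
The comparison-based setup can be pictured as encoding claim and document on both modalities and classifying the fused features into the five relation classes. The sketch below is a hedged illustration under assumed embedding dimensions and class names, not the winning system.

```python
# Illustrative 5-way multimodal-relation classifier over pre-computed embeddings.
import torch
import torch.nn as nn

# Assumed class names for illustration.
CLASSES = ["Support_Text", "Support_Multimodal", "Insufficient_Text",
           "Insufficient_Multimodal", "Refute"]

class FactVerifier(nn.Module):
    def __init__(self, text_dim: int = 768, image_dim: int = 512):
        super().__init__()
        # One text and one image embedding per side (claim vs. document).
        fused = 2 * (text_dim + image_dim)
        self.head = nn.Sequential(nn.Linear(fused, 256), nn.ReLU(),
                                  nn.Linear(256, len(CLASSES)))

    def forward(self, claim_txt, claim_img, doc_txt, doc_img):
        x = torch.cat([claim_txt, claim_img, doc_txt, doc_img], dim=-1)
        return self.head(x)  # logits over the 5 relation classes

model = FactVerifier()
logits = model(torch.randn(1, 768), torch.randn(1, 512),
               torch.randn(1, 768), torch.randn(1, 512))
print(CLASSES[logits.argmax(-1).item()])
```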

Adversarial Robustness of Prompt-based Few-Shot Learning for Natural Language Understanding

Jun 21, 2023
Venkata Prabhakara Sarath Nookala, Gaurav Verma, Subhabrata Mukherjee, Srijan Kumar

State-of-the-art few-shot learning (FSL) methods leverage prompt-based fine-tuning to obtain remarkable results for natural language understanding (NLU) tasks. While most prior FSL work focuses on improving downstream task performance, there is limited understanding of the adversarial robustness of such methods. In this work, we conduct an extensive study of several state-of-the-art FSL methods to assess their robustness to adversarial perturbations. To better understand the impact of various factors on robustness (or the lack of it), we evaluate prompt-based FSL methods against fully fine-tuned models along aspects such as the use of unlabeled data, multiple prompts, the number of few-shot examples, and model size and type. Our results on six GLUE tasks indicate that, compared to fully fine-tuned models, vanilla FSL methods suffer a notable relative drop in task performance (i.e., are less robust) in the face of adversarial perturbations. However, using (i) unlabeled data for prompt-based FSL and (ii) multiple prompts flips the trend. We further demonstrate that increasing the number of few-shot examples and the model size leads to increased adversarial robustness of vanilla FSL methods. Broadly, our work sheds light on the adversarial robustness evaluation of prompt-based FSL methods for NLU tasks.

* Accepted full paper at Findings of ACL 2023; Code available at https://github.com/claws-lab/few-shot-adversarial-robustness 
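
The evaluation protocol boils down to measuring the relative accuracy drop between clean and adversarially perturbed inputs. A minimal sketch, assuming a `predict` wrapper around the model under test and a `perturb` function implementing an attack such as TextFooler (available in the TextAttack library):

```python
# Hedged sketch of the clean-vs-adversarial robustness comparison.
def robustness_drop(examples, labels, predict, perturb) -> float:
    """Return the relative accuracy drop under attack (0.0 = fully robust)."""
    clean = sum(predict(x) == y for x, y in zip(examples, labels))
    adv = sum(predict(perturb(x)) == y for x, y in zip(examples, labels))
    clean_acc, adv_acc = clean / len(labels), adv / len(labels)
    # Fraction of clean accuracy lost when inputs are perturbed.
    return (clean_acc - adv_acc) / clean_acc
```

Per the paper's findings, this drop is larger for vanilla prompt-based FSL than for fully fine-tuned models, unless unlabeled data or prompt ensembles are used.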

Cross-Modal Attribute Insertions for Assessing the Robustness of Vision-and-Language Learning

Jun 19, 2023
Shivaen Ramshetty, Gaurav Verma, Srijan Kumar

The robustness of multimodal deep learning models to realistic changes in the input text is critical for their applicability to important tasks such as text-to-image retrieval and cross-modal entailment. To measure robustness, several existing approaches edit the text data, but do so without leveraging the cross-modal information present in multimodal data. Information from the visual modality, such as color, size, and shape, provides additional attributes that users can include in their inputs. Thus, we propose cross-modal attribute insertions as a realistic perturbation strategy for vision-and-language data that inserts visual attributes of the objects in the image into the corresponding text (e.g., "girl on a chair" to "little girl on a wooden chair"). Our proposed approach for cross-modal attribute insertions is modular, controllable, and task-agnostic. We find that augmenting input text using cross-modal insertions causes state-of-the-art approaches for text-to-image retrieval and cross-modal entailment to perform poorly, resulting in relative drops of 15% in MRR and 20% in $F_1$ score, respectively. Crowd-sourced annotations demonstrate that cross-modal insertions lead to higher-quality augmentations for multimodal data than augmentations using text-only data, and are equivalent in quality to original examples. We release the code to encourage robustness evaluations of deep vision-and-language models: https://github.com/claws-lab/multimodal-robustness-xmai.

* Accepted full paper at ACL 2023; 15 pages, 7 figures 
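
As a simplified illustration of the perturbation itself (the paper's pipeline derives the attributes from the image automatically; here they are supplied directly), the detected attribute can be spliced in before the matching noun:

```python
# Simplified cross-modal attribute insertion: prepend a visual attribute
# to the first occurrence of its object noun in the caption.
import re

def insert_attributes(text: str, detected: dict[str, str]) -> str:
    """`detected` maps object noun -> visual attribute, e.g. 'chair' -> 'wooden'."""
    for noun, attribute in detected.items():
        # \b keeps 'chair' from matching inside e.g. 'chairman'.
        text = re.sub(rf"\b{re.escape(noun)}\b", f"{attribute} {noun}", text, count=1)
    return text

print(insert_attributes("girl on a chair", {"girl": "little", "chair": "wooden"}))
# -> "little girl on a wooden chair"
```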

Factify 2: A Multimodal Fake News and Satire News Dataset

Apr 08, 2023
S Suryavardan, Shreyash Mishra, Parth Patwa, Megha Chakraborty, Anku Rani, Aishwarya Reganti, Aman Chadha, Amitava Das, Amit Sheth, Manoj Chinnakotla, Asif Ekbal, Srijan Kumar

The internet gives people around the world an open platform to express their views and share their stories. While this is very valuable, it makes fake news one of our society's most pressing problems. The manual fact-checking process is time-consuming, which makes it challenging to disprove misleading assertions before they cause significant harm. This is the driving interest in automatic fact or claim verification. Some existing datasets aim to support the development of automated fact-checking techniques; however, most of them are text-based. Multimodal fact verification has received relatively scant attention. In this paper, we provide a multimodal fact-checking dataset called FACTIFY 2, improving on Factify 1 by using new data sources and adding satire articles. Factify 2 has 50,000 new data instances. Similar to FACTIFY 1.0, we have three broad categories (support, no-evidence, and refute) with sub-categories based on the entailment of visual and textual data. We also provide a baseline based on BERT and Vision Transformer, which achieves a 65% F1 score on the test set. The baseline codes and the dataset will be made available at https://github.com/surya1701/Factify-2.0.

* Defactify@AAAI2023 
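
A hedged sketch of the kind of BERT + Vision Transformer baseline described above: embed the text with BERT and the image with ViT, then concatenate the two [CLS] features for a downstream classifier (omitted here). The checkpoint names are common public ones, assumed for illustration.

```python
# Illustrative BERT + ViT feature extractor for a multimodal claim.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer, ViTImageProcessor, ViTModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text_encoder = AutoModel.from_pretrained("bert-base-uncased")
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
image_encoder = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")

def embed(claim: str, image: Image.Image) -> torch.Tensor:
    tokens = tokenizer(claim, return_tensors="pt", truncation=True)
    text_vec = text_encoder(**tokens).last_hidden_state[:, 0]    # BERT [CLS]
    pixels = processor(images=image, return_tensors="pt")
    image_vec = image_encoder(**pixels).last_hidden_state[:, 0]  # ViT [CLS]
    return torch.cat([text_vec, image_vec], dim=-1)              # 768 + 768 features
```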

Memotion 3: Dataset on Sentiment and Emotion Analysis of Codemixed Hindi-English Memes

Mar 23, 2023
Shreyash Mishra, S Suryavardan, Parth Patwa, Megha Chakraborty, Anku Rani, Aishwarya Reganti, Aman Chadha, Amitava Das, Amit Sheth, Manoj Chinnakotla, Asif Ekbal, Srijan Kumar

Memes are the new-age conveyance mechanism for humor on social media sites. Memes often include an image and some text. Memes can be used to promote disinformation or hatred, and thus it is crucial to investigate them in detail. We introduce Memotion 3, a new dataset with 10,000 annotated memes. Unlike other prevalent datasets in the domain, including prior iterations of Memotion, Memotion 3 introduces Hindi-English code-mixed memes, whereas prior work in the area was limited to English-only memes. We describe the Memotion task, the data collection, and the dataset creation methodologies. We also provide a baseline for the task. The baseline code and dataset will be made available at https://github.com/Shreyashm16/Memotion-3.0

* Defactify2 @AAAI 
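
An illustrative record layout for the three annotation layers follows; the field names, label vocabulary, and intensity scale are assumptions based on the task description, not the released schema.

```python
# Hypothetical per-meme annotation record for the Memotion tasks.
from dataclasses import dataclass, field

@dataclass
class MemeAnnotation:
    image_path: str
    ocr_text: str                    # Hinglish code-mixed caption text
    sentiment: str                   # Task A: "negative" | "neutral" | "positive"
    emotions: set[str] = field(default_factory=set)           # Task B: emotions present
    intensity: dict[str, int] = field(default_factory=dict)   # Task C: emotion -> intensity level

meme = MemeAnnotation("meme_001.jpg", "kya baat hai!", "positive",
                      {"humour"}, {"humour": 2})
```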

Reinforcement Learning-based Counter-Misinformation Response Generation: A Case Study of COVID-19 Vaccine Misinformation

Mar 11, 2023
Bing He, Mustaque Ahamad, Srijan Kumar

The spread of online misinformation threatens public health, democracy, and the broader society. While professional fact-checkers form the first line of defense by fact-checking popular false claims, they do not engage directly in conversations with misinformation spreaders. On the other hand, non-expert ordinary users act as eyes on the ground who proactively counter misinformation; recent research has shown that 96% of counter-misinformation responses are made by ordinary users. However, research also found that two-thirds of the time, these responses are rude and lack evidence. This work seeks to create a counter-misinformation response generation model to empower users to effectively correct misinformation. This objective is challenging due to the absence of datasets containing ground-truth ideal counter-misinformation responses, and the lack of models that can generate responses grounded in communication theories. In this work, we create two novel datasets of misinformation and counter-misinformation response pairs, collected from in-the-wild social media and crowdsourced from college-educated students. We annotate the collected data to distinguish poor responses from ideal ones that are factual, polite, and refute misinformation. We propose MisinfoCorrect, a reinforcement learning-based framework that learns to generate counter-misinformation responses for an input misinformation post. The framework rewards the generator for increasing politeness, factuality, and refutation attitude while retaining text fluency and relevancy. Quantitative and qualitative evaluation shows that our model outperforms several baselines by generating high-quality counter-responses. This work illustrates the promise of generative text models for social good: here, to help create a safe and reliable information ecosystem. The code and data are accessible at https://github.com/claws-lab/MisinfoCorrect.

* Published at: ACM Web Conference 2023 (WWW'23). Code and data at: https://github.com/claws-lab/MisinfoCorrect 
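
The reward shaping can be sketched as a weighted combination of attribute scores; the scorer functions and weights below are placeholders for illustration, not the paper's exact formulation.

```python
# Hedged sketch of a composite reward for counter-misinformation generation.
def response_reward(response: str, misinfo_post: str,
                    politeness, factuality, refutation, fluency,
                    weights=(1.0, 1.0, 1.0, 0.5)) -> float:
    """Each scorer is a placeholder callable returning a score in [0, 1]."""
    scores = (politeness(response),
              factuality(response, misinfo_post),
              refutation(response, misinfo_post),
              fluency(response))
    return sum(w * s for w, s in zip(weights, scores))
```

In REINFORCE-style training, the generator samples a counter-response, this scalar reward scales the log-likelihood gradient, and generation shifts toward polite, factual, refuting text while staying fluent and on-topic.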

A Survey of Graph Neural Networks for Social Recommender Systems

Dec 12, 2022
Kartik Sharma, Yeon-Chang Lee, Sivagami Nambi, Aditya Salian, Shlok Shah, Sang-Wook Kim, Srijan Kumar

Social recommender systems (SocialRS) simultaneously leverage user-to-item interactions as well as user-to-user social relations for the task of generating item recommendations to users. Additionally exploiting social relations is clearly effective in understanding users' tastes owing to the effects of homophily and social influence. For this reason, SocialRS has increasingly attracted attention. In particular, with the advance of graph neural networks (GNNs), many GNN-based SocialRS methods have been developed recently. We therefore conduct a comprehensive and systematic review of the literature on GNN-based SocialRS. In this survey, we first identify 80 papers on GNN-based SocialRS after annotating 2,151 papers by following the PRISMA framework (Preferred Reporting Items for Systematic Reviews and Meta-Analyses). Then, we comprehensively review them in terms of their inputs and architectures to propose a novel taxonomy: (1) the input taxonomy includes 5 groups of input type notations and 7 groups of input representation notations; (2) the architecture taxonomy includes 8 groups of GNN encoder notations, 2 groups of decoder notations, and 12 groups of loss function notations. We classify the GNN-based SocialRS methods into several categories per the taxonomy and describe their details. Furthermore, we summarize the benchmark datasets and metrics widely used to evaluate GNN-based SocialRS methods. Finally, we conclude this survey by presenting some future research directions.

* GitHub repository with the curated list of papers: https://github.com/claws-lab/awesome-GNN-social-recsys 
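
The core idea shared by most surveyed methods can be sketched as a user representation that aggregates both interacted items and social neighbors, followed by a dot-product decoder over candidate items. This generic one-layer GNN is an illustration, not any specific surveyed architecture.

```python
# Minimal generic SocialRS encoder: fuse item-interaction and social signals.
import torch
import torch.nn as nn

class SocialGNNLayer(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.proj = nn.Linear(3 * dim, dim)

    def forward(self, user_emb, item_emb, ui_adj, uu_adj):
        # ui_adj: (num_users, num_items) row-normalized interaction matrix.
        # uu_adj: (num_users, num_users) row-normalized social matrix.
        from_items = ui_adj @ item_emb    # taste signal from consumed items
        from_friends = uu_adj @ user_emb  # social-influence signal from friends
        fused = torch.cat([user_emb, from_items, from_friends], dim=-1)
        return torch.tanh(self.proj(fused))

def score(user_vec, item_emb):
    """Dot-product decoder: one recommendation score per (user, item) pair."""
    return user_vec @ item_emb.T
```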