Abstract:Hate speech is a severe issue that affects many online platforms. So far, several studies have been performed to develop robust hate speech detection systems. Large language models like ChatGPT have recently shown great promise in performing several tasks, including hate speech detection. However, it is crucial to understand the limitations of these models in order to build robust hate speech detection systems. To bridge this gap, our study aims to evaluate the strengths and weaknesses of the ChatGPT model in detecting hate speech at a granular level across 11 languages. Our evaluation employs a series of functionality tests that reveal various intricate failures of the model which aggregate metrics like macro F1 or accuracy are unable to uncover. In addition, we investigate the influence of complex emotions, such as the use of emojis in hate speech, on the performance of the ChatGPT model. Our analysis highlights the shortcomings of generative models in detecting certain types of hate speech and underlines the need for further research and improvements in the workings of these models.
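A minimal sketch of how such functionality tests can be organized, assuming a hypothetical classify_with_llm wrapper around the chat model and placeholder test cases; the actual test suite, prompts, and label parsing used in the study may differ.

```python
# Functionality-test style evaluation: accuracy per functionality rather than
# a single aggregate score. All names and test cases below are placeholders.
from collections import defaultdict

def classify_with_llm(text: str) -> str:
    # Hypothetical stand-in: the real evaluation would prompt the chat model
    # and parse its answer into "hateful" / "non-hateful".
    return "hateful" if "<slur>" in text else "non-hateful"

test_cases = [
    {"functionality": "slur_usage", "text": "post containing <slur>", "gold": "hateful"},
    {"functionality": "counter_speech", "text": "post quoting a <slur> to condemn it", "gold": "non-hateful"},
    {"functionality": "emoji_hate", "text": "post expressing hate with emojis only", "gold": "hateful"},
]

def per_functionality_accuracy(cases):
    """Compute accuracy broken down by functionality."""
    hits, totals = defaultdict(int), defaultdict(int)
    for case in cases:
        pred = classify_with_llm(case["text"])
        totals[case["functionality"]] += 1
        hits[case["functionality"]] += int(pred == case["gold"])
    return {f: hits[f] / totals[f] for f in totals}

print(per_functionality_accuracy(test_cases))
```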
Abstract:Hate speech has become one of the most significant issues in modern society, having implications in both the online and the offline world. Due to this, hate speech research has recently gained a lot of traction. However, most of the work has primarily focused on text media, with relatively little work on images and even less on videos. Thus, early-stage automated video moderation techniques are needed to handle the videos being uploaded and keep the platforms safe and healthy. With a view to detecting and removing hateful content from video sharing platforms, our work focuses on hate video detection using multiple modalities. To this end, we curate ~43 hours of videos from BitChute and manually annotate them as hate or non-hate, along with the frame spans that explain the labelling decision. To collect the relevant videos, we harnessed search keywords from hate lexicons. We observe various cues in the images and audio of hateful videos. Further, we build deep learning multi-modal models to classify the hate videos and observe that using all the modalities of the videos improves the overall hate speech detection performance (accuracy=0.798, macro F1-score=0.790) by ~5.7% in terms of macro F1 score compared to the best uni-modal model. In summary, our work takes the first step toward understanding and modeling hateful videos on video hosting platforms such as BitChute.
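A minimal late-fusion sketch of combining the three modalities, assuming pre-extracted text, audio, and video frame features; dimensions and layers are illustrative and not the exact architecture used in the work.

```python
# Late fusion: project each modality, concatenate, and classify hate vs. non-hate.
import torch
import torch.nn as nn

class MultiModalHateClassifier(nn.Module):
    def __init__(self, text_dim=768, audio_dim=128, video_dim=512, hidden=256):
        super().__init__()
        self.text_proj = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.audio_proj = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        self.video_proj = nn.Sequential(nn.Linear(video_dim, hidden), nn.ReLU())
        # Concatenated modality representations feed a single linear classifier.
        self.classifier = nn.Linear(3 * hidden, 2)

    def forward(self, text_feat, audio_feat, video_feat):
        fused = torch.cat([
            self.text_proj(text_feat),
            self.audio_proj(audio_feat),
            self.video_proj(video_feat),
        ], dim=-1)
        return self.classifier(fused)

# Example forward pass with random features standing in for a batch of 4 videos.
model = MultiModalHateClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 128), torch.randn(4, 512))
```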
Abstract:Social media platforms are now heavily moderated to prevent the spread of online hate speech, which is usually rife with toxic words and is directed toward an individual or a community. Owing to such heavy moderation, newer and more subtle techniques are being deployed. One of the most striking among these is fear speech. Fear speech, as the name suggests, attempts to incite fear about a target community. Although subtle, it can be highly effective, often pushing communities toward physical conflict. Therefore, understanding its prevalence in social media is of paramount importance. This article presents a large-scale study of fear speech based on 400K fear speech posts and over 700K hate speech posts collected from Gab.com. Remarkably, users posting a large number of fear speech posts accrue more followers and occupy more central positions in social networks than users posting a large number of hate speech posts. They can also reach out to benign users more effectively than hate speech users through replies, reposts, and mentions. This connects to the fact that, unlike hate speech, fear speech has almost zero toxic content, making it look plausible. Moreover, while fear speech topics mostly portray a community as a perpetrator using a (fake) chain of argumentation, hate speech topics hurl direct multitarget insults, thus pointing to why general users could be more susceptible to fear speech. Our findings extend to other platforms (Twitter and Facebook) as well, and thus necessitate sophisticated moderation policies and mass awareness to combat fear speech.
Abstract:With the widespread use of knowledge graphs (KGs) in various automated AI systems and applications, it is very important to ensure that the information retrieval algorithms leveraging them are free from societal biases. Previous works have documented biases that persist in KGs and have employed several metrics for measuring those biases. However, such studies lack a systematic exploration of how sensitive the bias measurements are to the source of the data or to the embedding algorithm used. To address this research gap, in this work, we present a holistic analysis of bias measurement on knowledge graphs. First, we attempt to reveal data biases that surface in Wikidata for thirteen different demographics selected from seven continents. Next, we attempt to unfold the variance in the detection of biases by two different knowledge graph embedding algorithms - TransE and ComplEx. We conduct extensive experiments on a large number of occupations sampled from the thirteen demographics with respect to the sensitive attribute, i.e., gender. Our results show that the inherent data bias that persists in a KG can be altered by the specific algorithmic bias introduced by KG embedding learning algorithms. Further, we show that the choice of the state-of-the-art KG embedding algorithm has a strong impact on the ranking of biased occupations irrespective of gender. We observe that the similarity of the biased occupations across demographics is minimal, which reflects the socio-cultural differences around the globe. We believe that this full-scale audit of the bias measurement pipeline will raise awareness in the community, provide insights into the design choices of both data and algorithms, and help the field move away from the popular dogma of ``one-size-fits-all''.
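As an illustration, one simple way to score the gender association of an occupation from pre-trained KG entity embeddings (e.g., TransE or ComplEx vectors) is to compare cosine similarities to gendered anchor entities; the exact bias metric used in the study may differ, and the embeddings below are random stand-ins for Wikidata entities.

```python
# Cosine-similarity-based gender association score over precomputed KG embeddings.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def gender_bias_score(occupation_vec, male_vec, female_vec):
    """Positive => closer to the 'male' anchor, negative => closer to 'female'."""
    return cosine(occupation_vec, male_vec) - cosine(occupation_vec, female_vec)

# Toy example: random vectors standing in for learned entity embeddings.
rng = np.random.default_rng(0)
embeddings = {name: rng.normal(size=200) for name in ["male", "female", "nurse", "engineer"]}

# Rank occupations from most male-associated to most female-associated.
ranking = sorted(
    ["nurse", "engineer"],
    key=lambda occ: gender_bias_score(embeddings[occ], embeddings["male"], embeddings["female"]),
    reverse=True,
)
print(ranking)
```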
Abstract:Exploiting social media to spread hate has increased tremendously over the years. Lately, multi-modal hateful content such as memes has drawn relatively more traction than uni-modal content. Moreover, the availability of implicit content payloads makes such memes fairly challenging for existing hateful meme detection systems to detect. In this paper, we present a use case study to analyze the vulnerabilities of such systems against external adversarial attacks. We find that even very simple perturbations in uni-modal and multi-modal settings, performed by humans with little knowledge about the model, can make the existing detection models highly vulnerable. Empirically, we find a noticeable performance drop of as high as 10% in the macro-F1 score for certain attacks. As a remedy, we attempt to boost the models' robustness using contrastive learning as well as an adversarial training-based method - VILLA. Using an ensemble of the above two approaches, on two of our high resolution datasets, we are able to regain the performance to a large extent for certain attacks. We believe that ours is a first step toward addressing this crucial problem in an adversarial setting and will inspire more such investigations in the future.
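A sketch of the kind of simple, human-performable text perturbations that can be applied to a meme's overlaid text to probe a detector; the attack set actually used in the study may differ, and the functions below are illustrative.

```python
# Character-level perturbations that a human can apply without model knowledge.
import random

def add_char_noise(text, rate=0.1, seed=0):
    """Randomly drop or duplicate characters in the meme's overlaid text."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        r = rng.random()
        if r < rate / 2:
            continue          # drop the character
        out.append(ch)
        if r > 1 - rate / 2:
            out.append(ch)    # duplicate the character
    return "".join(out)

def insert_spaces(text, rate=0.1, seed=0):
    """Break up tokens with stray spaces so keyword cues are harder to match."""
    rng = random.Random(seed)
    return "".join(ch + (" " if ch.isalpha() and rng.random() < rate else "") for ch in text)

perturbed = insert_spaces(add_char_noise("example meme caption"))
print(perturbed)
```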
Abstract:Abusive language is a concerning problem in online social media. Past research on detecting abusive language covers different platforms, languages, demographics, etc. However, models trained on these datasets do not perform well in cross-domain evaluation settings. To overcome this, a common strategy is to use a few samples from the target domain to train models to get better performance in that domain (cross-domain few-shot training). However, this might cause the models to overfit to the artefacts of those samples. A compelling solution could be to guide the models toward rationales, i.e., spans of text that justify the text's label. This method has been found to improve model performance in the in-domain setting across various NLP tasks. In this paper, we propose RAFT (Rationale Adaptor for Few-shoT classification) for abusive language detection. We first build a multitask learning setup to jointly learn rationales, targets, and labels, and find a significant improvement of 6% macro F1 on the rationale detection task over training rationale classifiers alone. We introduce two rationale-integrated BERT-based architectures (the RAFT models) and evaluate our systems on five different abusive language datasets, finding that in the few-shot classification setting, RAFT-based models outperform baseline models by about 7% in macro F1 score and perform competitively with models finetuned on other source domains. Furthermore, RAFT-based models outperform LIME/SHAP-based approaches in terms of plausibility and are close in performance in terms of faithfulness.
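A minimal sketch of a joint rationale-and-label model of the kind described above, assuming token-level rationale supervision and a post-level label head on a shared BERT encoder; this is illustrative and not the exact RAFT architecture.

```python
# Multitask setup: per-token rationale prediction plus per-post classification
# on top of a shared transformer encoder.
import torch
import torch.nn as nn
from transformers import AutoModel

class RationaleLabelModel(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased", num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.rationale_head = nn.Linear(hidden, 2)       # per token: rationale or not
        self.label_head = nn.Linear(hidden, num_labels)  # per post: abusive or not

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        token_logits = self.rationale_head(out.last_hidden_state)
        label_logits = self.label_head(out.last_hidden_state[:, 0])  # [CLS] representation
        return token_logits, label_logits

# During multitask training, the token-level and post-level cross-entropy
# losses would be summed, possibly with a weighting factor.
```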
Abstract:Social media often serves as a breeding ground for various hateful and offensive content. Identifying such content on social media is crucial due to its impact on race, gender, and religion in an unprejudiced society. However, while there is extensive research on hate speech detection in English, there is a gap in hateful content detection in low-resource languages like Bengali. Besides, a current trend on social media is the use of Romanized Bengali for regular interactions. To overcome the limitations of existing research, in this study, we develop an annotated dataset of 10K Bengali posts consisting of 5K actual and 5K Romanized Bengali tweets. We implement several baseline models for the classification of such hateful posts. We further explore the interlingual transfer mechanism to boost classification performance. Finally, we perform an in-depth error analysis by looking into the posts misclassified by the models. When training on the actual and Romanized datasets separately, we observe that XLM-Roberta performs the best. Further, we find that with joint training and few-shot training, MuRIL outperforms the other models by interpreting the semantic expressions better. We make our code and dataset publicly available.
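A minimal sketch of setting up one such transformer baseline for binary hate classification with Hugging Face transformers; the example texts are placeholders, and the MuRIL identifier shown in the comment is an assumption about the model hub name.

```python
# Baseline setup: multilingual transformer with a binary classification head.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "xlm-roberta-base"  # MuRIL could be swapped in, e.g. "google/muril-base-cased" (assumed hub id)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Placeholder posts: one actual Bengali post and one Romanized Bengali post.
batch = tokenizer(
    ["example Bengali post", "example Romanized Bengali post"],
    padding=True, truncation=True, return_tensors="pt",
)
logits = model(**batch).logits  # fine-tuning would follow with a standard training loop
```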
Abstract:With the rising participation of the common mass in social media, it is increasingly common for policymakers/journalists to create online polls on social media to understand the political leanings of people in specific locations. The caveat here is that only influential people can run such online polls and reach out at a mass scale. Further, in such cases, the distribution of voters is not controllable and may, in fact, be biased. On the other hand, if we can interpret the publicly available data on social media to probe the political inclination of users, we will be able to have controllable insights about the survey population, keep the cost of the survey low, and collect publicly available data without involving the concerned persons. Hence, we introduce a self-attentive semi-supervised framework for political inclination detection to further that objective. The advantage of our model is that it neither needs huge amounts of training data nor needs to store social network parameters. Nevertheless, it achieves an accuracy of 93.7% with no annotated data; further, with only a few annotated examples per class, it achieves competitive performance. We find that the model is highly efficient even in resource-constrained settings, and insights drawn from its predictions match the manual survey outcomes when applied to diverse real-life scenarios.
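A sketch of a self-attentive pooling layer, the kind of component such a framework might use to aggregate token or post representations into a single user-level vector; dimensions and details are illustrative rather than the paper's exact design.

```python
# Self-attentive pooling: learn a score per token/post and take the weighted sum.
import torch
import torch.nn as nn

class SelfAttentivePooling(nn.Module):
    def __init__(self, dim=300):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)

    def forward(self, token_embs, mask):
        # token_embs: (batch, seq_len, dim); mask: (batch, seq_len), 1 = real token.
        scores = self.scorer(token_embs).squeeze(-1)
        scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)
        return (weights * token_embs).sum(dim=1)  # one pooled vector per sequence

pooled = SelfAttentivePooling()(torch.randn(2, 10, 300), torch.ones(2, 10))
```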
Abstract:Demographic classification is essential in fairness assessment in recommender systems or in measuring unintended bias in online networks and voting systems. Important fields like education and politics, which often lay the foundation for the future of equality in society, need scrutiny to design policies that can better foster equality in resource distribution, given the unbalanced demographic distribution of people in the country. We collect three publicly available datasets to train state-of-the-art classifiers in the domain of gender and caste classification. We train the models in the Indian context, where the same name can have different styling conventions (Jolly Abraham/Kumar Abhishikta in one state may be written as Abraham Jolly/Abhishikta Kumar in another). We also perform cross-testing (training and testing on different datasets) to understand the efficacy of the above models, along with an error analysis of the prediction models. Finally, we attempt to assess the bias in the existing Indian system through case studies and find some intriguing patterns manifesting in the complex demographic layout of the sub-continent across the dimensions of gender and caste.
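An illustrative character n-gram baseline for name-based classification; character n-grams are comparatively insensitive to the first-name/surname reordering mentioned above. The labels below are placeholders, and the study's actual models are stronger than this sketch.

```python
# Simple name classifier: character n-gram TF-IDF features + logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Reordered variants of the same name share most of their character n-grams.
names = ["Jolly Abraham", "Abraham Jolly", "Kumar Abhishikta", "Abhishikta Kumar"]
labels = ["class_a", "class_a", "class_b", "class_b"]  # placeholder labels for illustration only

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
clf.fit(names, labels)
print(clf.predict(["Abraham Jolly"]))
```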
Abstract:Today, participating in discussions on online forums is extremely commonplace, and these discussions have begun to exert a strong influence on the overall opinion of online users. Naturally, twisting the flow of an argument can have a strong impact on the minds of naive users, which in the long run might have socio-political ramifications, for example, winning an election or spreading targeted misinformation. Thus, these platforms are potentially highly vulnerable to malicious players who might act individually or as a cohort to breed fallacious arguments with a motive to sway public opinion. Ad hominem arguments are one of the most effective forms of such fallacies. Although a simple fallacy, it is effective enough to sway public debates in the offline world and can be used as a precursor to shutting down the voice of opposition by slander. In this work, we take a first step in shedding light on the usage of ad hominem fallacies in the wild. First, we build a powerful ad hominem detector with high accuracy (F1 of more than 83%, a significant improvement over prior work), even for datasets in which annotated instances constitute a very small fraction. We then run our detector on 265k arguments collected from the online debate forum CreateDebate. Our crowdsourced surveys validate our in-the-wild predictions on the CreateDebate data (94% match with manual annotation). Our analysis reveals that a surprising 31.23% of CreateDebate content contains ad hominem fallacies, and that a cohort of highly active users post significantly more ad hominem arguments to suppress opposing views. Our temporal analysis further reveals that ad hominem argument usage has increased significantly since the 2016 US Presidential election, not only for topics like Politics but also for Science and Law. We conclude by discussing important implications of our work for detecting and defending against ad hominem fallacies.