Abhijnan Chakraborty

ImplicitBBQ: Benchmarking Implicit Bias in Large Language Models through Characteristic Based Cues

Apr 02, 2026

Bhaskara Hanuma Vedula, Darshan Anghan, Ishita Goyal, Ponnurangam Kumaraguru, Abhijnan Chakraborty

Abstract:Large Language Models increasingly suppress biased outputs when demographic identity is stated explicitly, yet may still exhibit implicit biases when identity is conveyed indirectly. Existing benchmarks use name based proxies to detect implicit biases, which carry weak associations with many social demographics and cannot extend to dimensions like age or socioeconomic status. We introduce ImplicitBBQ, a QA benchmark that evaluates implicit bias through characteristic based cues, culturally associated attributes that signal implicitly, across age, gender, region, religion, caste, and socioeconomic status. Evaluating 11 models, we find that implicit bias in ambiguous contexts is over six times higher than explicit bias in open weight models. Safety prompting and chain-of-thought reasoning fail to substantially close this gap; even few-shot prompting, which reduces implicit bias by 84%, leaves caste bias at four times the level of any other dimension. These findings indicate that current alignment and prompting strategies address the surface of bias evaluation while leaving culturally grounded stereotypic associations largely unresolved. We publicly release our code and dataset for model providers and researchers to benchmark potential mitigation techniques.

SMITE: Enhancing Fairness in LLMs through Optimal In-Context Example Selection via Dynamic Validation

Aug 25, 2025

Garima Chhikara, Kripabandhu Ghosh, Abhijnan Chakraborty

Abstract:Large Language Models (LLMs) are widely used for downstream tasks such as tabular classification, where ensuring fairness in their outputs is critical for inclusivity, equal representation, and responsible AI deployment. This study introduces a novel approach to enhancing LLM performance and fairness through the concept of a dynamic validation set, which evolves alongside the test set, replacing the traditional static validation approach. We also propose an iterative algorithm, SMITE, to select optimal in-context examples, with each example set validated against its corresponding dynamic validation set. The in-context set with the lowest total error is used as the final demonstration set. Our experiments across four different LLMs show that our proposed techniques significantly improve both predictive accuracy and fairness compared to baseline methods. To our knowledge, this is the first study to apply dynamic validation in the context of in-context learning for LLMs.

Rethinking Hate Speech Detection on Social Media: Can LLMs Replace Traditional Models?

Jun 15, 2025

Daman Deep Singh, Ramanuj Bhattacharjee, Abhijnan Chakraborty

Abstract:Hate speech detection across contemporary social media presents unique challenges due to linguistic diversity and the informal nature of online discourse. These challenges are further amplified in settings involving code-mixing, transliteration, and culturally nuanced expressions. While fine-tuned transformer models, such as BERT, have become standard for this task, we argue that recent large language models (LLMs) not only surpass them but also redefine the landscape of hate speech detection more broadly. To support this claim, we introduce IndoHateMix, a diverse, high-quality dataset capturing Hindi-English code-mixing and transliteration in the Indian context, providing a realistic benchmark to evaluate model robustness in complex multilingual scenarios where existing NLP methods often struggle. Our extensive experiments show that cutting-edge LLMs (such as LLaMA-3.1) consistently outperform task-specific BERT-based models, even when fine-tuned on significantly less data. With their superior generalization and adaptability, LLMs offer a transformative approach to mitigating online hate in diverse environments. This raises the question of whether future work should prioritize developing specialized models or focus on curating richer and more varied datasets to further enhance the effectiveness of LLMs.

Sometimes the Model doth Preach: Quantifying Religious Bias in Open LLMs through Demographic Analysis in Asian Nations

Mar 10, 2025

Hari Shankar, Vedanta S P, Tejas Cavale, Ponnurangam Kumaraguru, Abhijnan Chakraborty

Abstract:Large Language Models (LLMs) are capable of generating opinions and propagating bias unknowingly, originating from unrepresentative and non-diverse data collection. Prior research has analysed these opinions with respect to the West, particularly the United States. However, insights thus produced may not generalize to non-Western populations. With the widespread usage of LLM systems by users across several different walks of life, the cultural sensitivity of each generated output is of crucial interest. Our work proposes a novel method that quantitatively analyzes the opinions generated by LLMs, improving on previous work with regard to extracting the social demographics of the models. Our method measures the distance from an LLM's response to survey respondents, through Hamming Distance, to infer the demographic characteristics reflected in the model's outputs. We evaluate modern, open LLMs such as Llama and Mistral on surveys conducted in various Global South countries, with a focus on India and other Asian nations, specifically assessing the model's performance on surveys related to religious tolerance and identity. Our analysis reveals that most open LLMs match a single homogeneous profile, varying across different countries/territories, which in turn raises questions about the risks of LLMs promoting a hegemonic worldview, and undermining perspectives of different minorities. Our framework may also be useful for future research investigating the complex intersection between training data, model architecture, and the resulting biases reflected in LLM outputs, particularly concerning sensitive topics like religious tolerance and identity.
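
The distance-based matching described in this abstract can be illustrated with a minimal sketch. It assumes each survey response is a fixed-length list of categorical answers and that each respondent carries a single demographic label. The function names, the toy data, and the tie-handling rule are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def hamming_distance(a, b):
    """Count the positions at which two equal-length answer lists differ."""
    if len(a) != len(b):
        raise ValueError("answer lists must have the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_respondents(model_answers, respondents):
    """Return the smallest distance and the labels of respondents at that distance.

    `respondents` is a list of (demographic_label, answers) pairs.
    Ties are all kept; this is an assumption, not the paper's rule.
    """
    distances = [
        (hamming_distance(model_answers, answers), label)
        for label, answers in respondents
    ]
    best = min(d for d, _ in distances)
    return best, Counter(label for d, label in distances if d == best)


# Invented toy survey: four questions, three respondents.
respondents = [
    ("group_a", ["yes", "no", "yes", "agree"]),
    ("group_b", ["no", "no", "yes", "disagree"]),
    ("group_a", ["yes", "yes", "yes", "agree"]),
]
model_answers = ["yes", "no", "yes", "disagree"]

best, labels = nearest_respondents(model_answers, respondents)
print(best, dict(labels))
```

In this toy case the model's answers sit at distance 1 from one respondent in each group, so the nearest set is split evenly. On a real survey the distribution of labels among the nearest respondents would be the quantity of interest.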

Through the Prism of Culture: Evaluating LLMs' Understanding of Indian Subcultures and Traditions

Jan 28, 2025

Garima Chhikara, Abhishek Kumar, Abhijnan Chakraborty

Abstract:Large Language Models (LLMs) have shown remarkable advancements but also raise concerns about cultural bias, often reflecting dominant narratives at the expense of under-represented subcultures. In this study, we evaluate the capacity of LLMs to recognize and accurately respond to the Little Traditions within Indian society, encompassing localized cultural practices and subcultures such as caste, kinship, marriage, and religion. Through a series of case studies, we assess whether LLMs can balance the interplay between dominant Great Traditions and localized Little Traditions. We explore various prompting strategies and further investigate whether using prompts in regional languages enhances the models' cultural sensitivity and response quality. Our findings reveal that while LLMs demonstrate an ability to articulate cultural nuances, they often struggle to apply this understanding in practical, context-specific scenarios. To the best of our knowledge, this is the first study to analyze LLMs' engagement with Indian subcultures, offering critical insights into the challenges of embedding cultural diversity in AI systems.

Investigating Nudges toward Related Sellers on E-commerce Marketplaces: A Case Study on Amazon

Jul 01, 2024

Abhisek Dash, Abhijnan Chakraborty, Saptarshi Ghosh, Animesh Mukherjee, Krishna P. Gummadi

Abstract:E-commerce marketplaces provide business opportunities to millions of sellers worldwide. Some of these sellers have special relationships with the marketplace by virtue of using its subsidiary services (e.g., fulfillment and/or shipping services provided by the marketplace) -- we refer to such sellers collectively as Related Sellers. When multiple sellers offer to sell the same product, the marketplace helps a customer in selecting an offer (by a seller) through (a) a default offer selection algorithm, (b) showing features about each of the offers and the corresponding sellers (price, seller performance metrics, seller's number of ratings etc.), and (c) finally evaluating the sellers along these features. In this paper, we perform an end-to-end investigation into how the above apparatus can nudge customers toward the Related Sellers on Amazon's four different marketplaces in India, USA, Germany and France. We find that given explicit choices, customers' preferred offers and algorithmically selected offers can be significantly different. We highlight that Amazon is adopting different performance metric evaluation policies for different sellers, potentially benefiting Related Sellers. For instance, such policies result in a notable discrepancy between the actual performance metric and the presented performance metric of Related Sellers. We further observe that among the seller-centric features visible to customers, sellers' number of ratings influences their decisions the most, yet it may not reflect the true quality of service by the seller, rather reflecting the scale at which the seller operates, thereby implicitly steering customers toward larger Related Sellers. Moreover, when customers are shown the rectified metrics for the different sellers, their preference toward Related Sellers is almost halved.

* This work has been accepted for presentation at the ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW) 2024. It will appear in the Proceedings of the ACM on Human-Computer Interaction.

LaMSUM: A Novel Framework for Extractive Summarization of User Generated Content using LLMs

Jun 22, 2024

Garima Chhikara, Anurag Sharma, V. Gurucharan, Kripabandhu Ghosh, Abhijnan Chakraborty

Abstract:Large Language Models (LLMs) have demonstrated impressive performance across a wide range of NLP tasks, including summarization. Inherently, LLMs produce abstractive summaries, and the task of achieving extractive summaries through LLMs remains largely unexplored. To bridge this gap, in this work, we propose a novel framework LaMSUM to generate extractive summaries through LLMs for large user-generated text by leveraging voting algorithms. Our evaluation on three popular open-source LLMs (Llama 3, Mixtral and Gemini) reveals that LaMSUM outperforms state-of-the-art extractive summarization methods. We further attempt to provide the rationale behind the output summary produced by LLMs. Overall, this is one of the early attempts to achieve extractive summarization for large user-generated text by utilizing LLMs, and is likely to generate further interest in the community.
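
The abstract does not say which voting algorithms LaMSUM uses, so the following is only a generic sketch of one possibility: plurality voting over sentence indices chosen in several independent runs. The function name, the tie-breaking rule, and the toy selections are hypothetical.

```python
from collections import Counter


def plurality_vote(selections, k):
    """Pick the k sentence indices chosen most often across runs.

    `selections` holds one list of chosen sentence indices per run.
    Ties are broken by the lower index (an assumption for illustration).
    The result is returned in document order.
    """
    # Each run contributes at most one vote per sentence.
    votes = Counter(idx for sel in selections for idx in set(sel))
    ranked = sorted(votes, key=lambda i: (-votes[i], i))
    return sorted(ranked[:k])


# Invented example: three runs, each proposing three sentences.
selections = [[0, 2, 5], [2, 3, 5], [1, 2, 5]]
summary_ids = plurality_vote(selections, k=2)
print(summary_ids)
```

Because the output is a set of indices into the source text, the final summary stays strictly extractive: it is assembled from original sentences, not generated text.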

* Under review

Artificial Intelligence (AI) in Legal Data Mining

May 23, 2024

Aniket Deroy, Naksatra Kumar Bailung, Kripabandhu Ghosh, Saptarshi Ghosh, Abhijnan Chakraborty

Abstract:Despite the availability of vast amounts of data, legal data is often unstructured, making it difficult even for law practitioners to ingest and comprehend it. It is important to organise the legal information in a way that is useful for practitioners and downstream automation tasks. The word ontology was used by Greek philosophers to discuss concepts of existence, being, becoming and reality. Today, scientists use this term to describe the relation between concepts, data, and entities. A great example of a working ontology was developed by Dhani and Bhatt. This ontology deals with Indian court cases on intellectual property rights (IPR). The future of legal ontologies is likely to be handled by computer experts and legal experts alike.

* Book: Technology and Analytics for Law and Justice, Chapter 14, pp. 273-297

Antitrust, Amazon, and Algorithmic Auditing

Mar 27, 2024

Abhisek Dash, Abhijnan Chakraborty, Saptarshi Ghosh, Animesh Mukherjee, Jens Frankenreiter, Stefan Bechtold, Krishna P. Gummadi

Abstract:In digital markets, antitrust law and special regulations aim to ensure that markets remain competitive despite the dominating role that digital platforms play today in everyone's life. Unlike traditional markets, market participant behavior is easily observable in these markets. We present a series of empirical investigations into the extent to which Amazon engages in practices that are typically described as self-preferencing. We discuss how the computer science tools used in this paper can be used in a regulatory environment that is based on algorithmic auditing and requires regulating digital markets at scale.

* The paper has been accepted to appear in the Journal of Institutional and Theoretical Economics (JITE), 2024.
