Tarannum Shaila Zaman

SECite: Analyzing and Summarizing Citations in Software Engineering Literature

Jan 12, 2026

Shireesh Reddy Pyreddy, Khaja Valli Pathan, Hasan Masum, Tarannum Shaila Zaman

Abstract: Identifying the strengths and limitations of a research paper is a core component of any literature review. However, traditional summaries reflect only the authors' self-presented perspective. Analyzing how other researchers discuss and cite the paper can offer a deeper, more practical understanding of its contributions and shortcomings. In this research, we introduce SECite, a novel approach for evaluating scholarly impact through sentiment analysis of citation contexts. We develop a semi-automated pipeline to extract citations referencing nine research papers and apply advanced natural language processing (NLP) techniques with unsupervised machine learning to classify these citation statements as positive or negative. Beyond sentiment classification, we use generative AI to produce sentiment-specific summaries that capture the strengths and limitations of each target paper, derived both from clustered citation groups and from the full text. Our findings reveal meaningful patterns in how the academic community perceives these works, highlighting areas of alignment and divergence between external citation feedback and the authors' own presentation. By integrating citation sentiment analysis with LLM-based summarization, this study provides a comprehensive framework for assessing scholarly contributions.

* Accepted at IEEE CCWC 2026

OLAF: Towards Robust LLM-Based Annotation Framework in Empirical Software Engineering

Dec 17, 2025

Mia Mohammad Imran, Tarannum Shaila Zaman

Abstract: Large Language Models (LLMs) are increasingly used in empirical software engineering (ESE) to automate or assist annotation tasks such as labeling commits, issues, and qualitative artifacts. Yet the reliability and reproducibility of such annotations remain underexplored. Existing studies often lack standardized measures for reliability, calibration, and drift, and frequently omit essential configuration details. We argue that LLM-based annotation should be treated as a measurement process rather than a purely automated activity. In this position paper, we outline the Operationalization for LLM-based Annotation Framework (OLAF), a conceptual framework that organizes key constructs: reliability, calibration, drift, consensus, aggregation, and transparency. The paper aims to motivate methodological discussion and future empirical work toward more transparent and reproducible LLM-based annotation in software engineering research.

* 3rd International Workshop on Methodological Issues with Empirical Studies in Software Engineering (WSESE) 2026
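
As an illustration of what a reliability measure for LLM-based annotation could look like, the following is a minimal sketch that computes Cohen's kappa between two annotation runs over the same items. It is not taken from the OLAF paper: the function name `cohen_kappa`, the label set, and the two example runs are hypothetical and chosen only for demonstration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two label sequences of equal length."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label in both runs.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if the two runs labeled independently,
    # based on each run's own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both runs used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two runs of the same prompt over six issues.
run_1 = ["bug", "feature", "bug", "docs", "bug", "feature"]
run_2 = ["bug", "feature", "docs", "docs", "bug", "bug"]
kappa = cohen_kappa(run_1, run_2)
print(f"kappa = {kappa:.3f}")
```

Comparing repeated runs of the same configuration in this way gives one simple view of run-to-run stability; comparing against human labels, or across model versions over time, would follow the same pattern with different inputs.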

EmoBang: Detecting Emotion From Bengali Texts

Nov 10, 2025

Abdullah Al Maruf, Aditi Golder, Zakaria Masud Jiyad, Abdullah Al Numan, Tarannum Shaila Zaman

Abstract: Emotion detection from text seeks to identify an individual's emotional or mental state (positive, negative, or neutral) based on linguistic cues. While significant progress has been made for English and other high-resource languages, Bengali remains underexplored despite being the world's fourth most spoken language. The lack of large, standardized datasets makes Bengali a low-resource language for emotion detection. Existing studies mainly employ classical machine learning models with traditional feature engineering, yielding limited performance. In this paper, we introduce a new Bengali emotion dataset annotated across eight emotion categories and propose two models for automatic emotion detection: (i) a hybrid Convolutional Recurrent Neural Network (CRNN) model (EmoBangHybrid) and (ii) an AdaBoost-Bidirectional Encoder Representations from Transformers (BERT) ensemble model (EmoBangEnsemble). Additionally, we evaluate six baseline models with five feature engineering techniques and assess zero-shot and few-shot large language models (LLMs) on the dataset. To the best of our knowledge, this is the first comprehensive benchmark for Bengali emotion detection. Experimental results show that EmoBangHybrid and EmoBangEnsemble achieve accuracies of 92.86% and 93.69%, respectively, outperforming existing methods and establishing strong baselines for future research.

LLM-ProS: Analyzing Large Language Models' Performance in Competitive Problem Solving

Feb 04, 2025

Md Sifat Hossain, Anika Tabassum, Md. Fahim Arefin, Tarannum Shaila Zaman

Abstract: The rapid advancement of large language models has opened new avenues for automating complex problem-solving tasks such as algorithmic coding and competitive programming. This paper introduces a novel evaluation technique, LLM-ProS, to assess the performance of state-of-the-art LLMs on International Collegiate Programming Contest (ICPC) problems. Using a curated dataset of 166 World Finals problems from 2011 to 2024, we benchmark the models' reasoning, accuracy, and efficiency. We evaluate five models: GPT-4o, Mistral Large, Llama-3.1-405B, and the o1 family (o1-mini and o1-preview), across critical metrics such as correctness, resource utilization, and response calibration. Our results reveal significant differences in the models' abilities to generalize, adapt, and solve novel problems. We also investigate the impact of training methodologies, dataset contamination, and chain-of-thought reasoning on model performance. The findings provide new insights into optimizing LLMs for algorithmic tasks, highlighting both strengths and limitations of current models.

* To be published in LLM4Code 2025 workshop proceedings

EmoXpt: Analyzing Emotional Variances in Human Comments and LLM-Generated Responses

Jan 11, 2025

Shireesh Reddy Pyreddy, Tarannum Shaila Zaman

Abstract: The widespread adoption of generative AI has prompted diverse opinions, with individuals expressing both support for and criticism of its applications. This study investigates the emotional dynamics surrounding generative AI by analyzing human tweets referencing terms such as ChatGPT, OpenAI, Copilot, and LLMs. To further understand the emotional intelligence of ChatGPT, we examine its responses to selected tweets, highlighting differences in sentiment between human comments and LLM-generated responses. We introduce EmoXpt, a sentiment analysis framework designed to assess both human perspectives on generative AI and the sentiment embedded in ChatGPT's responses. Unlike prior studies that focus exclusively on human sentiment, EmoXpt also evaluates the emotional expression of ChatGPT. Experimental results demonstrate that LLM-generated responses are notably more efficient, cohesive, and consistently positive than human responses.

* 7 pages, 10 figures, 5 tables. This paper has been accepted and presented at the 2025 IEEE 15th Annual Computing and Communication Workshop and Conference (CCWC)
