Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Gauri Sharma

Echoes of Human Malice in Agents: Benchmarking LLMs for Multi-Turn Online Harassment Attacks

Oct 16, 2025

Trilok Padhi, Pinxian Lu, Abdulkadir Erol, Tanmay Sutar, Gauri Sharma, Mina Sonmez, Munmun De Choudhury, Ugur Kursuncu

Abstract:Large Language Model (LLM) agents are powering a growing share of interactive web applications, yet remain vulnerable to misuse and harm. Prior jailbreak research has largely focused on single-turn prompts, whereas real harassment often unfolds over multi-turn interactions. In this work, we present the Online Harassment Agentic Benchmark consisting of: (i) a synthetic multi-turn harassment conversation dataset, (ii) a multi-agent (e.g., harasser, victim) simulation informed by repeated game theory, (iii) three jailbreak methods attacking agents across memory, planning, and fine-tuning, and (iv) a mixed-methods evaluation framework. We utilize two prominent LLMs, LLaMA-3.1-8B-Instruct (open-source) and Gemini-2.0-flash (closed-source). Our results show that jailbreak tuning makes harassment nearly guaranteed with an attack success rate of 95.78--96.89% vs. 57.25--64.19% without tuning in Llama, and 99.33% vs. 98.46% without tuning in Gemini, while sharply reducing refusal rate to 1-2% in both models. The most prevalent toxic behaviors are Insult with 84.9--87.8% vs. 44.2--50.8% without tuning, and Flaming with 81.2--85.1% vs. 31.5--38.8% without tuning, indicating weaker guardrails compared to sensitive categories such as sexual or racial harassment. Qualitative evaluation further reveals that attacked agents reproduce human-like aggression profiles, such as Machiavellian/psychopathic patterns under planning, and narcissistic tendencies with memory. Counterintuitively, closed-source and open-source models exhibit distinct escalation trajectories across turns, with closed-source models showing significant vulnerability. Overall, our findings show that multi-turn and theory-grounded attacks not only succeed at high rates but also mimic human-like harassment dynamics, motivating the development of robust safety guardrails to ultimately keep online platforms safe and responsible.

* 13 pages, 4 figures

Via

Access Paper or Ask Questions

Equity in Healthcare: Analyzing Disparities in Machine Learning Predictions of Diabetic Patient Readmissions

Mar 27, 2024

Zainab Al-Zanbouri, Gauri Sharma, Shaina Raza

Figure 1 for Equity in Healthcare: Analyzing Disparities in Machine Learning Predictions of Diabetic Patient Readmissions

Figure 2 for Equity in Healthcare: Analyzing Disparities in Machine Learning Predictions of Diabetic Patient Readmissions

Figure 3 for Equity in Healthcare: Analyzing Disparities in Machine Learning Predictions of Diabetic Patient Readmissions

Figure 4 for Equity in Healthcare: Analyzing Disparities in Machine Learning Predictions of Diabetic Patient Readmissions

Abstract:This study investigates how machine learning (ML) models can predict hospital readmissions for diabetic patients fairly and accurately across different demographics (age, gender, race). We compared models like Deep Learning, Generalized Linear Models, Gradient Boosting Machines (GBM), and Naive Bayes. GBM stood out with an F1-score of 84.3% and accuracy of 82.2%, accurately predicting readmissions across demographics. A fairness analysis was conducted across all the models. GBM minimized disparities in predictions, achieving balanced results across genders and races. It showed low False Discovery Rates (FDR) (6-7%) and False Positive Rates (FPR) (5%) for both genders. Additionally, FDRs remained low for racial groups, such as African Americans (8%) and Asians (7%). Similarly, FPRs were consistent across age groups (4%) for both patients under 40 and those above 40, indicating its precision and ability to reduce bias. These findings emphasize the importance of choosing ML models carefully to ensure both accuracy and fairness for all patients. By showcasing effectiveness of various models with fairness metrics, this study promotes personalized medicine and the need for fair ML algorithms in healthcare. This can ultimately reduce disparities and improve outcomes for diabetic patients of all backgrounds.

Via

Access Paper or Ask Questions