Jessica Woodgate

Ignore All Previous Instructions: Jailbreaking as a de-escalatory peace building practise to resist LLM social media bots

Mar 02, 2026

Huw Day, Adrianna Jezierska, Jessica Woodgate

Abstract: Large Language Models have intensified the scale and strategic manipulation of political discourse on social media, leading to conflict escalation. The existing literature largely focuses on platform-led moderation as a countermeasure. In this paper, we propose a user-centric view of "jailbreaking" as an emergent, non-violent de-escalation practice. Online users engage with suspected LLM-powered accounts to circumvent large language model safeguards, exposing automated behaviour and disrupting the circulation of misleading narratives.

* Accepted to ICLR 2026 AI for peace workshop

Operationalising Rawlsian Ethics for Fairness in Norm-Learning Agents

Dec 19, 2024

Jessica Woodgate, Paul Marshall, Nirav Ajmeri

Abstract: Social norms are standards of behaviour common in a society. However, when agents make decisions without considering how others are impacted, norms can emerge that lead to the subjugation of certain agents. We present RAWL-E, a method to create ethical norm-learning agents. RAWL-E agents operationalise maximin, a fairness principle from Rawlsian ethics, in their decision-making processes to promote ethical norms by balancing societal well-being with individual goals. We evaluate RAWL-E agents in simulated harvesting scenarios. We find that norms emerging in RAWL-E agent societies enhance social welfare, fairness, and robustness, and yield higher minimum experience compared to those that emerge in agent societies that do not implement Rawlsian ethics.

* 14 pages, 7 figures, 8 tables (and supplementary material with reproducibility and additional results), accepted at AAAI 2025
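
The maximin principle named in the abstract can be illustrated with a minimal sketch: among the candidate actions, prefer the one whose worst-off agent fares best. This is not the RAWL-E implementation. The function name `maximin_action`, the action labels, and the well-being numbers are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def maximin_action(outcomes: Dict[str, List[float]]) -> str:
    """Return the action whose worst-off agent fares best.

    `outcomes` maps each candidate action to the predicted well-being
    of every agent in the society if that action is taken.
    """
    if not outcomes:
        raise ValueError("no candidate actions")
    # Score each action by the minimum well-being it leaves any agent with,
    # then pick the action with the highest such minimum.
    return max(outcomes, key=lambda action: min(outcomes[action]))


# Hypothetical predictions for a three-agent harvesting scenario.
predicted = {
    "keep_berry": [9.0, 1.0, 4.0],   # worst-off agent ends at 1.0
    "share_berry": [6.0, 4.0, 4.0],  # worst-off agent ends at 4.0
}

choice = maximin_action(predicted)
print(choice)
```

Here the sharing action is chosen because its minimum well-being (4.0) exceeds that of keeping (1.0), even though keeping gives one agent a higher individual payoff. How the paper balances this criterion against individual goals is not specified in the abstract, so that part is left out of the sketch.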
