Sahil Wadhwa

Adaptive Instruction Composition for Automated LLM Red-Teaming

Apr 22, 2026

Jesse Zymet, Andy Luo, Swapnil Shinde, Sahil Wadhwa, Emily Chen

Abstract: Many approaches to LLM red-teaming leverage an attacker LLM to discover jailbreaks against a target. Several of them task the attacker with identifying effective strategies through trial and error, resulting in a semantically limited range of successes. Another approach discovers diverse attacks by combining crowdsourced harmful queries and tactics into instructions for the attacker, but does so at random, limiting effectiveness. This article introduces a novel framework, Adaptive Instruction Composition, that combines crowdsourced texts according to an adaptive mechanism trained to jointly optimize effectiveness and diversity. We use reinforcement learning to balance exploration with exploitation in a combinatorial space of instructions, guiding the attacker toward diverse generations tailored to target vulnerabilities. We demonstrate that our approach substantially outperforms random combination on a set of effectiveness and diversity metrics, even under model transfer. Further, we show that it surpasses a host of recent adaptive approaches on HarmBench. We employ a lightweight neural contextual bandit that adapts to contrastive embedding inputs, and provide ablations suggesting that the contrastive pretraining enables the network to rapidly generalize and scale to the massive space as it learns.
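To make the exploration/exploitation idea in this abstract concrete, here is a minimal toy sketch of a contextual bandit that picks among instruction combinations. It is not the paper's method: it swaps the neural network for a linear reward model, uses epsilon-greedy exploration (the abstract does not say which exploration rule is used), and the arm vectors and success rates are invented placeholders rather than contrastive embeddings or real attack outcomes.

```python
import random


def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


class EpsilonGreedyLinearBandit:
    """Toy contextual bandit: linear reward model plus epsilon-greedy exploration."""

    def __init__(self, dim, epsilon=0.2, lr=0.1, seed=0):
        self.w = [0.0] * dim
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)

    def select(self, contexts):
        # Explore: with probability epsilon, pick a uniformly random arm.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(contexts))
        # Exploit: pick the arm with the highest predicted reward.
        scores = [dot(self.w, x) for x in contexts]
        return max(range(len(contexts)), key=scores.__getitem__)

    def update(self, x, reward):
        # One SGD step on the squared error between predicted and observed reward.
        err = dot(self.w, x) - reward
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]


# Hypothetical stand-ins for embeddings of (query, tactic) combinations.
arms = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Hypothetical mean attack-success score per arm; a real system would
# obtain noisy feedback from a judge model instead.
mean_success = [0.1, 0.8, 0.4]

bandit = EpsilonGreedyLinearBandit(dim=2)
for _ in range(2000):
    i = bandit.select(arms)
    bandit.update(arms[i], mean_success[i])

# Arm the learned model currently rates highest.
best = max(range(len(arms)), key=lambda i: dot(bandit.w, arms[i]))
print(best, bandit.w)
```

In this toy setup the model starts out favoring the first arm and only discovers the stronger second arm through exploratory pulls, which is the trade-off the abstract refers to.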

* Accepted to ACL 2026 Main Conference

Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models

Mar 05, 2025

Alberto Purpura, Sahil Wadhwa, Jesse Zymet, Akshay Gupta, Andy Luo, Melissa Kazemi Rad, Swapnil Shinde, Mohammad Shahed Sorower

Abstract: The rapid growth of Large Language Models (LLMs) presents significant privacy, security, and ethical concerns. While much research has proposed methods for defending LLM systems against misuse by malicious actors, researchers have recently complemented these efforts with an offensive approach that involves red teaming, i.e., proactively attacking LLMs with the purpose of identifying their vulnerabilities. This paper provides a concise and practical overview of the LLM red teaming literature, structured so as to describe a multi-component system end-to-end. To motivate red teaming, we survey the initial safety needs of some high-profile LLMs, and then dive into the different components of a red teaming system as well as software packages for implementing them. We cover various attack methods, strategies for attack-success evaluation, metrics for assessing experiment outcomes, and a host of other considerations. Our survey will be useful for any reader who wants to rapidly grasp the major red teaming concepts for use in practical applications.

Northeastern Uni at Multilingual Counterspeech Generation: Enhancing Counter Speech Generation with LLM Alignment through Direct Preference Optimization

Dec 19, 2024

Sahil Wadhwa, Chengtian Xu, Haoming Chen, Aakash Mahalingam, Akankshya Kar, Divya Chaudhary

Abstract: The automatic generation of counter-speech (CS) is a critical strategy for addressing hate speech by providing constructive and informed responses. However, existing methods often fail to generate high-quality, impactful, and scalable CS, particularly across diverse linguistic contexts. In this paper, we propose a novel methodology to enhance CS generation by aligning Large Language Models (LLMs) using Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). Our approach leverages DPO to align LLM outputs with human preferences, ensuring contextually appropriate and linguistically adaptable responses. Additionally, we incorporate knowledge grounding to enhance the factual accuracy and relevance of generated CS. Experimental results demonstrate that DPO-aligned models significantly outperform SFT baselines on CS benchmarks while scaling effectively to multiple languages. These findings highlight the potential of preference-based alignment techniques to advance CS generation across varied linguistic settings. Model supervision and alignment are performed in English, and the same model is used to report metrics for other languages such as Basque, Italian, and Spanish.
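For readers unfamiliar with DPO, the objective it is generally defined by is reproduced here as background. The abstract does not give the loss or any hyperparameters, so the notation is the standard one and not taken from this paper: $\pi_\theta$ is the policy being aligned, $\pi_{\mathrm{ref}}$ is a frozen reference policy (commonly the SFT model), $y_w$ and $y_l$ are the preferred and rejected responses to prompt $x$, $\sigma$ is the logistic function, and $\beta$ is a temperature-like coefficient.

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]
$$

Minimizing this loss raises the likelihood of the preferred response relative to the rejected one, measured against the reference policy, without training a separate reward model.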

* The First Workshop on Multilingual Counterspeech Generation (MCG) at The 31st International Conference on Computational Linguistics (COLING 2025)
* 10 pages, 6 tables, 1 figure

FashionNTM: Multi-turn Fashion Image Retrieval via Cascaded Memory

Aug 20, 2023

Anwesan Pal, Sahil Wadhwa, Ayush Jaiswal, Xu Zhang, Yue Wu, Rakesh Chada, Pradeep Natarajan, Henrik I. Christensen

Abstract: Multi-turn textual feedback-based fashion image retrieval focuses on a real-world setting, where users can iteratively provide information to refine retrieval results until they find an item that fits all their requirements. In this work, we present a novel memory-based method, called FashionNTM, for such a multi-turn system. Our framework incorporates a new Cascaded Memory Neural Turing Machine (CM-NTM) approach for implicit state management, thereby learning to integrate information across all past turns to retrieve new images for a given turn. Unlike the vanilla Neural Turing Machine (NTM), our CM-NTM operates on multiple inputs, which interact with their respective memories via individual read and write heads, to learn complex relationships. Extensive evaluation results show that our proposed method outperforms the previous state-of-the-art algorithm by 50.5% on Multi-turn FashionIQ -- currently the only existing multi-turn fashion dataset -- in addition to having a relative improvement of 12.6% on Multi-turn Shoes -- an extension of the single-turn Shoes dataset that we created in this work. Further analysis of the model in a real-world interactive setting demonstrates two important capabilities of our model -- memory retention across turns, and agnosticity to turn order for non-contradictory feedback. Finally, user study results show that images retrieved by FashionNTM were favored by 83.1% over other multi-turn models. Project page: https://sites.google.com/eng.ucsd.edu/fashionntm

* Paper accepted at ICCV-2023

YELM: End-to-End Contextualized Entity Linking

Nov 10, 2019

Haotian Chen, Sahil Wadhwa, Xi David Li, Andrej Zukov-Gregoric

Abstract: We propose yet another entity linking model (YELM), which links words to entities instead of spans. This avoids the difficulties associated with selecting good candidate mention spans and makes joint training of mention detection (MD) and entity disambiguation (ED) straightforward. Our model is based on BERT and produces contextualized word embeddings which are trained against a joint MD and ED objective. We achieve state-of-the-art results on several standard entity linking (EL) datasets.

* 5 pages, 2 tables

Evaluating the Readability of Force Directed Graph Layouts: A Deep Learning Approach

Aug 02, 2018

Hammad Haleem, Yong Wang, Abishek Puri, Sahil Wadhwa, Huamin Qu

Abstract: Existing graph layout algorithms are usually not able to optimize all the aesthetic properties desired in a graph layout. To evaluate how well the desired visual features are reflected in a graph layout, many readability metrics have been proposed in the past decades. However, the calculation of these readability metrics often requires access to the node and edge coordinates and is usually computationally inefficient, especially for dense graphs. Importantly, when the node and edge coordinates are not accessible, it becomes impossible to evaluate the graph layouts quantitatively. In this paper, we present a novel deep learning-based approach to evaluate the readability of graph layouts by directly using graph images. A convolutional neural network architecture is proposed and trained on a benchmark dataset of graph images, which is composed of synthetically generated graphs and graphs created by sampling from real large networks. Multiple representative readability metrics (including edge crossing, node spread, and group overlap) are considered in the proposed approach. We quantitatively compare our approach to traditional methods and qualitatively evaluate our approach using a case study and visualizations of the convolutional layers. This work is a first step towards using deep learning-based methods to evaluate images from the visualization field quantitatively.
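As an illustration of why coordinate-based readability metrics need node positions and become costly on dense graphs, the sketch below counts edge crossings with a naive pairwise test. It is a generic textbook-style computation, not code from the paper; the function names and the tiny example layout are made up for illustration.

```python
from itertools import combinations


def orientation(p, q, r):
    """Cross product (q - p) x (r - p): positive for a left turn, negative for a right turn."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(a, b, c, d):
    """True if segments ab and cd properly intersect (touching and collinear cases are ignored)."""
    return (orientation(a, b, c) * orientation(a, b, d) < 0
            and orientation(c, d, a) * orientation(c, d, b) < 0)


def count_edge_crossings(pos, edges):
    """Count crossing edge pairs given node coordinates; checks all O(m^2) pairs of edges."""
    crossings = 0
    for (u1, v1), (u2, v2) in combinations(edges, 2):
        if {u1, v1} & {u2, v2}:
            continue  # edges that share a node are not counted as crossing
        if segments_cross(pos[u1], pos[v1], pos[u2], pos[v2]):
            crossings += 1
    return crossings


# Hypothetical layout: the two diagonals of a unit square cross once.
pos = {"a": (0, 0), "b": (1, 1), "c": (0, 1), "d": (1, 0)}
edges = [("a", "b"), ("c", "d"), ("a", "c")]
print(count_edge_crossings(pos, edges))  # 1
```

The function cannot run at all without `pos`, and its cost grows quadratically with the number of edges, which are the two limitations the abstract cites as motivation for scoring layouts directly from images.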

* A version of this work is under review
