Adversarial Text


Adversarial text is a text sequence crafted specifically to influence the predictions of a language model. Such attacks are most commonly mounted against large language models (LLMs). Studying different adversarial approaches helps build effective defense mechanisms for detecting malicious text input and leads to more robust language models.
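
To make the idea concrete, the sketch below runs a greedy character-level perturbation against a toy victim function. The `score` function here is only a stand-in for a real model's confidence (it counts trigger keywords so the example runs end to end), and the greedy single-character swap is one illustrative strategy, not the method of any particular paper; published attacks typically operate at the word or token level with more sophisticated search.

```python
import string

def score(text: str) -> float:
    """Placeholder victim model: returns a score in [0, 1].
    In a real attack this would be a query to the target language model."""
    # Toy heuristic so the example is self-contained: fraction of trigger words.
    triggers = {"attack", "exploit", "malicious"}
    words = text.lower().split()
    return sum(w in triggers for w in words) / max(len(words), 1)

def greedy_char_attack(text: str, budget: int = 3) -> str:
    """Greedily swap single characters to drive the victim score down.

    Each step tries every single-character substitution, keeps the edit that
    lowers the score the most, and stops when the budget is spent or no edit helps.
    """
    best = text
    for _ in range(budget):
        candidates = []
        for i in range(len(best)):
            for c in string.ascii_lowercase:
                if c != best[i]:
                    candidates.append(best[:i] + c + best[i + 1:])
        if not candidates:
            break
        challenger = min(candidates, key=score)
        if score(challenger) >= score(best):
            break  # no single-character edit improves the attack further
        best = challenger
    return best

if __name__ == "__main__":
    original = "this is a malicious exploit attack"
    adversarial = greedy_char_attack(original)
    print(original, score(original))        # high score on the clean input
    print(adversarial, score(adversarial))  # perturbed input evades the scorer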

LiteToken: Removing Intermediate Merge Residues From BPE Tokenizers

Feb 04, 2026

Alignment Drift in Multimodal LLMs: A Two-Phase, Longitudinal Evaluation of Harm Across Eight Model Releases

Feb 04, 2026

Towards Distillation-Resistant Large Language Models: An Information-Theoretic Perspective

Feb 03, 2026

Unifying Adversarial Robustness and Training Across Text Scoring Models

Jan 31, 2026

Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis

Feb 03, 2026

SGHA-Attack: Semantic-Guided Hierarchical Alignment for Transferable Targeted Attacks on Vision-Language Models

Feb 02, 2026

Toward Universal and Transferable Jailbreak Attacks on Vision-Language Models

Feb 01, 2026

ReLAPSe: Reinforcement-Learning-trained Adversarial Prompt Search for Erased concepts in unlearned diffusion models

Jan 30, 2026

Text is All You Need for Vision-Language Model Jailbreaking

Jan 31, 2026

One Word is Enough: Minimal Adversarial Perturbations for Neural Text Ranking

Jan 28, 2026