Rui Yang Tan

Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models

Mar 23, 2026

Rui Yang Tan, Yujia Hu, Roy Ka-Wei Lee

Abstract: Multimodal Large Language Models (MLLMs) extend text-only LLMs with visual reasoning, but also introduce new safety failure modes under visually grounded instructions. We study comic-template jailbreaks that embed harmful goals inside simple three-panel visual narratives and prompt the model to role-play and "complete the comic." Building on JailbreakBench and JailbreakV, we introduce ComicJailbreak, a comic-based jailbreak benchmark with 1,167 attack instances spanning 10 harm categories and 5 task setups. Across 15 state-of-the-art MLLMs (six commercial and nine open-source), comic-based attacks achieve success rates comparable to strong rule-based jailbreaks and substantially outperform plain-text and random-image baselines, with ensemble success rates exceeding 90% on several commercial models. We then show that although existing defense methods are effective against the harmful comics, they induce a high refusal rate on benign prompts. Finally, using automatic judging and targeted human evaluation, we show that current safety evaluators can be unreliable on sensitive but non-harmful content. Our findings highlight the need for safety alignment robust to narrative-driven multimodal jailbreaks.

* 31 pages

Cross-Modal Transfer from Memes to Videos: Addressing Data Scarcity in Hateful Video Detection

Jan 26, 2025

Han Wang, Rui Yang Tan, Roy Ka-Wei Lee

Abstract: Detecting hate speech in online content is essential to ensuring safer digital spaces. While significant progress has been made in text and meme modalities, video-based hate speech detection remains under-explored, hindered by a lack of annotated datasets and the high cost of video annotation. This gap is particularly problematic given the growing reliance on large models, which demand substantial amounts of training data. To address this challenge, we leverage meme datasets as both a substitution and an augmentation strategy for training hateful video detection models. Our approach introduces a human-assisted reannotation pipeline to align meme dataset labels with video datasets, ensuring consistency with minimal labeling effort. Using two state-of-the-art vision-language models, we demonstrate that meme data can substitute for video data in resource-scarce scenarios and augment video datasets to achieve further performance gains. Our results consistently outperform state-of-the-art benchmarks, showcasing the potential of cross-modal transfer learning for advancing hateful video detection. Dataset and code are available at https://github.com/Social-AI-Studio/CrossModalTransferLearning.

* 10 pages, 4 figures, The Web Conference 2025
