Title: DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models

Apr 25, 2025

Jianyu Liu, Hangyu Guo, Ranjie Duan, Xingyuan Bu, Yancheng He, Shilong Li, Hui Huang, Jiaheng Liu, Yucheng Wang, Chenchen Jing(+7 more)

Abstract: Multimodal Large Language Models (MLLMs) pose unique safety challenges because they integrate visual and textual data, which introduces new dimensions of potential attacks and complex risk combinations. In this paper, we begin with a detailed analysis aimed at disentangling risks through step-by-step reasoning within multimodal inputs. We find that systematic multimodal risk disentanglement substantially enhances the risk awareness of MLLMs. By leveraging the strong discriminative abilities of multimodal risk disentanglement, we further introduce DREAM (Disentangling Risks to Enhance Safety Alignment in MLLMs), a novel approach that enhances safety alignment in MLLMs through supervised fine-tuning and iterative Reinforcement Learning from AI Feedback (RLAIF). Experimental results show that DREAM significantly boosts safety during both inference and training phases without compromising performance on normal tasks (that is, without inducing oversafety), achieving a 16.17% improvement in the SIUO safe&effective score compared to GPT-4V. The data and code are available at https://github.com/Kizna1ver/DREAM.

* [NAACL 2025] The first four authors contribute equally, 23 pages, repo at https://github.com/Kizna1ver/DREAM
