Abstract: The integration of reinforcement learning (RL) into the reasoning capabilities of Multimodal Large Language Models (MLLMs) has rapidly emerged as a transformative research direction. While MLLMs significantly extend Large Language Models (LLMs) to handle diverse modalities such as vision, audio, and video, enabling robust reasoning across multimodal inputs remains a major challenge. This survey systematically reviews recent advances in RL-based reasoning for MLLMs, covering key algorithmic designs, reward mechanism innovations, and practical applications. We highlight two main RL paradigms, value-free and value-based methods, and analyze how RL enhances reasoning abilities by optimizing reasoning trajectories and aligning multimodal information. Furthermore, we provide an extensive overview of benchmark datasets, evaluation protocols, and existing limitations, and propose future research directions to address current bottlenecks such as sparse rewards, inefficient cross-modal reasoning, and real-world deployment constraints. Our goal is to offer a comprehensive and structured guide for researchers interested in advancing RL-based reasoning in the multimodal era.
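The value-free vs. value-based distinction named above comes down to how each method estimates the advantage used to update the policy. The abstract gives no formulas, so the sketch below is only an illustrative contrast under common conventions (a PPO-style critic baseline with GAE for the value-based case, a GRPO-style group-normalized reward for the value-free case); all function names and hyperparameters are placeholders, not taken from the survey.

```python
import numpy as np

def value_based_advantage(rewards, values, gamma=0.99, lam=0.95):
    """Value-based (PPO-style): a learned critic's value estimates serve as the baseline (GAE)."""
    r, v = np.asarray(rewards, dtype=float), np.asarray(values, dtype=float)
    adv, running = np.zeros_like(r), 0.0
    for t in reversed(range(len(r))):
        next_v = v[t + 1] if t + 1 < len(v) else 0.0
        delta = r[t] + gamma * next_v - v[t]      # TD error against the critic
        running = delta + gamma * lam * running
        adv[t] = running
    return adv

def value_free_advantage(group_rewards):
    """Value-free (GRPO-style): score each sampled reasoning trajectory relative to its group,
    so no separate value network is needed."""
    r = np.asarray(group_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# e.g. four sampled answers to one multimodal question, rewarded 1/0 for correctness:
print(value_free_advantage([1.0, 0.0, 0.0, 1.0]))
```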
Abstract: Despite explicit alignment efforts, large language models (LLMs) can still be exploited to trigger unintended behaviors, a phenomenon known as "jailbreaking." Current jailbreak attack methods focus mainly on discrete prompt manipulations targeting closed-source LLMs, relying on manually crafted prompt templates and persuasion rules. However, as open-source LLMs grow more capable, ensuring their safety becomes increasingly crucial: attackers' access to model parameters and gradient information exacerbates the severity of jailbreak threats. To address this research gap, we propose a novel \underline{C}ontext-\underline{C}oherent \underline{J}ailbreak \underline{A}ttack (CCJA). We formulate the jailbreak attack as an optimization problem in the embedding space of masked language models and, through combinatorial optimization, balance the jailbreak attack success rate against semantic coherence. Extensive evaluations show that our method not only maintains semantic consistency but also surpasses state-of-the-art baselines in attack effectiveness. Moreover, integrating the semantically coherent jailbreak prompts generated by our method into widely used black-box methodologies yields a notable improvement in their success rates against closed-source commercial LLMs, highlighting the security threat that open-source LLMs pose to their commercial counterparts. We will open-source our code if the paper is accepted.
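The abstract above frames jailbreak prompt construction as a combinatorial optimization that trades attack success off against semantic coherence in a masked language model's embedding space. CCJA's actual procedure is not reproduced here; the sketch below only illustrates that kind of weighted objective with a toy greedy search, and `attack_success_score`, `coherence_score`, `candidates_fn`, and the weight `alpha` are hypothetical placeholders rather than components of the paper.

```python
import random

def combined_objective(tokens, attack_success_score, coherence_score, alpha=0.7):
    # Hypothetical weighted trade-off between jailbreak success and semantic coherence.
    return alpha * attack_success_score(tokens) + (1 - alpha) * coherence_score(tokens)

def greedy_prompt_search(tokens, candidates_fn, objective_fn, steps=50, seed=0):
    """Toy combinatorial search: pick a position, ask a masked LM for replacement
    candidates, and keep any substitution that improves the combined objective."""
    rng = random.Random(seed)
    best, best_score = list(tokens), objective_fn(list(tokens))
    for _ in range(steps):
        pos = rng.randrange(len(best))
        for cand in candidates_fn(best, pos):      # e.g. top-k masked-LM fills at `pos`
            trial = best[:pos] + [cand] + best[pos + 1:]
            score = objective_fn(trial)
            if score > best_score:
                best, best_score = trial, score
    return best, best_score
```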