Title: Vision-EKIPL: External Knowledge-Infused Policy Learning for Visual Reasoning

Jun 07, 2025

Chaoyang Wang, Zeyu Zhang, Haiyun Jiang

Abstract: Visual reasoning is crucial for understanding complex multimodal data and advancing Artificial General Intelligence. Existing methods enhance the reasoning capability of Multimodal Large Language Models (MLLMs) through Reinforcement Learning (RL) fine-tuning (e.g., GRPO). However, current RL approaches sample action groups solely from the policy model itself, which limits the upper bound of the model's reasoning capability and leads to inefficient training. To address these limitations, this paper proposes a novel RL framework called Vision-EKIPL. Its core idea is to introduce high-quality actions generated by external auxiliary models during RL training to guide the optimization of the policy model. This knowledge infusion from external models significantly expands the model's exploration space, raises its reasoning upper bound, and substantially accelerates training convergence. Experimental results show that Vision-EKIPL achieves up to a 5% performance improvement on the Reason-RFT-CoT Benchmark compared to the state of the art (SOTA). These results indicate that Vision-EKIPL can overcome the limitations of traditional RL methods, significantly enhance the visual reasoning performance of MLLMs, and provide a new, effective paradigm for research in this field.
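To illustrate why mixing external actions into the sampled group can matter, here is a minimal sketch of GRPO-style group-relative advantages computed over a mixed group. It is not the paper's implementation: the function names, the reward values, and the choice to simply append external rewards to the policy's group are assumptions made for illustration, and the abstract does not specify how external actions are selected or weighted.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each reward by the group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the sampled group
    return [(r - mu) / (sigma + eps) for r in rewards]


def mixed_group_advantages(policy_rewards, external_rewards):
    """Hypothetical helper: score policy and external actions as one group.

    Assumption: external actions are appended to the policy's own samples
    before normalization. The paper may combine them differently.
    """
    rewards = list(policy_rewards) + list(external_rewards)
    adv = group_relative_advantages(rewards)
    n = len(policy_rewards)
    return adv[:n], adv[n:]


if __name__ == "__main__":
    # Made-up rewards: the policy fails on all four of its own samples.
    policy_only = group_relative_advantages([0.0, 0.0, 0.0, 0.0])
    print(policy_only)  # all zeros: the group carries no learning signal

    # One correct action from a hypothetical external model joins the group.
    policy_adv, external_adv = mixed_group_advantages([0.0] * 4, [1.0])
    print(policy_adv, external_adv)
```

In this toy setting, a group in which every policy sample receives the same reward yields zero advantages, so no gradient signal is available. Adding a single higher-reward external action gives the group non-zero variance, which is one plausible reading of how external actions could widen exploration.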
