
Junkang Wu

Beyond Where to Look: Trajectory-Guided Reinforcement Learning for Multimodal RLVR

Mar 27, 2026

Bridging Perception and Reasoning: Token Reweighting for RLVR in Multimodal LLMs

Mar 26, 2026

On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation

Mar 23, 2026

Causal-HalBench: Uncovering LVLMs Object Hallucinations Through Causal Intervention

Nov 13, 2025

Quantile Advantage Estimation for Entropy-Safe Reasoning

Sep 26, 2025

AdaViP: Aligning Multi-modal LLMs via Adaptive Vision-enhanced Preference Optimization

Apr 22, 2025

Aligning Multimodal LLM with Human Preference: A Survey

Mar 18, 2025

RePO: ReLU-based Preference Optimization

Mar 10, 2025

DAMO: Data- and Model-aware Alignment of Multi-modal LLMs

Feb 04, 2025

α-DPO: Adaptive Reward Margin is What Direct Preference Optimization Needs

Oct 14, 2024