Picture for Yuhang Zang

Yuhang Zang

Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation

Add code
Jul 03, 2025
Viaarxiv icon

ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing

Add code
Jun 24, 2025
Viaarxiv icon

Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings

Add code
Jun 05, 2025
Viaarxiv icon

Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning

Add code
May 20, 2025
Viaarxiv icon

Visual Agentic Reinforcement Fine-Tuning

Add code
May 20, 2025
Viaarxiv icon

Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

Add code
May 06, 2025
Viaarxiv icon

MM-IFEngine: Towards Multimodal Instruction Following

Add code
Apr 10, 2025
Viaarxiv icon

HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance

Add code
Apr 08, 2025
Viaarxiv icon

Unified Reward Model for Multimodal Understanding and Generation

Add code
Mar 07, 2025
Viaarxiv icon

Visual-RFT: Visual Reinforcement Fine-Tuning

Add code
Mar 03, 2025
Viaarxiv icon