Zhiwu Lu

LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning

May 22, 2025, via arXiv

Incentivizing Multimodal Reasoning in Large Models for Direct Robot Manipulation

May 19, 2025, via arXiv

CoTMR: Chain-of-Thought Multi-Scale Reasoning for Training-Free Zero-Shot Composed Image Retrieval

Feb 28, 2025, via arXiv

Leveraging Large Vision-Language Model as User Intent-aware Encoder for Composed Image Retrieval

Dec 15, 2024, via arXiv

Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts

Nov 16, 2024, via arXiv

MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents

Aug 08, 2024, via arXiv

CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning

Mar 07, 2024, via arXiv

Improvable Gap Balancing for Multi-Task Learning

Jul 28, 2023, via arXiv

VDT: An Empirical Study on Video Diffusion with Transformers

May 22, 2023, via arXiv

UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling

Feb 13, 2023, via arXiv