Haonan Lu

When Models Judge Themselves: Unsupervised Self-Evolution for Multimodal Reasoning

Mar 22, 2026

Click-to-Ask: An AI Live Streaming Assistant with Offline Copywriting and Online Interactive QA

Mar 19, 2026

Thinking in Streaming Video

Mar 13, 2026

Learning from Prompt itself: the Hierarchical Attribution Prompt Optimization

Jan 06, 2026

Mobile-Agent-RAG: Driving Smart Multi-Agent Coordination with Contextual Knowledge Empowerment for Long-Horizon Mobile Automation

Nov 15, 2025

OwlCap: Harmonizing Motion-Detail for Video Captioning via HMD-270K and Caption Set Equivalence Reward

Aug 27, 2025

Efficient Agent: Optimizing Planning Capability for Multimodal Retrieval Augmented Generation

Aug 12, 2025

X2Edit: Revisiting Arbitrary-Instruction Image Editing through Self-Constructed Data and Task-Aware Representation Learning

Aug 11, 2025

Dynamic-I2V: Exploring Image-to-Video Generation Models via Multimodal LLM

May 26, 2025

Improved Visual-Spatial Reasoning via R1-Zero-Like Training

Apr 01, 2025