Yanhao Zhang

Click-to-Ask: An AI Live Streaming Assistant with Offline Copywriting and Online Interactive QA

Mar 19, 2026

Thinking in Streaming Video

Mar 13, 2026

PIGEON: VLM-Driven Object Navigation via Points of Interest Selection

Nov 17, 2025

Mobile-Agent-RAG: Driving Smart Multi-Agent Coordination with Contextual Knowledge Empowerment for Long-Horizon Mobile Automation

Nov 15, 2025

Non-Rigid Structure-from-Motion via Differential Geometry with Recoverable Conformal Scale

Oct 02, 2025

OwlCap: Harmonizing Motion-Detail for Video Captioning via HMD-270K and Caption Set Equivalence Reward

Aug 27, 2025

Dynamic-I2V: Exploring Image-to-Video Generation Models via Multimodal LLM

May 26, 2025

SPP-SBL: Space-Power Prior Sparse Bayesian Learning for Block Sparse Recovery

May 13, 2025

Improved Visual-Spatial Reasoning via R1-Zero-Like Training

Apr 01, 2025

H2VU-Benchmark: A Comprehensive Benchmark for Hierarchical Holistic Video Understanding

Mar 31, 2025