Yanhao Zhang

PIGEON: VLM-Driven Object Navigation via Points of Interest Selection

Nov 17, 2025

Mobile-Agent-RAG: Driving Smart Multi-Agent Coordination with Contextual Knowledge Empowerment for Long-Horizon Mobile Automation

Nov 15, 2025

Non-Rigid Structure-from-Motion via Differential Geometry with Recoverable Conformal Scale

Oct 02, 2025

OwlCap: Harmonizing Motion-Detail for Video Captioning via HMD-270K and Caption Set Equivalence Reward

Aug 27, 2025

Dynamic-I2V: Exploring Image-to-Video Generation Models via Multimodal LLM

May 26, 2025

SPP-SBL: Space-Power Prior Sparse Bayesian Learning for Block Sparse Recovery

May 13, 2025

Improved Visual-Spatial Reasoning via R1-Zero-Like Training

Apr 01, 2025

H2VU-Benchmark: A Comprehensive Benchmark for Hierarchical Holistic Video Understanding

Mar 31, 2025

Layton: Latent Consistency Tokenizer for 1024-pixel Image Reconstruction and Generation by 256 Tokens

Mar 12, 2025

HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator

Nov 26, 2024