Zhi Gao

School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, China

VUDG: A Dataset for Video Understanding Domain Generalization

May 30, 2025

When Large Multimodal Models Confront Evolving Knowledge: Challenges and Pathways

May 30, 2025

Chain-of-Focus: Adaptive Visual Search and Zooming for Multimodal Reasoning via RL

May 21, 2025

Memory-Centric Embodied Question Answer

May 20, 2025

Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning

May 06, 2025

Iterative Trajectory Exploration for Multimodal Agents

Apr 30, 2025

TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials

Apr 17, 2025

Building LLM Agents by Incorporating Insights from Computer Systems

Apr 06, 2025

MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge

Feb 27, 2025

Large-Scale Riemannian Meta-Optimization via Subspace Adaptation

Jan 25, 2025