Zhi Gao

School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, China

From Visual Perception to Deep Empathy: An Automated Assessment Framework for House-Tree-Person Drawings Using Multimodal LLMs and Multi-Agent Collaboration

Dec 23, 2025

Modality Alignment across Trees on Heterogeneous Hyperbolic Manifolds

Oct 31, 2025

GUI Knowledge Bench: Revealing the Knowledge Gap Behind VLM Failures in GUI Tasks

Oct 30, 2025

KORE: Enhancing Knowledge Injection for Large Multimodal Models via Knowledge-Oriented Augmentations and Constraints

Oct 22, 2025

Beyond the Seen: Bounded Distribution Estimation for Open-Vocabulary Learning

Oct 06, 2025

Curvature Learning for Generalization of Hyperbolic Neural Networks

Aug 24, 2025

Hyperbolic Dual Feature Augmentation for Open-Environment

Jun 10, 2025

When Large Multimodal Models Confront Evolving Knowledge: Challenges and Pathways

May 30, 2025

VUDG: A Dataset for Video Understanding Domain Generalization

May 30, 2025

Chain-of-Focus: Adaptive Visual Search and Zooming for Multimodal Reasoning via RL

May 21, 2025