Zhi Gao

School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, China

Modality Alignment across Trees on Heterogeneous Hyperbolic Manifolds

Oct 31, 2025

GUI Knowledge Bench: Revealing the Knowledge Gap Behind VLM Failures in GUI Tasks

Oct 30, 2025

KORE: Enhancing Knowledge Injection for Large Multimodal Models via Knowledge-Oriented Augmentations and Constraints

Oct 22, 2025

Beyond the Seen: Bounded Distribution Estimation for Open-Vocabulary Learning

Oct 06, 2025

Curvature Learning for Generalization of Hyperbolic Neural Networks

Aug 24, 2025

Hyperbolic Dual Feature Augmentation for Open-Environment

Jun 10, 2025

When Large Multimodal Models Confront Evolving Knowledge: Challenges and Pathways

May 30, 2025

VUDG: A Dataset for Video Understanding Domain Generalization

May 30, 2025

Chain-of-Focus: Adaptive Visual Search and Zooming for Multimodal Reasoning via RL

May 21, 2025

Memory-Centric Embodied Question Answer

May 20, 2025