Zhiqi Ge

On Path to Multimodal Generalist: General-Level and General-Bench

May 07, 2025
Via arXiv

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining

Dec 13, 2024
Via arXiv

Unified Generative and Discriminative Training for Multi-modal Large Language Models

Nov 01, 2024
Via arXiv

WorldGPT: Empowering LLM as Multimodal World Model

Apr 28, 2024
Via arXiv

Empowering Vision-Language Models to Follow Interleaved Vision-Language Instructions

Aug 10, 2023
Via arXiv