Picture for Mingze Zhou

Mingze Zhou

Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning

Add code
May 18, 2025
Viaarxiv icon

Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning

Add code
May 12, 2025
Viaarxiv icon

WorldGPT: Empowering LLM as Multimodal World Model

Add code
Apr 28, 2024
Figure 1 for WorldGPT: Empowering LLM as Multimodal World Model
Figure 2 for WorldGPT: Empowering LLM as Multimodal World Model
Figure 3 for WorldGPT: Empowering LLM as Multimodal World Model
Figure 4 for WorldGPT: Empowering LLM as Multimodal World Model
Viaarxiv icon

Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks

Add code
Nov 09, 2023
Viaarxiv icon