Zhaokai Wang

GRADE: Benchmarking Discipline-Informed Reasoning in Image Editing

Mar 12, 2026

InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing

Mar 10, 2026

GenExam: A Multidisciplinary Text-to-Image Exam

Sep 17, 2025

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Aug 25, 2025

OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use

Aug 06, 2025

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Apr 18, 2025

Vision-to-Music Generation: A Survey

Mar 27, 2025

TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation

Mar 10, 2025

Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding

Jan 14, 2025

Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation

Dec 12, 2024