Picture for Dongzhi Jiang

Dongzhi Jiang

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark

Add code
Oct 30, 2025
Viaarxiv icon

Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation

Add code
Aug 13, 2025
Viaarxiv icon

MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning

Add code
Jun 05, 2025
Viaarxiv icon

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

Add code
May 01, 2025
Viaarxiv icon

ADT: Tuning Diffusion Models with Adversarial Supervision

Add code
Apr 15, 2025
Viaarxiv icon

PiSA: A Self-Augmented Data Engine and Training Strategy for 3D Understanding with Large Models

Add code
Mar 13, 2025
Viaarxiv icon

SciVerse: Unveiling the Knowledge Comprehension and Visual Reasoning of LMMs on Multi-modal Scientific Problems

Add code
Mar 13, 2025
Viaarxiv icon

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

Add code
Feb 13, 2025
Figure 1 for MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
Figure 2 for MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
Figure 3 for MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
Figure 4 for MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
Viaarxiv icon

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

Add code
Dec 12, 2024
Figure 1 for EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM
Figure 2 for EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM
Figure 3 for EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM
Figure 4 for EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM
Viaarxiv icon

MAVIS: Mathematical Visual Instruction Tuning

Add code
Jul 11, 2024
Figure 1 for MAVIS: Mathematical Visual Instruction Tuning
Figure 2 for MAVIS: Mathematical Visual Instruction Tuning
Figure 3 for MAVIS: Mathematical Visual Instruction Tuning
Figure 4 for MAVIS: Mathematical Visual Instruction Tuning
Viaarxiv icon