Picture for Zhaopan Xu

Zhaopan Xu

MDK12-Bench: A Comprehensive Evaluation of Multimodal Large Language Models on Multidisciplinary Exams

Add code
Aug 09, 2025
Viaarxiv icon

Sekai: A Video Dataset towards World Exploration

Add code
Jun 18, 2025
Viaarxiv icon

Human-Aligned Bench: Fine-Grained Assessment of Reasoning Ability in MLLMs vs. Humans

Add code
May 16, 2025
Viaarxiv icon

AI Idea Bench 2025: AI Research Idea Generation Benchmark

Add code
Apr 19, 2025
Viaarxiv icon

MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models

Add code
Apr 08, 2025
Viaarxiv icon

MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification

Add code
Mar 16, 2025
Figure 1 for MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification
Figure 2 for MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification
Figure 3 for MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification
Figure 4 for MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification
Viaarxiv icon

PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models

Add code
Mar 16, 2025
Viaarxiv icon

ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges

Add code
Mar 09, 2025
Viaarxiv icon

Bridge then Begin Anew: Generating Target-relevant Intermediate Model for Source-free Visual Emotion Adaptation

Add code
Dec 18, 2024
Figure 1 for Bridge then Begin Anew: Generating Target-relevant Intermediate Model for Source-free Visual Emotion Adaptation
Figure 2 for Bridge then Begin Anew: Generating Target-relevant Intermediate Model for Source-free Visual Emotion Adaptation
Figure 3 for Bridge then Begin Anew: Generating Target-relevant Intermediate Model for Source-free Visual Emotion Adaptation
Figure 4 for Bridge then Begin Anew: Generating Target-relevant Intermediate Model for Source-free Visual Emotion Adaptation
Viaarxiv icon

GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

Add code
Dec 01, 2024
Viaarxiv icon