Haodong Duan

WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation

May 11, 2026
Via arXiv

VideoZeroBench: Probing the Limits of Video MLLMs with Spatio-Temporal Evidence Verification

Apr 02, 2026
Via arXiv

Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development

Mar 29, 2026
Via arXiv

PosterIQ: A Design Perspective Benchmark for Poster Understanding and Generation

Mar 25, 2026
Via arXiv

MIBench: Evaluating LMMs on Multimodal Interaction

Mar 13, 2026
Via arXiv

RISE-Video: Can Video Generators Decode Implicit World Rules?

Feb 05, 2026
Via arXiv

SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence

Dec 30, 2025
Via arXiv

Think Visually, Reason Textually: Vision-Language Synergy in ARC

Nov 19, 2025
Via arXiv

ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning

Nov 18, 2025
Via arXiv

Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning

Oct 31, 2025
Via arXiv