Dingming Li

WebVR: Benchmarking Multimodal LLMs for WebPage Recreation from Videos via Human-Aligned Visual Rubrics

Mar 11, 2026

CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation

Mar 09, 2026

STEP3-VL-10B Technical Report

Jan 15, 2026

OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks

Aug 07, 2025

ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models

May 27, 2025