Hongsheng Li

Is your VLM Sky-Ready? A Comprehensive Spatial Intelligence Benchmark for UAV Navigation

Nov 17, 2025

RelightMaster: Precise Video Relighting with Multi-plane Light Images

Nov 09, 2025

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark

Oct 30, 2025

MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning

Oct 16, 2025

SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model

Oct 14, 2025

Factuality Matters: When Image Generation and Editing Meet Structured Visuals

Oct 06, 2025

VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing

Sep 26, 2025

WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning

Sep 26, 2025

FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark

Sep 11, 2025

One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning

Sep 09, 2025