Hongsheng Li

FullStack-Agent: Enhancing Agentic Full-Stack Web Coding via Development-Oriented Testing and Repository Back-Translation

Feb 03, 2026

PromptRL: Prompt Matters in RL for Flow-Based Image Generation

Feb 01, 2026

PhoStream: Benchmarking Real-World Streaming for Omnimodal Assistants in Mobile Scenarios

Jan 30, 2026

SlidesGen-Bench: Evaluating Slides Generation via Computational and Quantitative Metrics

Jan 14, 2026

DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving

Jan 04, 2026

ColaVLA: Leveraging Cognitive Latent Reasoning for Hierarchical Parallel Trajectory Planning in Autonomous Driving

Dec 31, 2025

Is your VLM Sky-Ready? A Comprehensive Spatial Intelligence Benchmark for UAV Navigation

Nov 17, 2025

RelightMaster: Precise Video Relighting with Multi-plane Light Images

Nov 09, 2025

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark

Oct 30, 2025

MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning

Oct 16, 2025