Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Fangdi Li

Jigsaw-Puzzles: From Seeing to Understanding to Reasoning in Vision-Language Models

May 27, 2025

Zesen Lyu, Dandan Zhang, Wei Ye, Fangdi Li, Zhihang Jiang, Yao Yang

Abstract:Spatial reasoning is a core component of human cognition, enabling individuals to perceive, comprehend, and interact with the physical world. It relies on a nuanced understanding of spatial structures and inter-object relationships, serving as the foundation for complex reasoning and decision-making. To investigate whether current vision-language models (VLMs) exhibit similar capability, we introduce Jigsaw-Puzzles, a novel benchmark consisting of 1,100 carefully curated real-world images with high spatial complexity. Based on this dataset, we design five tasks to rigorously evaluate VLMs' spatial perception, structural understanding, and reasoning capabilities, while deliberately minimizing reliance on domain-specific knowledge to better isolate and assess the general spatial reasoning capability. We conduct a comprehensive evaluation across 24 state-of-the-art VLMs. The results show that even the strongest model, Gemini-2.5-Pro, achieves only 77.14% overall accuracy and performs particularly poorly on the Order Generation task, with only 30.00% accuracy, far below the performance exceeding 90% achieved by human participants. This persistent gap underscores the need for continued progress, positioning Jigsaw-Puzzles as a challenging and diagnostic benchmark for advancing spatial reasoning research in VLMs.

Via

Access Paper or Ask Questions

A Cooperative Positioning Flamework for Robot and Smart Phone Based on Visible Light Communication

Aug 25, 2022

Junye Chen, Fangdi Li, Futong An, Chen Yang, Hongzhan Song, Shangsheng Wen, Weipeng Guan

Abstract:A cooperative positioning flamework of human and robots based on visible light communication (VLC) is proposed. Based on the experiment system, we demonstrated it has high accuracy and real-time performance.

* high accuracy, cooperative positioning system

Via

Access Paper or Ask Questions