Jiayin Zheng

SpatialForge: Bootstrapping 3D-Aware Spatial Reasoning from Open-World 2D Images

May 12, 2026

Zishan Liu, Ruoxi Zang, Yanglin Zhang, Wei Liu, Yin Zhang, Jian Yao, Jiayin Zheng, Zhengzhe Liu

Abstract: Recent advancements in Large Vision-Language Models (VLMs) have demonstrated exceptional semantic understanding, yet these models consistently struggle with spatial reasoning, often failing at fundamental geometric tasks such as depth ordering and precise coordinate grounding. Recent efforts introduce spatial supervision from scene-centric datasets (e.g., multi-view scans or indoor video), but are constrained by the limited number of underlying scenes. As a result, the scale and diversity of such data remain significantly smaller than those of web-scale 2D image collections. To address this limitation, we propose SpatialForge, a scalable data synthesis pipeline that transforms in-the-wild 2D images into spatial reasoning supervision. Our approach decomposes spatial reasoning into perception and relation, and constructs structured supervision signals covering depth, layout, and viewpoint-dependent reasoning, with automatic verification to ensure data quality. Based on this pipeline, we build SpatialForge-10M, a large-scale dataset containing 10 million spatial QA pairs. Extensive experiments across multiple spatial reasoning benchmarks demonstrate that training on SpatialForge-10M significantly improves the spatial reasoning ability of standard VLMs, highlighting the effectiveness of scaling 2D data for 3D-aware spatial reasoning.
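To make the idea of depth-ordering supervision concrete, here is a minimal sketch of how QA pairs might be built from per-object depth estimates. It is not the SpatialForge pipeline: the function name `depth_order_qa`, the input format (a label-to-depth dictionary, for instance from any monocular depth estimator), and the `min_gap` filter standing in for a verification step are all assumptions made for illustration.

```python
from itertools import combinations


def depth_order_qa(objects, min_gap=0.5):
    """Build 'which is closer' QA pairs from per-object depth estimates.

    objects: dict mapping an object label to an estimated depth in metres
             (hypothetical input format, not taken from the paper).
    min_gap: pairs whose depths differ by less than this are skipped as
             ambiguous; a simple stand-in for automatic verification.
    """
    qa_pairs = []
    # Sort by label so the output order is deterministic.
    for (a, depth_a), (b, depth_b) in combinations(sorted(objects.items()), 2):
        if abs(depth_a - depth_b) < min_gap:
            continue  # too close to call reliably, so drop the pair
        answer = a if depth_a < depth_b else b
        qa_pairs.append({
            "question": f"Which is closer to the camera, the {a} or the {b}?",
            "answer": answer,
        })
    return qa_pairs


pairs = depth_order_qa({"chair": 2.0, "lamp": 2.2, "window": 5.0})
for pair in pairs:
    print(pair["question"], "->", pair["answer"])
```

In this toy input the chair and lamp are only 0.2 m apart, so that pair is discarded and two QA pairs remain.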

MHSnet: Multi-head and Spatial Attention Network with False-Positive Reduction for Pulmonary Nodules Detection

Feb 16, 2022

Juanyun Mai, Minghao Wang, Jiayin Zheng, Yanbo Shao, Zhaoqi Diao, Xinliang Fu, Yulong Chen, Jianyu Xiao, Jian You, Airu Yin(+5 more)

Abstract: The mortality of lung cancer has ranked high among cancers for many years. Early detection of lung cancer is critical for disease prevention, cure, and mortality rate reduction. However, existing detection methods for pulmonary nodules introduce an excessive number of false positive proposals in order to achieve high sensitivity, which is not practical in clinical situations. In this paper, we propose the multi-head detection and spatial squeeze-and-attention network, MHSnet, to detect pulmonary nodules, in order to aid doctors in the early diagnosis of lung cancers. Specifically, we first introduce multi-head detectors and skip connections to accommodate the variety of nodule sizes, shapes and types and to capture multi-scale features. Then, we implement a spatial attention module that lets the network weight different regions differently, inspired by how experienced clinicians screen CT images, which results in fewer false positive proposals. Lastly, we present a lightweight but effective false positive reduction module with the Linear Regression model to cut down the number of false positive proposals, without any constraints on the front network. Extensive experimental comparisons with state-of-the-art models show the superiority of MHSnet in terms of the average FROC, sensitivity and especially false discovery rate (2.98% and 2.18% improvement in terms of average FROC and sensitivity, 5.62% and 28.33% decrease in terms of false discovery rate and average candidates per scan). The false positive reduction module significantly decreases the average number of candidates generated per scan by 68.11% and the false discovery rate by 13.48%, which is promising for reducing distracting proposals in downstream tasks based on the detection results.
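For readers unfamiliar with the metrics quoted above, the following sketch computes sensitivity, false discovery rate, and average candidates per scan from detection counts, using the standard definitions. The function name `detection_metrics` and the example counts are illustrative assumptions, not values from the paper.

```python
def detection_metrics(tp, fp, fn, n_scans):
    """Standard detection metrics from raw counts.

    tp, fp, fn: true positive, false positive, and false negative counts
                over the whole test set (illustrative inputs).
    n_scans:    number of CT scans evaluated.
    """
    # Sensitivity: fraction of real nodules that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # False discovery rate: fraction of proposals that are not real nodules.
    fdr = fp / (tp + fp) if (tp + fp) else 0.0
    # Every proposal (true or false) counts as a candidate.
    candidates_per_scan = (tp + fp) / n_scans
    return {
        "sensitivity": sensitivity,
        "false_discovery_rate": fdr,
        "candidates_per_scan": candidates_per_scan,
    }


metrics = detection_metrics(tp=90, fp=30, fn=10, n_scans=20)
print(metrics)
```

With these made-up counts, removing false positives while keeping true positives lowers both the false discovery rate and the candidates per scan without changing sensitivity, which is the trade-off a false positive reduction stage targets.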

A Coarse-to-fine Morphological Approach With Knowledge-based Rules and Self-adapting Correction for Lung Nodules Segmentation

Feb 07, 2022

Xinliang Fu, Jiayin Zheng, Juanyun Mai, Yanbo Shao, Minghao Wang, Linyu Li, Zhaoqi Diao, Yulong Chen, Jianyu Xiao, Jian You(+6 more)

Abstract: The segmentation module, which precisely outlines the nodules, is a crucial step in a computer-aided diagnosis (CAD) system. The most challenging part of such a module is achieving high segmentation accuracy, especially for juxtapleural, non-solid and small nodules. In this research, we present a coarse-to-fine methodology that greatly improves the performance of the thresholding method with a novel self-adapting correction algorithm and effectively removes noisy pixels with well-defined knowledge-based principles. Compared with recent strong morphological baselines, our algorithm, by combining dataset features, achieves state-of-the-art performance on both the public LIDC-IDRI dataset (DSC 0.699) and our private LC015 dataset (DSC 0.760), which closely approaches the performance of SOTA deep learning-based models. Furthermore, unlike most available morphological methods, which can only segment isolated and well-circumscribed nodules accurately, the precision of our method is totally independent of the nodule type or diameter, proving its applicability and generality.
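The DSC figures above refer to the Dice similarity coefficient, DSC = 2|A ∩ B| / (|A| + |B|), where A is the predicted mask and B the ground-truth mask. A minimal sketch of that computation on small binary masks follows; the function name `dice_coefficient`, the nested-list mask format, and the convention of returning 1.0 for two empty masks are assumptions for illustration.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary 2D masks.

    pred, truth: nested lists of 0/1 values with the same shape
                 (illustrative format; real masks are usually arrays).
    """
    # Collect the coordinates of foreground pixels in each mask.
    pred_pixels = {(r, c) for r, row in enumerate(pred)
                   for c, v in enumerate(row) if v}
    truth_pixels = {(r, c) for r, row in enumerate(truth)
                    for c, v in enumerate(row) if v}
    total = len(pred_pixels) + len(truth_pixels)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    overlap = len(pred_pixels & truth_pixels)
    return 2 * overlap / total


pred_mask = [[0, 1, 1],
             [0, 1, 0]]
truth_mask = [[0, 1, 0],
              [0, 1, 1]]
score = dice_coefficient(pred_mask, truth_mask)
print(round(score, 3))
```

Here each mask has three foreground pixels and two of them overlap, so the score is 2·2 / (3 + 3).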
