Yanglin Zhang

SpatialForge: Bootstrapping 3D-Aware Spatial Reasoning from Open-World 2D Images

May 12, 2026

Zishan Liu, Ruoxi Zang, Yanglin Zhang, Wei Liu, Yin Zhang, Jian Yao, Jiayin Zheng, Zhengzhe Liu

Abstract: Recent advancements in Large Vision-Language Models (VLMs) have demonstrated exceptional semantic understanding, yet these models consistently struggle with spatial reasoning, often failing at fundamental geometric tasks such as depth ordering and precise coordinate grounding. Recent efforts introduce spatial supervision from scene-centric datasets (e.g., multi-view scans or indoor video), but are constrained by the limited number of underlying scenes. As a result, the scale and diversity of such data remain significantly smaller than those of web-scale 2D image collections. To address this limitation, we propose SpatialForge, a scalable data synthesis pipeline that transforms in-the-wild 2D images into spatial reasoning supervision. Our approach decomposes spatial reasoning into perception and relation, and constructs structured supervision signals covering depth, layout, and viewpoint-dependent reasoning, with automatic verification to ensure data quality. Based on this pipeline, we build SpatialForge-10M, a large-scale dataset containing 10 million spatial QA pairs. Extensive experiments across multiple spatial reasoning benchmarks demonstrate that training on SpatialForge-10M significantly improves the spatial reasoning ability of standard VLMs, highlighting the effectiveness of scaling 2D data for 3D-aware spatial reasoning.
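
To make the idea of turning a 2D image into depth-ordering supervision concrete, here is a minimal sketch. It is not the SpatialForge pipeline: the function name `make_depth_order_qa`, the per-object depth dictionary, and the `min_gap` threshold standing in for "automatic verification" are all hypothetical choices made for illustration, assuming per-object depths have already been estimated by some monocular depth model.

```python
from typing import Optional


def make_depth_order_qa(
    depths: dict[str, float],
    a: str,
    b: str,
    min_gap: float = 0.5,
) -> Optional[dict[str, str]]:
    """Build one depth-ordering QA pair from estimated object depths.

    `depths` maps an object name to an estimated distance from the camera
    (hypothetical input; smaller means closer). Pairs whose depth gap is
    below `min_gap` are rejected as ambiguous, which is a stand-in for a
    verification step and not the paper's actual check.
    """
    gap = depths[a] - depths[b]
    if abs(gap) < min_gap:
        return None  # too close to call; skip rather than emit a noisy label
    closer = a if gap < 0 else b
    return {
        "question": f"Which is closer to the camera, the {a} or the {b}?",
        "answer": closer,
    }


if __name__ == "__main__":
    estimated = {"chair": 2.0, "lamp": 4.5, "table": 2.2}
    print(make_depth_order_qa(estimated, "chair", "lamp"))
    print(make_depth_order_qa(estimated, "chair", "table"))
```

Rejecting near-ties trades dataset size for label reliability, which is one plausible way a filter of this kind could protect data quality.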

Prof. Robot: Differentiable Robot Rendering Without Static and Self-Collisions

Mar 17, 2025

Quanyuan Ruan, Jiabao Lei, Wenhao Yuan, Yanglin Zhang, Dekun Lu, Guiliang Liu, Kui Jia

Abstract: Differentiable rendering has gained significant attention in the field of robotics, with differentiable robot rendering emerging as an effective paradigm for learning robotic actions from image-space supervision. However, the lack of physical world perception in this approach may lead to potential collisions during action optimization. In this work, we introduce a novel improvement on previous efforts by incorporating physical awareness of collisions through the learning of a neural robotic collision classifier. This enables the optimization of actions that avoid collisions with static, non-interactable environments as well as the robot itself. To facilitate effective gradient optimization with the classifier, we identify the underlying issue and propose leveraging Eikonal regularization to ensure consistent gradients for optimization. Our solution can be seamlessly integrated into existing differentiable robot rendering frameworks, utilizing gradients for optimization and providing a foundation for future applications of differentiable rendering in robotics with improved reliability of interactions with the physical world. Both qualitative and quantitative experiments demonstrate the necessity and effectiveness of our method compared to previous solutions.
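
For reference, Eikonal regularization in its usual form penalizes deviation of a learned field's gradient norm from one. The expression below is a sketch under assumptions: it treats the collision classifier as a scalar field $f_\theta(q)$ over robot configurations $q$, and the symbols $f_\theta$, $q$, and the weight $\lambda$ are illustrative rather than taken from the abstract.

$$
\mathcal{L}_{\text{eik}} = \lambda \, \mathbb{E}_{q}\left[\left(\lVert \nabla_{q} f_\theta(q) \rVert_2 - 1\right)^2\right]
$$

Keeping the gradient norm close to one discourages flat regions and sharp spikes in the field, which fits the stated goal of consistent gradients during action optimization.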

SCORE: Saturated Consensus Relocalization in Semantic Line Maps

Mar 05, 2025

Haodong Jiang, Xiang Zheng, Yanglin Zhang, Qingcheng Zeng, Yiqian Li, Ziyang Hong, Junfeng Wu

Abstract: This is the arXiv version of our paper submitted to IEEE/RSJ IROS 2025. We propose a scene-agnostic and lightweight visual relocalization framework that leverages semantically labeled 3D lines as a compact map representation. In our framework, the robot localizes itself by capturing a single image, extracting 2D lines, associating them with semantically similar 3D lines in the map, and solving a robust perspective-n-line problem. To address the extremely high outlier ratios (exceeding 99.5%) caused by one-to-many ambiguities in semantic matching, we introduce the Saturated Consensus Maximization (Sat-CM) formulation, which enables accurate pose estimation when the classic Consensus Maximization framework fails. We further propose a fast global solver to the formulated Sat-CM problems, leveraging rigorous interval analysis results to ensure both accuracy and computational efficiency. Additionally, we develop a pipeline for constructing semantic 3D line maps using posed depth images. To validate the effectiveness of our framework, which integrates our innovations in robust estimation and practical engineering insights, we conduct extensive experiments on the ScanNet++ dataset.
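
As background for the comparison drawn in the abstract, the classic Consensus Maximization problem is commonly written as a search for the model that agrees with the largest number of candidate associations. In the sketch below, $\theta$ (the camera pose), $r_i$ (the residual of the $i$-th 2D-3D line association), $N$, and the inlier threshold $\epsilon$ are generic symbols assumed for illustration; the abstract does not give the Sat-CM objective itself, so it is not reproduced here.

$$
\max_{\theta} \; \sum_{i=1}^{N} \mathbb{1}\left(\lvert r_i(\theta) \rvert \le \epsilon\right)
$$

When one 2D line is matched to many semantically similar 3D lines, every candidate pair enters this sum with equal weight, which helps explain why a plain inlier count can be misleading at outlier ratios above 99.5%.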

* 11 pages, 14 figures, arXiv version of the paper submitted to IROS 2025

$S^2$NeRF: Privacy-preserving Training Framework for NeRF

Sep 03, 2024

Bokang Zhang, Yanglin Zhang, Zhikun Zhang, Jinglan Yang, Lingying Huang, Junfeng Wu

Abstract: Neural Radiance Fields (NeRF) have revolutionized 3D computer vision and graphics, facilitating novel view synthesis and influencing sectors like extended reality and e-commerce. However, NeRF's dependence on extensive data collection, including sensitive scene image data, introduces significant privacy risks when users upload this data for model training. To address this concern, we first propose SplitNeRF, a training framework that incorporates split learning (SL) techniques to enable privacy-preserving collaborative model training between clients and servers without sharing local data. Despite its benefits, we identify vulnerabilities in SplitNeRF by developing two attack methods, Surrogate Model Attack and Scene-aided Surrogate Model Attack, which exploit the shared gradient data and a few leaked scene images to reconstruct private scene information. To counter these threats, we introduce $S^2$NeRF, secure SplitNeRF that integrates effective defense mechanisms. By introducing decaying noise related to the gradient norm into the shared gradient information, $S^2$NeRF preserves privacy while maintaining a high utility of the NeRF model. Our extensive evaluations across multiple datasets demonstrate the effectiveness of $S^2$NeRF against privacy breaches, confirming its viability for secure NeRF training in sensitive applications.
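
The defense described above, decaying noise tied to the gradient norm, can be illustrated with a small sketch. This is not the $S^2$NeRF implementation: the function name `add_decaying_noise`, the Gaussian noise model, the exponential schedule, and the parameters `sigma0` and `decay` are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
import random
from typing import Optional


def add_decaying_noise(
    grad: list[float],
    step: int,
    sigma0: float = 0.5,
    decay: float = 0.9,
    rng: Optional[random.Random] = None,
) -> list[float]:
    """Perturb a shared gradient with noise scaled by its norm.

    The noise standard deviation is sigma0 * decay**step * ||grad||_2, so it
    shrinks as training proceeds (assumed exponential schedule) and vanishes
    when the gradient itself is zero.
    """
    rng = rng or random.Random()
    norm = math.sqrt(sum(g * g for g in grad))
    sigma = sigma0 * (decay ** step) * norm
    return [g + rng.gauss(0.0, sigma) for g in grad]


if __name__ == "__main__":
    gradient = [3.0, 4.0]  # L2 norm is 5.0
    for step in (0, 10, 50):
        noisy = add_decaying_noise(gradient, step, rng=random.Random(0))
        print(step, noisy)
```

Scaling the noise by the gradient norm keeps the perturbation proportional to the signal being shared, and the decay lets later updates pass through more cleanly, which is one way to trade early-stage privacy against final model utility.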

* To appear in the ACM Conference on Computer and Communications Security (CCS'24), October 14-18, 2024, Salt Lake City, UT, USA
