Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Steve Marschner

ArchSym: Detecting 3D-Grounded Architectural Symmetries in the Wild

Apr 24, 2026

Hanyu Chen, Ruojin Cai, Steve Marschner, Noah Snavely

Abstract:Symmetry detection is a fundamental problem in computer vision, and symmetries serve as powerful priors for downstream tasks. However, existing learning-based methods for detecting 3D symmetries from single images have been almost exclusively trained and evaluated on object-centric or synthetic datasets, and thus fail to generalize to real-world scenes. Furthermore, due to the inherent scale ambiguity of monocular inputs, which makes localizing the 3D plane an ill-posed problem, many existing works only predict the plane's orientation. In this paper, we address these limitations by presenting the first framework for detecting 3D-grounded reflectional symmetries from single, in-the-wild RGB images, focusing on architectural landmarks. We introduce two key innovations: (1) a scalable data annotation pipeline to automatically curate a large-scale dataset of architectural symmetries, ArchSym, from SfM reconstructions by leveraging cross-view image matching; and building on the dataset, (2) a single-view symmetry detector that accurately localizes symmetries in 3D by parameterizing them as signed distance maps defined relative to predicted scene geometry. We validate our symmetry annotation pipeline against geometry-based alternatives and demonstrate that our symmetry detector significantly outperforms state-of-the-art baselines on our new benchmark.

* project page: https://hanyuc.com/archsym/

Via

Access Paper or Ask Questions

Seeing Fast and Slow: Learning the Flow of Time in Videos

Apr 23, 2026

Yen-Siang Wu, Rundong Luo, Jingsen Zhu, Tao Tu, Ali Farhadi, Matthew Wallingford, Yu-Chiang Frank Wang, Steve Marschner, Wei-Chiu Ma

Abstract:How can we tell whether a video has been sped up or slowed down? How can we generate videos at different speeds? Although videos have been central to modern computer vision research, little attention has been paid to perceiving and controlling the passage of time. In this paper, we study time as a learnable visual concept and develop models for reasoning about and manipulating the flow of time in videos. We first exploit the multimodal cues and temporal structure naturally present in videos to learn, in a self-supervised manner, to detect speed changes and estimate playback speed. We then show that these learned temporal reasoning models enable us to curate the largest slow-motion video dataset to date from noisy in-the-wild sources. Such slow-motion footage, typically filmed by high-speed cameras, contains substantially richer temporal detail than standard videos. Using this data, we further develop models capable of temporal control, including speed-conditioned video generation, which produces motion at specified playback speed, and temporal super-resolution, which tranforms low-FPS, blurry videos into high-FPS sequences with fine-grained temporal details. Our findings highlight time as a manipulable, perceptual dimension in video learning, opening doors to temporally controllable video generation, temporal forensics detection, and potentially richer world-models that understand how events unfold over time.

* Project page: https://seeing-fast-and-slow.github.io/

Via

Access Paper or Ask Questions

Selfi: Self Improving Reconstruction Engine via 3D Geometric Feature Alignment

Dec 21, 2025

Youming Deng, Songyou Peng, Junyi Zhang, Kathryn Heal, Tiancheng Sun, John Flynn, Steve Marschner, Lucy Chai

Abstract:Novel View Synthesis (NVS) has traditionally relied on models with explicit 3D inductive biases combined with known camera parameters from Structure-from-Motion (SfM) beforehand. Recent vision foundation models like VGGT take an orthogonal approach -- 3D knowledge is gained implicitly through training data and loss objectives, enabling feed-forward prediction of both camera parameters and 3D representations directly from a set of uncalibrated images. While flexible, VGGT features lack explicit multi-view geometric consistency, and we find that improving such 3D feature consistency benefits both NVS and pose estimation tasks. We introduce Selfi, a self-improving 3D reconstruction pipeline via feature alignment, transforming a VGGT backbone into a high-fidelity 3D reconstruction engine by leveraging its own outputs as pseudo-ground-truth. Specifically, we train a lightweight feature adapter using a reprojection-based consistency loss, which distills VGGT outputs into a new geometrically-aligned feature space that captures spatial proximity in 3D. This enables state-of-the-art performance in both NVS and camera pose estimation, demonstrating that feature alignment is a highly beneficial step for downstream 3D reasoning.

* Project Page: https://denghilbert.github.io/selfi/

Via

Access Paper or Ask Questions

Self-Calibrating Gaussian Splatting for Large Field of View Reconstruction

Feb 13, 2025

Youming Deng, Wenqi Xian, Guandao Yang, Leonidas Guibas, Gordon Wetzstein, Steve Marschner, Paul Debevec

Figure 1 for Self-Calibrating Gaussian Splatting for Large Field of View Reconstruction

Figure 2 for Self-Calibrating Gaussian Splatting for Large Field of View Reconstruction

Figure 3 for Self-Calibrating Gaussian Splatting for Large Field of View Reconstruction

Figure 4 for Self-Calibrating Gaussian Splatting for Large Field of View Reconstruction

Abstract:In this paper, we present a self-calibrating framework that jointly optimizes camera parameters, lens distortion and 3D Gaussian representations, enabling accurate and efficient scene reconstruction. In particular, our technique enables high-quality scene reconstruction from Large field-of-view (FOV) imagery taken with wide-angle lenses, allowing the scene to be modeled from a smaller number of images. Our approach introduces a novel method for modeling complex lens distortions using a hybrid network that combines invertible residual networks with explicit grids. This design effectively regularizes the optimization process, achieving greater accuracy than conventional camera models. Additionally, we propose a cubemap-based resampling strategy to support large FOV images without sacrificing resolution or introducing distortion artifacts. Our method is compatible with the fast rasterization of Gaussian Splatting, adaptable to a wide variety of camera lens distortion, and demonstrates state-of-the-art performance on both synthetic and real-world datasets.

* Project Page: https://denghilbert.github.io/self-cali/

Via

Access Paper or Ask Questions

ObjectCarver: Semi-automatic segmentation, reconstruction and separation of 3D objects

Jul 26, 2024

Gemmechu Hassena, Jonathan Moon, Ryan Fujii, Andrew Yuen, Noah Snavely, Steve Marschner, Bharath Hariharan

Figure 1 for ObjectCarver: Semi-automatic segmentation, reconstruction and separation of 3D objects

Figure 2 for ObjectCarver: Semi-automatic segmentation, reconstruction and separation of 3D objects

Figure 3 for ObjectCarver: Semi-automatic segmentation, reconstruction and separation of 3D objects

Figure 4 for ObjectCarver: Semi-automatic segmentation, reconstruction and separation of 3D objects

Abstract:Implicit neural fields have made remarkable progress in reconstructing 3D surfaces from multiple images; however, they encounter challenges when it comes to separating individual objects within a scene. Previous work has attempted to tackle this problem by introducing a framework to train separate signed distance fields (SDFs) simultaneously for each of N objects and using a regularization term to prevent objects from overlapping. However, all of these methods require segmentation masks to be provided, which are not always readily available. We introduce our method, ObjectCarver, to tackle the problem of object separation from just click input in a single view. Given posed multi-view images and a set of user-input clicks to prompt segmentation of the individual objects, our method decomposes the scene into separate objects and reconstructs a high-quality 3D surface for each one. We introduce a loss function that prevents floaters and avoids inappropriate carving-out due to occlusion. In addition, we introduce a novel scene initialization method that significantly speeds up the process while preserving geometric details compared to previous approaches. Despite requiring neither ground truth masks nor monocular cues, our method outperforms baselines both qualitatively and quantitatively. In addition, we introduce a new benchmark dataset for evaluation.

* Project page is: https://objectcarver.github.io/

Via

Access Paper or Ask Questions

A Simple Approach to Differentiable Rendering of SDFs

May 14, 2024

Zichen Wang, Xi Deng, Ziyi Zhang, Wenzel Jakob, Steve Marschner

Figure 1 for A Simple Approach to Differentiable Rendering of SDFs

Figure 2 for A Simple Approach to Differentiable Rendering of SDFs

Figure 3 for A Simple Approach to Differentiable Rendering of SDFs

Figure 4 for A Simple Approach to Differentiable Rendering of SDFs

Abstract:We present a simple algorithm for differentiable rendering of surfaces represented by Signed Distance Fields (SDF), which makes it easy to integrate rendering into gradient-based optimization pipelines. To tackle visibility-related derivatives that make rendering non-differentiable, existing physically based differentiable rendering methods often rely on elaborate guiding data structures or reparameterization with a global impact on variance. In this article, we investigate an alternative that embraces nonzero bias in exchange for low variance and architectural simplicity. Our method expands the lower-dimensional boundary integral into a thin band that is easy to sample when the underlying surface is represented by an SDF. We demonstrate the performance and robustness of our formulation in end-to-end inverse rendering tasks, where it obtains results that are competitive with or superior to existing work.

Via

Access Paper or Ask Questions

Accurate Differential Operators for Hybrid Neural Fields

Dec 10, 2023

Aditya Chetan, Guandao Yang, Zichen Wang, Steve Marschner, Bharath Hariharan

Figure 1 for Accurate Differential Operators for Hybrid Neural Fields

Figure 2 for Accurate Differential Operators for Hybrid Neural Fields

Figure 3 for Accurate Differential Operators for Hybrid Neural Fields

Figure 4 for Accurate Differential Operators for Hybrid Neural Fields

Abstract:Neural fields have become widely used in various fields, from shape representation to neural rendering, and for solving partial differential equations (PDEs). With the advent of hybrid neural field representations like Instant NGP that leverage small MLPs and explicit representations, these models train quickly and can fit large scenes. Yet in many applications like rendering and simulation, hybrid neural fields can cause noticeable and unreasonable artifacts. This is because they do not yield accurate spatial derivatives needed for these downstream applications. In this work, we propose two ways to circumvent these challenges. Our first approach is a post hoc operator that uses local polynomial-fitting to obtain more accurate derivatives from pre-trained hybrid neural fields. Additionally, we also propose a self-supervised fine-tuning approach that refines the neural field to yield accurate derivatives directly while preserving the initial signal. We show the application of our method on rendering, collision simulation, and solving PDEs. We observe that using our approach yields more accurate derivatives, reducing artifacts and leading to more accurate simulations in downstream applications.

Via

Access Paper or Ask Questions