Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Wenjing Bian

Scene Coordinate Reconstruction Priors

Oct 14, 2025

Wenjing Bian, Axel Barroso-Laguna, Tommaso Cavallari, Victor Adrian Prisacariu, Eric Brachmann

Abstract:Scene coordinate regression (SCR) models have proven to be powerful implicit scene representations for 3D vision, enabling visual relocalization and structure-from-motion. SCR models are trained specifically for one scene. If training images imply insufficient multi-view constraints SCR models degenerate. We present a probabilistic reinterpretation of training SCR models, which allows us to infuse high-level reconstruction priors. We investigate multiple such priors, ranging from simple priors over the distribution of reconstructed depth values to learned priors over plausible scene coordinate configurations. For the latter, we train a 3D point cloud diffusion model on a large corpus of indoor scans. Our priors push predicted 3D scene points towards plausible geometry at each training step to increase their likelihood. On three indoor datasets our priors help learning better scene representations, resulting in more coherent scene point clouds, higher registration rates and better camera poses, with a positive effect on down-stream tasks such as novel view synthesis and camera relocalization.

* ICCV 2025, Project page: https://nianticspatial.github.io/scr-priors/

Via

Access Paper or Ask Questions

Seeing in the Dark: Benchmarking Egocentric 3D Vision with the Oxford Day-and-Night Dataset

Jun 04, 2025

Zirui Wang, Wenjing Bian, Xinghui Li, Yifu Tao, Jianeng Wang, Maurice Fallon, Victor Adrian Prisacariu

Figure 1 for Seeing in the Dark: Benchmarking Egocentric 3D Vision with the Oxford Day-and-Night Dataset

Figure 2 for Seeing in the Dark: Benchmarking Egocentric 3D Vision with the Oxford Day-and-Night Dataset

Figure 3 for Seeing in the Dark: Benchmarking Egocentric 3D Vision with the Oxford Day-and-Night Dataset

Figure 4 for Seeing in the Dark: Benchmarking Egocentric 3D Vision with the Oxford Day-and-Night Dataset

Abstract:We introduce Oxford Day-and-Night, a large-scale, egocentric dataset for novel view synthesis (NVS) and visual relocalisation under challenging lighting conditions. Existing datasets often lack crucial combinations of features such as ground-truth 3D geometry, wide-ranging lighting variation, and full 6DoF motion. Oxford Day-and-Night addresses these gaps by leveraging Meta ARIA glasses to capture egocentric video and applying multi-session SLAM to estimate camera poses, reconstruct 3D point clouds, and align sequences captured under varying lighting conditions, including both day and night. The dataset spans over 30 $\mathrm{km}$ of recorded trajectories and covers an area of 40,000 $\mathrm{m}^2$, offering a rich foundation for egocentric 3D vision research. It supports two core benchmarks, NVS and relocalisation, providing a unique platform for evaluating models in realistic and diverse environments.

* Project page: https://oxdan.active.vision/

Via

Access Paper or Ask Questions

CatFree3D: Category-agnostic 3D Object Detection with Diffusion

Aug 22, 2024

Wenjing Bian, Zirui Wang, Andrea Vedaldi

Figure 1 for CatFree3D: Category-agnostic 3D Object Detection with Diffusion

Figure 2 for CatFree3D: Category-agnostic 3D Object Detection with Diffusion

Figure 3 for CatFree3D: Category-agnostic 3D Object Detection with Diffusion

Figure 4 for CatFree3D: Category-agnostic 3D Object Detection with Diffusion

Abstract:Image-based 3D object detection is widely employed in applications such as autonomous vehicles and robotics, yet current systems struggle with generalisation due to complex problem setup and limited training data. We introduce a novel pipeline that decouples 3D detection from 2D detection and depth prediction, using a diffusion-based approach to improve accuracy and support category-agnostic detection. Additionally, we introduce the Normalised Hungarian Distance (NHD) metric for an accurate evaluation of 3D detection results, addressing the limitations of traditional IoU and GIoU metrics. Experimental results demonstrate that our method achieves state-of-the-art accuracy and strong generalisation across various object categories and datasets.

* Project page: https://bianwenjing.github.io/CatFree3D

Via

Access Paper or Ask Questions

CrossScore: Towards Multi-View Image Evaluation and Scoring

Apr 22, 2024

Zirui Wang, Wenjing Bian, Omkar Parkhi, Yuheng Ren, Victor Adrian Prisacariu

Abstract:We introduce a novel cross-reference image quality assessment method that effectively fills the gap in the image assessment landscape, complementing the array of established evaluation schemes -- ranging from full-reference metrics like SSIM, no-reference metrics such as NIQE, to general-reference metrics including FID, and Multi-modal-reference metrics, e.g., CLIPScore. Utilising a neural network with the cross-attention mechanism and a unique data collection pipeline from NVS optimisation, our method enables accurate image quality assessment without requiring ground truth references. By comparing a query image against multiple views of the same scene, our method addresses the limitations of existing metrics in novel view synthesis (NVS) and similar tasks where direct reference images are unavailable. Experimental results show that our method is closely correlated to the full-reference metric SSIM, while not requiring ground truth references.

* Project page see https://crossscore.active.vision

Via

Access Paper or Ask Questions

PoRF: Pose Residual Field for Accurate Neural Surface Reconstruction

Oct 12, 2023

Jia-Wang Bian, Wenjing Bian, Victor Adrian Prisacariu, Philip Torr

Abstract:Neural surface reconstruction is sensitive to the camera pose noise, even if state-of-the-art pose estimators like COLMAP or ARKit are used. More importantly, existing Pose-NeRF joint optimisation methods have struggled to improve pose accuracy in challenging real-world scenarios. To overcome the challenges, we introduce the pose residual field (\textbf{PoRF}), a novel implicit representation that uses an MLP for regressing pose updates. This is more robust than the conventional pose parameter optimisation due to parameter sharing that leverages global information over the entire sequence. Furthermore, we propose an epipolar geometry loss to enhance the supervision that leverages the correspondences exported from COLMAP results without the extra computational overhead. Our method yields promising results. On the DTU dataset, we reduce the rotation error by 78\% for COLMAP poses, leading to the decreased reconstruction Chamfer distance from 3.48mm to 0.85mm. On the MobileBrick dataset that contains casually captured unbounded 360-degree videos, our method refines ARKit poses and improves the reconstruction F1 score from 69.18 to 75.67, outperforming that with the dataset provided ground-truth pose (75.14). These achievements demonstrate the efficacy of our approach in refining camera poses and improving the accuracy of neural surface reconstruction in real-world scenarios.

* Under review

Via

Access Paper or Ask Questions

NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior

Dec 14, 2022

Wenjing Bian, Zirui Wang, Kejie Li, Jia-Wang Bian, Victor Adrian Prisacariu

Figure 1 for NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior

Figure 2 for NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior

Figure 3 for NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior

Figure 4 for NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior

Abstract:Training a Neural Radiance Field (NeRF) without pre-computed camera poses is challenging. Recent advances in this direction demonstrate the possibility of jointly optimising a NeRF and camera poses in forward-facing scenes. However, these methods still face difficulties during dramatic camera movement. We tackle this challenging problem by incorporating undistorted monocular depth priors. These priors are generated by correcting scale and shift parameters during training, with which we are then able to constrain the relative poses between consecutive frames. This constraint is achieved using our proposed novel loss functions. Experiments on real-world indoor and outdoor scenes show that our method can handle challenging camera trajectories and outperforms existing methods in terms of novel view rendering quality and pose estimation accuracy.

Via

Access Paper or Ask Questions

Ray-ONet: Efficient 3D Reconstruction From A Single RGB Image

Jul 05, 2021

Wenjing Bian, Zirui Wang, Kejie Li, Victor Adrian Prisacariu

Figure 1 for Ray-ONet: Efficient 3D Reconstruction From A Single RGB Image

Figure 2 for Ray-ONet: Efficient 3D Reconstruction From A Single RGB Image

Figure 3 for Ray-ONet: Efficient 3D Reconstruction From A Single RGB Image

Figure 4 for Ray-ONet: Efficient 3D Reconstruction From A Single RGB Image

Abstract:We propose Ray-ONet to reconstruct detailed 3D models from monocular images efficiently. By predicting a series of occupancy probabilities along a ray that is back-projected from a pixel in the camera coordinate, our method Ray-ONet improves the reconstruction accuracy in comparison with Occupancy Networks (ONet), while reducing the network inference complexity to O($N^2$). As a result, Ray-ONet achieves state-of-the-art performance on the ShapeNet benchmark with more than 20$\times$ speed-up at $128^3$ resolution and maintains a similar memory footprint during inference.

Via

Access Paper or Ask Questions