Historical buildings are a treasure and milestone of human cultural heritage. Reconstructing the 3D models of these building hold significant value. The rapid development of neural rendering methods makes it possible to recover the 3D shape only based on archival photographs. However, this task presents considerable challenges due to the limitations of such datasets. Historical photographs are often limited in number and the scenes in these photos might have altered over time. The radiometric quality of these images is also often sub-optimal. To address these challenges, we introduce an approach to reconstruct the geometry of historical buildings, employing volumetric rendering techniques. We leverage dense point clouds as a geometric prior and introduce a color appearance embedding loss to recover the color of the building given limited available color images. We aim for our work to spark increased interest and focus on preserving historical buildings. Thus, we also introduce a new historical dataset of the Hungarian National Theater, providing a new benchmark for the reconstruction method.
Synthesizing novel views for urban environments is crucial for tasks like autonomous driving and virtual tours. Compared to object-level or indoor situations, outdoor settings present unique challenges, such as inconsistency across frames due to moving vehicles and camera pose drift over lengthy sequences. In this paper, we introduce a method that tackles these challenges on view synthesis for outdoor scenarios. We employ a neural point light field scene representation and strategically detect and mask out dynamic objects to reconstruct novel scenes without artifacts. Moreover, we simultaneously optimize camera pose along with the view synthesis process, and thus, we simultaneously refine both elements. Through validation on real-world urban datasets, we demonstrate state-of-the-art results in synthesizing novel views of urban scenes.
Neural surface implicit representations offer numerous advantages, including the ability to easily modify topology and surface resolution. However, reconstructing implicit geometry representation with only limited known data is challenging. In this paper, we present an approach that effectively interpolates and extrapolates within training points, generating additional training data to reconstruct a surface with superior qualitative and quantitative results. We also introduce a technique that efficiently calculates differentiable geometric properties, i.e., mean and Gaussian curvatures, to enhance the sampling process during training. Additionally, we propose a weight-aware implicit neural representation that not only streamlines surface extraction but also extend to non-closed surfaces by depicting non-closed areas as locally degenerated patches, thereby mitigating the drawbacks of the previous assumption in implicit neural representations.
We present Gradient-SDF, a novel representation for 3D geometry that combines the advantages of implict and explicit representations. By storing at every voxel both the signed distance field as well as its gradient vector field, we enhance the capability of implicit representations with approaches originally formulated for explicit surfaces. As concrete examples, we show that (1) the Gradient-SDF allows us to perform direct SDF tracking from depth images, using efficient storage schemes like hash maps, and that (2) the Gradient-SDF representation enables us to perform photometric bundle adjustment directly in a voxel representation (without transforming into a point cloud or mesh), naturally a fully implicit optimization of geometry and camera poses and easy geometry upsampling. Experimental results confirm that this leads to significantly sharper reconstructions. Since the overall SDF voxel structure is still respected, the proposed Gradient-SDF is equally suited for (GPU) parallelization as related approaches.
The empirical results suggest that the learnability of a neural network is directly related to its size. To mathematically prove this, we borrow a tool in topological algebra: Betti numbers to measure the topological geometric complexity of input data and the neural network. By characterizing the expressive capacity of a neural network with its topological complexity, we conduct a thorough analysis and show that the network's expressive capacity is limited by the scale of its layers. Further, we derive the upper bounds of the Betti numbers on each layer within the network. As a result, the problem of architecture selection of a neural network is transformed to determining the scale of the network that can represent the input data complexity. With the presented results, the architecture selection of a fully connected network boils down to choosing a suitable size of the network such that it equips the Betti numbers that are not smaller than the Betti numbers of the input data. We perform the experiments on a real-world dataset MNIST and the results verify our analysis and conclusion. The code will be publicly available.
A novel approach towards depth map super-resolution using multi-view uncalibrated photometric stereo is presented. Practically, an LED light source is attached to a commodity RGB-D sensor and is used to capture objects from multiple viewpoints with unknown motion. This non-static camera-to-object setup is described with a nonconvex variational approach such that no calibration on lighting or camera motion is required due to the formulation of an end-to-end joint optimization problem. Solving the proposed variational model results in high resolution depth, reflectance and camera pose estimates, as we show on challenging synthetic and real-world datasets.