Abstract:Multi-traversal scene reconstruction is important for high-fidelity autonomous driving simulation and digital twin construction. This task involves integrating multiple sequences captured from the same geographical area at different times. In this context, a primary challenge is the significant appearance inconsistency across traversals caused by varying illumination and environmental conditions, despite the shared underlying geometry. This paper presents ADM-GS (Appearance Decomposition Gaussian Splatting for Multi-Traversal Reconstruction), a framework that applies an explicit appearance decomposition to the static background to alleviate appearance entanglement across traversals. For the static background, we decompose the appearance into traversal-invariant material, representing intrinsic material properties, and traversal-dependent illumination, capturing lighting variations. Specifically, we propose a neural light field that utilizes a frequency-separated hybrid encoding strategy. By incorporating surface normals and explicit reflection vectors, this design separately captures low-frequency diffuse illumination and high-frequency specular reflections. Quantitative evaluations on the Argoverse 2 and Waymo Open datasets demonstrate the effectiveness of ADM-GS. In multi-traversal experiments, our method achieves a +0.98 dB PSNR improvement over existing latent-based baselines while producing more consistent appearance across traversals. Code will be available at https://github.com/IRMVLab/ADM-GS.




Abstract:Monocular depth prediction has been well studied recently, while there are few works focused on the depth prediction across multiple environments, e.g. changing illumination and seasons, owing to the lack of such real-world dataset and benchmark. In this work, we derive a new cross-season scaleless monocular depth prediction dataset SeasonDepth from CMU Visual Localization dataset through structure from motion. And then we formulate several metrics to benchmark the performance under different environments using recent stateof-the-art open-source depth prediction pretrained models from KITTI benchmark. Through extensive zero-shot experimental evaluation on the proposed dataset, we show that the long-term monocular depth prediction is far from solved and provide promising solutions in the future work, including geometricbased or scale-invariant training. Moreover, multi-environment synthetic dataset and cross-dataset validataion are beneficial to the robustness to real-world environmental variance.