Abstract: Recent NeRF methods on large-scale scenes have underlined the importance of scene decomposition for scalable NeRFs. Although these methods achieve reasonable scalability, several critical problems remain unexplored, namely learnable decomposition, modeling of scene heterogeneity, and modeling efficiency. In this paper, we introduce Switch-NeRF++, a Heterogeneous Mixture of Hash Experts (HMoHE) network that addresses these challenges within a unified framework. It is a highly scalable NeRF that learns heterogeneous decomposition and heterogeneous NeRFs efficiently for large-scale scenes in an end-to-end manner. In our framework, a gating network learns to decompose scenes and allocates 3D points to specialized NeRF experts. This gating network is co-optimized with the experts by our proposed Sparsely Gated Mixture of Experts (MoE) NeRF framework. We incorporate a hash-based gating network and distinct heterogeneous hash experts. The hash-based gating efficiently learns the decomposition of the large-scale scene. The distinct heterogeneous hash experts consist of hash grids with different resolution ranges, enabling effective learning of heterogeneous representations for different scene parts. These design choices make our framework an end-to-end, highly scalable NeRF solution for real-world large-scale scene modeling that achieves both quality and efficiency. We evaluate the accuracy and scalability of our method on existing large-scale NeRF datasets and on a new dataset of very large-scale scenes ($>6.5\,km^2$) from UrbanBIS. Extensive experiments demonstrate that our approach scales easily to various large-scale scenes and achieves state-of-the-art scene rendering accuracy. Furthermore, our method is significantly more efficient, with an 8x acceleration in training and a 16x acceleration in rendering compared to Switch-NeRF. Code will be released at https://github.com/MiZhenxing/Switch-NeRF.
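The routing idea behind a sparsely gated MoE NeRF can be sketched compactly. Below is a minimal, hypothetical PyTorch illustration of top-1 sparse gating over 3D points, with plain MLP experts standing in for the paper's heterogeneous hash experts; the class name, layer sizes, and expert count are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class Top1GatedMoE(nn.Module):
    """Toy sparsely gated MoE: a gating MLP routes each 3D point to one
    expert; plain MLP experts stand in for heterogeneous hash experts."""
    def __init__(self, num_experts=8, hidden=64, out_dim=16):
        super().__init__()
        self.out_dim = out_dim
        self.gate = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
                                  nn.Linear(hidden, num_experts))
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
                          nn.Linear(hidden, out_dim))
            for _ in range(num_experts))

    def forward(self, pts):                       # pts: (N, 3) point coords
        probs = self.gate(pts).softmax(dim=-1)    # (N, num_experts)
        top_p, top_i = probs.max(dim=-1)          # top-1 routing decision
        out = pts.new_zeros(pts.shape[0], self.out_dim)
        for e, expert in enumerate(self.experts):
            mask = top_i == e                     # points routed to expert e
            if mask.any():
                # scale by the gate probability so the gate receives gradients
                out[mask] = expert(pts[mask]) * top_p[mask].unsqueeze(1)
        return out

moe = Top1GatedMoE()
features = moe(torch.rand(1024, 3))               # per-point features
```

Because only the selected expert runs for each point, compute stays roughly constant as experts are added, which is what makes this kind of decomposition scalable; the gate and experts are trained jointly end-to-end.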
Abstract: Few-shot object detection (FSOD) aims to adapt object detectors efficiently with only a few annotated samples. Fine-tuning has been shown to be an effective and practical approach. However, previous works often adopt the classical base-novel two-stage fine-tuning procedure while ignoring the implicit stability-plasticity contradiction among different modules. Specifically, the randomly re-initialized classifiers need more plasticity to adapt to novel samples, whereas the other modules, which inherit pre-trained weights, demand more stability to preserve their class-agnostic knowledge. Regular fine-tuning, which couples the optimization of these two parts, hurts model generalization in FSOD scenarios. In this paper, we find that this problem is prominent in the end-to-end object detector Sparse R-CNN because of its multi-classifier cascaded architecture. We propose to mitigate this contradiction with a new three-stage fine-tuning procedure that introduces an additional plasticity classifier fine-tuning (PCF) stage. We further design a multi-source ensemble (ME) technique to enhance the generalization of the model in the final fine-tuning stage. Extensive experiments verify that our method effectively regularizes Sparse R-CNN, outperforming previous methods on the FSOD benchmark.
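The stability-plasticity split can be illustrated with a short, hypothetical PyTorch snippet: a PCF-style stage freezes the pre-trained (stability) modules and trains only the re-initialized classifier, before a final joint stage unfreezes everything. The module names and toy layers below are assumptions for illustration, not the paper's actual Sparse R-CNN code.

```python
import torch.nn as nn

def set_requires_grad(module, flag):
    """Freeze or unfreeze all parameters of a module."""
    for p in module.parameters():
        p.requires_grad = flag

# Hypothetical detector: a pre-trained part that needs stability and a
# randomly re-initialized classifier head that needs plasticity.
detector = nn.ModuleDict({
    "backbone": nn.Linear(256, 256),    # stands in for pre-trained modules
    "classifier": nn.Linear(256, 21),   # freshly re-initialized head
})

# Plasticity stage (PCF-style): adapt only the classifier on novel samples.
set_requires_grad(detector["backbone"], False)
set_requires_grad(detector["classifier"], True)
# ... run a few fine-tuning epochs here ...

# Final stage: unfreeze everything for joint fine-tuning.
set_requires_grad(detector["backbone"], True)
```

Decoupling the stages this way lets the classifier absorb the novel classes first, so the later joint stage perturbs the class-agnostic pre-trained weights far less than standard coupled fine-tuning would.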