Abstract:In RGB-D semantic segmentation for indoor scenes, a key challenge is effectively integrating the rich color information from RGB images with the spatial distance information from depth images. However, most existing methods overlook the inherent differences in how RGB and depth images express information. Properly distinguishing the processing of RGB and depth images is essential to fully exploiting their unique and significant characteristics. To address this, we propose a novel heterogeneous dual-branch framework called HDBFormer, specifically designed to handle these modality differences. For RGB images, which contain rich detail, we employ both a basic and detail encoder to extract local and global features. For the simpler depth images, we propose LDFormer, a lightweight hierarchical encoder that efficiently extracts depth features with fewer parameters. Additionally, we introduce the Modality Information Interaction Module (MIIM), which combines transformers with large kernel convolutions to interact global and local information across modalities efficiently. Extensive experiments show that HDBFormer achieves state-of-the-art performance on the NYUDepthv2 and SUN-RGBD datasets. The code is available at: https://github.com/Weishuobin/HDBFormer.
Abstract:Spline wavelets have shown favorable characteristics for localizing in both time and frequency. In this paper, we propose a new biorthogonal cubic special spline wavelet (BCSSW), based on the Cohen-Daubechies-Feauveau wavelet construction method and the cubic special spline algorithm. BCSSW has better properties in compact support, symmetry, and frequency domain characteristics. However, current mainstream detection operators usually ignore the uncertain representation of regional pixels and global structures. To solve these problems, we propose a structural uncertainty-aware and multi-structure operator fusion detection algorithm (EDBSW) based on a new BCSSW spline wavelet. By constructing a spline wavelet that efficiently handles edge effects, we utilize structural uncertainty-aware modulus maxima to detect highly uncertain edge samples. The proposed wavelet detection operator utilizes the multi-structure morphological operator and fusion reconstruction strategy to effectively address anti-noise processing and edge information of different frequencies. Numerous experiments have demonstrated its excellent performance in reducing noise and capturing edge structure details.