Abstract:This paper addresses the problem of reconstructing a high-resolution hyperspectral image from a low-resolution multispectral observation. While spatial super-resolution and spectral super-resolution have been extensively studied, joint spatio-spectral super-resolution remains relatively explored. We propose an end-to-end model-driven framework that explicitly decomposes the joint spatio-spectral super-resolution problem into spatial super-resolution, spectral super-resolution and fusion tasks. Each sub-task is addressed by unfolding a variational-based approach, where the operators involved in the proximal gradient iterative scheme are replaced with tailored learnable modules. In particular, we design an upsampling operator for spatial super-resolution based on classical back-projection algorithms, adapted to handle arbitrary scaling factors. Spectral reconstruction is performed using learnable cluster-based upsampling and downsampling operators. For image fusion, we integrate low-frequency estimation and high-frequency injection modules to combine the spatial and spectral information from spatial super-resolution and spectral super-resolution outputs. Additionally, we introduce an efficient nonlocal post-processing step that leverages image self-similarity by combining a multi-head attention mechanism with residual connections. Extensive evaluations on several datasets and sampling factors demonstrate the effectiveness of our approach. The source code will be available at https://github.com/TAMI-UIB/JSSUNet
Abstract:The objective of pansharpening and hypersharpening is to accurately combine a high-resolution panchromatic (PAN) image with a low-resolution multispectral (MS) or hyperspectral (HS) image, respectively. Unfolding fusion methods integrate the powerful representation capabilities of deep learning with the robustness of model-based approaches. These techniques involve unrolling the steps of the optimization scheme derived from the minimization of an energy into a deep learning framework, resulting in efficient and highly interpretable architectures. In this paper, we propose a model-based deep unfolded method for satellite image fusion. Our approach is based on a variational formulation that incorporates the classic observation model for MS/HS data, a high-frequency injection constraint based on the PAN image, and an arbitrary convex prior. For the unfolding stage, we introduce upsampling and downsampling layers that use geometric information encoded in the PAN image through residual networks. The backbone of our method is a multi-head attention residual network (MARNet), which replaces the proximity operator in the optimization scheme and combines multiple head attentions with residual learning to exploit image self-similarities via nonlocal operators defined in terms of patches. Additionally, we incorporate a post-processing module based on the MARNet architecture to further enhance the quality of the fused images. Experimental results on PRISMA, Quickbird, and WorldView2 datasets demonstrate the superior performance of our method and its ability to generalize across different sensor configurations and varying spatial and spectral resolutions. The source code will be available at https://github.com/TAMI-UIB/MARNet.