This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (ie runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum-up of the scores of all other sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. They gauge the state-of-the-art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/.
This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge is to obtain designs/solutions with the most advanced SR performance, with no constraints on computational resources (e.g., model size and FLOPs) or training data. The track of this challenge assesses performance with the PSNR metric on the DIV2K testing dataset. The competition attracted 199 registrants, with 20 teams submitting valid entries. This collective endeavour not only pushes the boundaries of performance in single-image SR but also offers a comprehensive overview of current trends in this field.
This report reviews the results of the GT-Rain challenge on single image deraining at the UG2+ workshop at CVPR 2023. The aim of this competition is to study the rainy weather phenomenon in real world scenarios, provide a novel real world rainy image dataset, and to spark innovative ideas that will further the development of single image deraining methods on real images. Submissions were trained on the GT-Rain dataset and evaluated on an extension of the dataset consisting of 15 additional scenes. Scenes in GT-Rain are comprised of real rainy image and ground truth image captured moments after the rain had stopped. 275 participants were registered in the challenge and 55 competed in the final testing phase.
Transformer-based Single Image Deraining (SID) methods have achieved remarkable success, primarily attributed to their robust capability in capturing long-range interactions. However, we've noticed that current methods handle rain-affected and unaffected regions concurrently, overlooking the disparities between these areas, resulting in confusion between rain streaks and background parts, and inabilities to obtain effective interactions, ultimately resulting in suboptimal deraining outcomes. To address the above issue, we introduce the Region Transformer (Regformer), a novel SID method that underlines the importance of independently processing rain-affected and unaffected regions while considering their combined impact for high-quality image reconstruction. The crux of our method is the innovative Region Transformer Block (RTB), which integrates a Region Masked Attention (RMA) mechanism and a Mixed Gate Forward Block (MGFB). Our RTB is used for attention selection of rain-affected and unaffected regions and local modeling of mixed scales. The RMA generates attention maps tailored to these two regions and their interactions, enabling our model to capture comprehensive features essential for rain removal. To better recover high-frequency textures and capture more local details, we develop the MGFB as a compensation module to complete local mixed scale modeling. Extensive experiments demonstrate that our model reaches state-of-the-art performance, significantly improving the image deraining quality. Our code and trained models are publicly available.
The generalization capability of existing image restoration and enhancement (IRE) methods is constrained by the limited pre-trained datasets, making it difficult to handle agnostic inputs such as different degradation levels and scenarios beyond their design scopes. Moreover, they are not equipped with interactive mechanisms to consider user preferences or feedback, and their end-to-end settings cannot provide users with more choices. Faced with the above-mentioned IRE method's limited performance and insufficient interactivity, we try to solve it from the engineering and system framework levels. Specifically, we propose Clarity ChatGPT-a transformative system that combines the conversational intelligence of ChatGPT with multiple IRE methods. Clarity ChatGPT can automatically detect image degradation types and select appropriate IRE methods to restore images, or iteratively generate satisfactory results based on user feedback. Its innovative features include a CLIP-powered detector for accurate degradation classification, no-reference image quality evaluation for performance evaluation, region-specific processing for precise enhancements, and advanced fusion techniques for optimal restoration results. Clarity ChatGPT marks a significant advancement in integrating language and vision, enhancing image-text interactions, and providing a robust, high-performance IRE solution. Our case studies demonstrate that Clarity ChatGPT effectively improves the generalization and interaction capabilities in the IRE, and also fills the gap in the low-level domain of the existing vision-language model.
Stereo images, containing left and right view images with disparity, are utilized in solving low-vision tasks recently, e.g., rain removal and super-resolution. Stereo image restoration methods usually obtain better performance than monocular methods by learning the disparity between dual views either implicitly or explicitly. However, existing stereo rain removal methods still cannot make full use of the complementary information between two views, and we find it is because: 1) the rain streaks have more complex distributions in directions and densities, which severely damage the complementary information and pose greater challenges; 2) the disparity estimation is not accurate enough due to the imperfect fusion mechanism for the features between two views. To overcome such limitations, we propose a new \underline{Stereo} \underline{I}mage \underline{R}ain \underline{R}emoval method (StereoIRR) via sufficient interaction between two views, which incorporates: 1) a new Dual-view Mutual Attention (DMA) mechanism which generates mutual attention maps by taking left and right views as key information for each other to facilitate cross-view feature fusion; 2) a long-range and cross-view interaction, which is constructed with basic blocks and dual-view mutual attention, can alleviate the adverse effect of rain on complementary information to help the features of stereo images to get long-range and cross-view interaction and fusion. Notably, StereoIRR outperforms other related monocular and stereo image rain removal methods on several datasets. Our codes and datasets will be released.
Removing the rain streaks from single image is still a challenging task, since the shapes and direc-tions of rain streaks in the synthetic datasets are very different from real images. Although super-vised deep deraining networks have obtained im-pressive results on synthetic datasets, they still cannot obtain satisfactory results on real images due to weak generalization of rain removal capac-ity, i.e., the pre-trained models usually cannot handle new shapes and directions that may lead to over-derained/under-derained results. In this paper, we propose a new semi-supervised GAN-based deraining network termed Semi-DerainGAN, which can use both synthetic and real rainy images in a uniform network using two supervised and unsupervised processes. Specifically, a semi-supervised rain streak learner termed SSRML sharing the same parameters of both processes is derived, which makes the real images contribute more rain streak information. To deliver better deraining results, we design a paired discriminator for distinguishing the real pairs from fake pairs. Note that we also contribute a new real-world rainy image dataset Real200 to alleviate the dif-ference between the synthetic and real image do-mains. Extensive results on public datasets show that our model can obtain competitive perfor-mance, especially on real images.
Single image deraining (SID) is an important and challenging topic in emerging vision applications, and most of emerged deraining methods are supervised relying on the ground truth (i.e., paired images) in recent years. However, in practice it is rather common to have no un-paired images in real deraining task, in such cases how to remove the rain streaks in an unsupervised way will be a very challenging task due to lack of constraints between images and hence suffering from low-quality recovery results. In this paper, we explore the unsupervised SID task using unpaired data and propose a novel net called Attention-guided Deraining by Constrained CycleGAN (or shortly, DerainCycleGAN), which can fully utilize the constrained transfer learning abilitiy and circulatory structure of CycleGAN. Specifically, we design an unsu-pervised attention guided rain streak extractor (U-ARSE) that utilizes a memory to extract the rain streak masks with two constrained cycle-consistency branches jointly by paying attention to both the rainy and rain-free image domains. As a by-product, we also contribute a new paired rain image dataset called Rain200A, which is constructed by our network automatically. Compared with existing synthesis datasets, the rainy streaks in Rain200A contains more obvious and diverse shapes and directions. As a result, existing supervised methods trained on Rain200A can perform much better for processing real rainy images. Extensive experiments on synthesis and real datasets show that our net is superior to existing unsupervised deraining networks, and is also very competitive to other related supervised networks.
Single image deraining task is still a very challenging task due to its ill-posed nature in reality. Recently, researchers have tried to fix this issue by training the CNN-based end-to-end models, but they still cannot extract the negative rain streaks from rainy images precisely, which usually leads to an over de-rained or under de-rained result. To handle this issue, this paper proposes a new coarse-to-fine single image deraining framework termed Multi-stream Hybrid Deraining Network (shortly, MH-DerainNet). To obtain the negative rain streaks during training process more accurately, we present a new module named dual path residual dense block, i.e., Residual path and Dense path. The Residual path is used to reuse com-mon features from the previous layers while the Dense path can explore new features. In addition, to concatenate different scaled features, we also apply the idea of multi-stream with shortcuts between cascaded dual path residual dense block based streams. To obtain more distinct derained images, we combine the SSIM loss and perceptual loss to preserve the per-pixel similarity as well as preserving the global structures so that the deraining result is more accurate. Extensive experi-ments on both synthetic and real rainy images demonstrate that our MH-DerainNet can deliver significant improvements over several recent state-of-the-art methods.