Abstract:Single-image super-resolution refers to the reconstruction of a high-resolution image from a single low-resolution observation. Although recent deep learning-based methods have demonstrated notable success on simulated datasets -- with low-resolution images obtained by degrading and downsampling high-resolution ones -- they frequently fail to generalize to real-world settings, such as document scans, which are affected by complex degradations and semantic variability. In this study, we introduce a task-driven, multi-task learning framework for training a super-resolution network specifically optimized for optical character recognition tasks. We propose to incorporate auxiliary loss functions derived from high-level vision tasks, including text detection using the connectionist text proposal network, text recognition via a convolutional recurrent neural network, keypoints localization using Key.Net, and hue consistency. To balance these diverse objectives, we employ dynamic weight averaging mechanism, which adaptively adjusts the relative importance of each loss term based on its convergence behavior. We validate our approach upon the SRResNet architecture, which is a well-established technique for single-image super-resolution. Experimental evaluations on both simulated and real-world scanned document datasets demonstrate that the proposed approach improves text detection, measured with intersection over union, while preserving overall image fidelity. These findings underscore the value of multi-objective optimization in super-resolution models for bridging the gap between simulated training regimes and practical deployment in real-world scenarios.
Abstract:Super-resolution reconstruction is aimed at generating images of high spatial resolution from low-resolution observations. State-of-the-art super-resolution techniques underpinned with deep learning allow for obtaining results of outstanding visual quality, but it is seldom verified whether they constitute a valuable source for specific computer vision applications. In this paper, we investigate the possibility of employing super-resolution as a preprocessing step to improve optical character recognition from document scans. To achieve that, we propose to train deep networks for single-image super-resolution in a task-driven way to make them better adapted for the purpose of text detection. As problems limited to a specific task are heavily ill-posed, we introduce a multi-task loss function that embraces components related with text detection coupled with those guided by image similarity. The obtained results reported in this paper are encouraging and they constitute an important step towards real-world super-resolution of document images.