A common problem with composite images is the visual incompatibility of their foreground and background components. Image harmonization aims to solve this problem by making the whole image look more authentic and coherent. Most existing solutions either predict lookup tables (LUTs) or reconstruct images, utilizing various attributes of composite images. Recent approaches have focused primarily on global transformations, such as normalization and color curve rendering, to achieve visual consistency, and they often overlook the importance of local visual coherence. We present a patch-based harmonization network consisting of novel patch-based normalization (PN) blocks and a feature extractor based on statistical color transfer. Extensive experiments demonstrate the network's high generalization capability across different domains. Our network achieves state-of-the-art results on the iHarmony4 dataset. In addition, we created a new human portrait harmonization dataset based on FFHQ and evaluated the proposed method on it, where it achieves the best metrics and thereby demonstrates its generalization ability. The benchmark experiments confirm that the proposed patch-based normalization block and feature extractor effectively improve the network's capability to harmonize portraits. Our code and model baselines are publicly available.
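For readers unfamiliar with statistical color transfer, the idea can be illustrated with a minimal sketch. It is not the feature extractor from the paper: it assumes the classical mean and standard deviation matching scheme, applied per channel directly in the input color space, and the function name `color_transfer` and its parameters are hypothetical.

```python
import numpy as np

def color_transfer(composite, mask, eps=1e-6):
    """Match foreground per-channel mean/std to the background (illustrative only).

    composite: float array of shape (H, W, 3).
    mask: bool array of shape (H, W), True marks foreground pixels.
    """
    fg = composite[mask]    # foreground pixels, shape (N_fg, 3)
    bg = composite[~mask]   # background pixels, shape (N_bg, 3)

    fg_mean, fg_std = fg.mean(axis=0), fg.std(axis=0)
    bg_mean, bg_std = bg.mean(axis=0), bg.std(axis=0)

    out = composite.copy()
    # Standardize the foreground, then rescale it to the background statistics.
    out[mask] = (fg - fg_mean) / (fg_std + eps) * bg_std + bg_mean
    return out
```

Such a transfer is global over the foreground region, which is the kind of transformation the abstract contrasts with patch-level processing.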
One of the main challenges in sign language recognition is the difficulty of collecting a suitable dataset, owing to the gap between the deaf and hearing communities. In addition, sign languages differ significantly from country to country, which makes it necessary to create new data for each of them. This paper presents Slovo, a Russian Sign Language (RSL) video dataset produced using crowdsourcing platforms. The dataset contains 20,000 FullHD recordings, divided into 1,000 classes of RSL gestures and recorded by 194 signers. We also provide the entire dataset creation pipeline, from data collection to video annotation, along with a demo application. Several neural networks are trained and evaluated on Slovo to demonstrate its usefulness for training. The proposed data and pre-trained models are publicly available.