Li Niu

Inharmonious Region Localization via Recurrent Self-Reasoning

Oct 05, 2022

Penghao Wu, Li Niu, Jing Liang, Liqing Zhang

Abstract: Synthetic images created by image editing operations are prevalent, but color or illumination inconsistency between the manipulated region and the background can make them look unrealistic. It is therefore important, yet challenging, to localize the inharmonious region in order to improve the quality of the synthetic image. Inspired by classic clustering algorithms, we aim to group pixels into two clusters, an inharmonious cluster and a background cluster, by inserting a novel Recurrent Self-Reasoning (RSR) module into the bottleneck of a UNet structure. The mask output by the RSR module is provided to the decoder as attention guidance. Finally, we adaptively combine the masks from the RSR module and the decoder to form the final mask. Experimental results on the image harmonization dataset demonstrate that our method achieves competitive performance both quantitatively and qualitatively.

* BMVC2022
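
The abstract above cites classic clustering as its inspiration. As a rough illustration of that idea only (not the RSR module itself, whose details are not given here), the following sketch groups per-pixel feature vectors into two clusters with plain k-means. The function name `two_means`, the initialisation rule, and the toy RGB values are assumptions made for this example.

```python
def two_means(features, iters=10):
    """Group feature vectors into two clusters with plain k-means (k = 2)."""

    def dist2(a, b):
        # Squared Euclidean distance between two vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Assumed initialisation: the first vector and the vector farthest from it.
    c0 = features[0]
    c1 = max(features, key=lambda f: dist2(f, c0))
    centroids = [list(c0), list(c1)]
    labels = [0] * len(features)

    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            0 if dist2(f, centroids[0]) <= dist2(f, centroids[1]) else 1
            for f in features
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in (0, 1):
            members = [f for f, lab in zip(features, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy "pixels": two reddish vectors and three bluish ones.
pixels = [
    (0.90, 0.10, 0.10),
    (0.85, 0.15, 0.10),
    (0.10, 0.20, 0.80),
    (0.15, 0.25, 0.75),
    (0.12, 0.22, 0.78),
]
labels, centroids = two_means(pixels)
```

In this toy setting the two reddish vectors end up in one cluster and the three bluish vectors in the other, which mirrors the inharmonious-versus-background split described in the abstract.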

Inharmonious Region Localization with Auxiliary Style Feature

Oct 05, 2022

Penghao Wu, Li Niu, Liqing Zhang

Abstract: With the prevalence of image editing techniques, users can create fantastic synthetic images, but the image quality may be compromised by the color/illumination discrepancy between the manipulated region and the background. Inharmonious region localization aims to localize the inharmonious region in a synthetic image. In this work, we leverage auxiliary style features to facilitate this task. Specifically, we propose a novel color mapping module and a style feature loss to extract discriminative style features containing task-relevant color/illumination information. Based on the extracted style features, we also propose a novel style voting module to guide the localization of the inharmonious region. Moreover, we introduce semantic information into the style voting module to achieve further improvement. Our method surpasses existing methods by a large margin on the benchmark dataset.

* BMVC2022

Inharmonious Region Localization by Magnifying Domain Discrepancy

Sep 30, 2022

Jing Liang, Li Niu, Penghao Wu, Fengjun Guo, Teng Long

Abstract: Inharmonious region localization aims to localize the region in a synthetic image that is incompatible with the surrounding background. The inharmony is mainly attributed to the color and illumination inconsistency produced by image editing techniques. In this work, we transform the input image into another color space to magnify the domain discrepancy between the inharmonious region and the background, so that the model can identify the inharmonious region more easily. To this end, we present a novel framework consisting of a color mapping module and an inharmonious region localization network, in which the former is equipped with a novel domain discrepancy magnification loss and the latter can be an arbitrary localization network. Extensive experiments on the image harmonization dataset show the superiority of our framework. Our code is available at https://github.com/bcmi/MadisNet-Inharmonious-Region-Localization.

DeltaGAN: Towards Diverse Few-shot Image Generation with Sample-Specific Delta

Jul 28, 2022

Yan Hong, Li Niu, Jianfu Zhang, Liqing Zhang

Abstract: Learning to generate new images for a novel category based on only a few images, known as few-shot image generation, has attracted increasing research interest. Several state-of-the-art works have yielded impressive results, but the diversity of their outputs is still limited. In this work, we propose a novel Delta Generative Adversarial Network (DeltaGAN), which consists of a reconstruction subnetwork and a generation subnetwork. The reconstruction subnetwork captures the intra-category transformation, i.e., the delta, between same-category pairs. The generation subnetwork generates a sample-specific delta for an input image, which is combined with this input image to generate a new image within the same category. In addition, an adversarial delta matching loss is designed to link the two subnetworks together. Extensive experiments on six benchmark datasets demonstrate the effectiveness of our proposed method. Our code is available at https://github.com/bcmi/DeltaGAN-Few-Shot-Image-Generation.

* I want to withdraw this version and use it to update the previous version at arXiv:2009.08753

Learning Object Placement via Dual-path Graph Completion

Jul 23, 2022

Siyuan Zhou, Liu Liu, Li Niu, Liqing Zhang

Abstract: Object placement aims to place a foreground object over a background image at a suitable location and size. In this work, we treat object placement as a graph completion problem and propose a novel graph completion module (GCM). The background scene is represented by a graph with multiple nodes at different spatial locations with various receptive fields. The foreground object is encoded as a special node that should be inserted at a reasonable place in this graph. We also design a dual-path framework on top of the GCM structure to fully exploit annotated composite images. Extensive experiments on the OPA dataset show that our method significantly outperforms existing methods in generating plausible object placements without loss of diversity.

* 25 pages, 9 figures

Few-shot Image Generation Using Discrete Content Representation

Jul 22, 2022

Yan Hong, Li Niu, Jianfu Zhang, Liqing Zhang

Abstract: Few-shot image generation and few-shot image translation are two related tasks, both of which aim to generate new images for an unseen category from only a few images. In this work, we make the first attempt to adapt a few-shot image translation method to the few-shot image generation task. Few-shot image translation disentangles an image into a style vector and a content map. An unseen style vector can be combined with different seen content maps to produce different images. However, this approach needs to store seen images to provide content maps, and the unseen style vector may be incompatible with the seen content maps. To adapt it to the few-shot image generation task, we learn a compact dictionary of local content vectors by quantizing continuous content maps into discrete content maps instead of storing seen images. Furthermore, we model the autoregressive distribution of the discrete content map conditioned on the style vector, which alleviates the incompatibility between content map and style vector. Qualitative and quantitative results on three real datasets demonstrate that our model produces images of higher diversity and fidelity for unseen categories than previous methods.

* This paper is accepted by ACM MM 2022
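
The quantization step mentioned in the abstract above, turning a continuous content map into a discrete one through a dictionary of local content vectors, can be sketched as a nearest-neighbour lookup. This is a generic vector-quantization illustration under assumed names (`quantize`, `codebook`) and toy two-dimensional vectors; how the dictionary is learned and how the autoregressive prior is modelled are not covered.

```python
def quantize(content_map, codebook):
    """Map each local content vector to the index of its nearest codebook entry."""

    def dist2(a, b):
        # Squared Euclidean distance between two vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        [min(range(len(codebook)), key=lambda k: dist2(vec, codebook[k])) for vec in row]
        for row in content_map
    ]


# Hypothetical dictionary with three entries and a 2x2 continuous content map.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
content_map = [
    [(0.1, -0.1), (0.9, 0.2)],
    [(0.2, 0.8), (0.6, 0.1)],
]
indices = quantize(content_map, codebook)

# The discrete map can be turned back into vectors by a dictionary lookup.
reconstructed = [[codebook[k] for k in row] for row in indices]
```

Storing only `indices` and the dictionary, rather than the seen images themselves, is the storage saving the abstract refers to.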

Human-centric Image Cropping with Partition-aware and Content-preserving Features

Jul 21, 2022

Bo Zhang, Li Niu, Xing Zhao, Liqing Zhang

Abstract: Image cropping aims to find visually appealing crops in an image, which is an important yet challenging task. In this paper, we consider a specific and practical application: human-centric image cropping, which focuses on the depiction of a person. To this end, we propose a human-centric image cropping method with two novel feature designs for the candidate crop: a partition-aware feature and a content-preserving feature. For the partition-aware feature, we divide the whole image into nine partitions based on the human bounding box and treat the partitions in a candidate crop differently, conditioned on the human information. For the content-preserving feature, we predict a heatmap indicating the important content to be included in a good crop, and extract the geometric relation between the heatmap and a candidate crop. Extensive experiments demonstrate that our method performs favorably against state-of-the-art image cropping methods on the human-centric image cropping task. Code is available at https://github.com/bcmi/Human-Centric-Image-Cropping.
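
One plausible reading of the nine-partition split described above is a 3x3 grid obtained by extending the edges of the human bounding box to the image border. The sketch below follows that assumption; the function names, the `(x1, y1, x2, y2)` box convention, and the example sizes are illustrative and not taken from the paper.

```python
def nine_partitions(width, height, box):
    """Split an image into a 3x3 grid by extending the edges of `box`.

    `box` is (x1, y1, x2, y2); partitions are returned in row-major order,
    so index 4 is the human bounding box itself.
    """
    x1, y1, x2, y2 = box
    xs = [0, x1, x2, width]
    ys = [0, y1, y2, height]
    return [
        (xs[c], ys[r], xs[c + 1], ys[r + 1])
        for r in range(3)
        for c in range(3)
    ]


def overlap_area(a, b):
    """Area of the intersection of two (x1, y1, x2, y2) rectangles."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


# A 100x80 image, a human box, and one candidate crop.
parts = nine_partitions(100, 80, (30, 20, 60, 70))
crop = (20, 10, 70, 80)

# How much of the candidate crop falls into each of the nine partitions.
areas = [overlap_area(p, crop) for p in parts]
```

The per-partition overlaps give one simple way to see which partitions a candidate crop covers before they are treated differently.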

Spatial Transformation for Image Composition via Correspondence Learning

Jul 06, 2022

Bo Zhang, Yue Liu, Kaixin Lu, Li Niu, Liqing Zhang

Abstract: When cut-and-paste is used to create a composite image, the geometric inconsistency between foreground and background may severely harm its fidelity. To address this inconsistency, several existing works learn to warp the foreground object for geometric correction. However, the absence of an annotated dataset results in unsatisfactory performance and unreliable evaluation. In this work, we contribute a Spatial TRAnsformation for virtual Try-on (STRAT) dataset covering three typical application scenarios. Moreover, previous works simply concatenate foreground and background as input without considering their mutual correspondence. Instead, we propose a novel correspondence learning network (CorrelNet) to model the correspondence between foreground and background using cross-attention maps, based on which we can predict the target coordinate on the background that each source coordinate of the foreground should be mapped to. The warping parameters of the foreground object can then be derived from pairs of source and target coordinates. Additionally, we learn a filtering mask to eliminate noisy coordinate pairs and estimate more accurate warping parameters. Extensive experiments on our STRAT dataset demonstrate that the proposed CorrelNet performs favorably against previous methods.
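
To make the step "warping parameters derived from pairs of source and target coordinates, with a filtering mask" more concrete, here is a minimal sketch. It assumes the simplest possible warp, an independent scale and shift per axis, fitted by weighted least squares with the mask values as weights. The abstract does not state the transformation type, so this model, the function names, and the sample coordinates are all assumptions.

```python
def fit_axis(src, dst, weights):
    """Weighted least-squares fit of dst ~ a * src + b along one axis.

    Assumes at least two distinct source values carry non-zero weight.
    """
    total = sum(weights)
    mean_s = sum(w * s for w, s in zip(weights, src)) / total
    mean_d = sum(w * d for w, d in zip(weights, dst)) / total
    var = sum(w * (s - mean_s) ** 2 for w, s in zip(weights, src))
    cov = sum(w * (s - mean_s) * (d - mean_d) for w, s, d in zip(weights, src, dst))
    a = cov / var
    b = mean_d - a * mean_s
    return a, b


def fit_scale_shift(src_pts, dst_pts, mask):
    """Fit per-axis scale and shift from coordinate pairs, weighted by `mask`."""
    ax, bx = fit_axis([p[0] for p in src_pts], [p[0] for p in dst_pts], mask)
    ay, by = fit_axis([p[1] for p in src_pts], [p[1] for p in dst_pts], mask)
    return (ax, bx), (ay, by)


# Four clean pairs follow dst = 0.5 * src + (20, 30); the fifth pair is noisy
# and is switched off by a zero entry in the filtering mask.
src_pts = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5)]
dst_pts = [(20, 30), (25, 30), (20, 35), (25, 35), (100, 100)]
mask = [1, 1, 1, 1, 0]

(ax, bx), (ay, by) = fit_scale_shift(src_pts, dst_pts, mask)
```

With the noisy pair masked out, the fit recovers a scale of 0.5 and shifts of 20 and 30; setting its mask entry to 1 would pull the estimate away from those values, which is the motivation for filtering.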

CcHarmony: Color-checker based Image Harmonization Dataset

Jun 01, 2022

Haoxu Huang, Li Niu

Abstract: Image harmonization aims to adjust the foreground in a composite image to make it compatible with the background, producing a more realistic and harmonious image. Training a deep image harmonization network requires abundant training data, but it is extremely difficult to acquire training pairs of composite images and ground-truth harmonious images. Therefore, existing works resort to adjusting the foreground appearance in a real image to create a synthetic composite image. However, such an adjustment may not faithfully reflect the natural illumination change of the foreground. In this work, we explore a novel transitive way to construct an image harmonization dataset. Specifically, based on existing datasets with recorded illumination information, we first convert the foreground in a real image to the standard illumination condition, and then convert it to another illumination condition; the result is combined with the original background to form a synthetic composite image. In this manner, we construct an image harmonization dataset called ccHarmony, which is named after the color checker (cc). The dataset is available at https://github.com/bcmi/Image-Harmonization-Dataset-ccHarmony.
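
The transitive construction described above (source illumination to standard illumination, then standard to a target illumination) can be illustrated with a deliberately simple colour model. The sketch assumes a diagonal, per-channel scaling of RGB values, which the abstract does not specify; the function names and the illuminant triples are hypothetical.

```python
def to_standard(pixel, illum):
    """Undo a per-channel illuminant (assumed diagonal model)."""
    return tuple(p / i for p, i in zip(pixel, illum))


def from_standard(pixel, illum):
    """Apply a per-channel illuminant to a standard-illumination pixel."""
    return tuple(p * i for p, i in zip(pixel, illum))


def relight(pixel, source_illum, target_illum):
    """Transitive conversion: source -> standard -> target."""
    return from_standard(to_standard(pixel, source_illum), target_illum)


def make_composite(image, mask, source_illum, target_illum):
    """Relight foreground pixels (mask == 1) and keep the background untouched."""
    return [
        [
            relight(px, source_illum, target_illum) if m == 1 else px
            for px, m in zip(img_row, mask_row)
        ]
        for img_row, mask_row in zip(image, mask)
    ]


# A 1x2 toy image: the first pixel is foreground, the second is background.
image = [[(0.4, 0.5, 0.3), (0.2, 0.2, 0.2)]]
mask = [[1, 0]]
composite = make_composite(image, mask, (0.8, 1.0, 0.6), (1.0, 1.0, 1.0))
```

Here the original image plays the role of the ground-truth harmonious image and `composite` the synthetic composite, so the pair could serve as one training example under the stated assumptions.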

From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering

May 30, 2022

Jiangtong Li, Li Niu, Liqing Zhang

Abstract: Video understanding has achieved great success in representation learning tasks such as video captioning, video object grounding, and descriptive video question answering. However, current methods still struggle with video reasoning, including evidence reasoning and commonsense reasoning. To facilitate deeper video understanding towards video reasoning, we present the task of Causal-VidQA, which includes four types of questions ranging from scene description (description) to evidence reasoning (explanation) and commonsense reasoning (prediction and counterfactual). For commonsense reasoning, we set up a two-step solution that answers the question and provides a proper reason. Through extensive experiments on existing VideoQA methods, we find that state-of-the-art methods are strong in description but weak in reasoning. We hope that Causal-VidQA can guide the research of video understanding from representation learning to deeper reasoning. The dataset and related resources are available at https://github.com/bcmi/Causal-VidQA.git.

* To appear in CVPR 2022
