Liqing Zhang

OPA: Object Placement Assessment Dataset

Jul 05, 2021

Liu Liu, Bo Zhang, Jiangtong Li, Li Niu, Qingyang Liu, Liqing Zhang

Abstract: Image composition aims to generate a realistic composite image by inserting an object from one image into another background image. However, the placement (e.g., location, size, occlusion) of the inserted object may be unreasonable, which can significantly degrade the quality of the composite image. Although some works have attempted to learn object placement to create realistic composite images, they did not focus on assessing the plausibility of object placement. In this paper, we focus on the object placement assessment task, which verifies whether a composite image is plausible in terms of object placement. To accomplish this task, we construct the first Object Placement Assessment (OPA) dataset, consisting of composite images and their rationality labels. The dataset is available at https://github.com/bcmi/Object-Placement-Assessment-Dataset-OPA.
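To make the notion of "placement" concrete, the following is a minimal sketch of how a composite image could be produced from a foreground object, its mask, a location, and a scale. It is an illustration only, not the OPA dataset's construction code: the function names `resize_nearest` and `composite`, the nearest-neighbour resizing, and the top-left-corner convention for `(x, y)` are all assumptions made here.

```python
import numpy as np


def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize using integer index maps (hypothetical helper)."""
    h, w = img.shape[:2]
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return img[rows][:, cols]


def composite(background, foreground, mask, x, y, scale):
    """Paste a scaled foreground onto a copy of the background.

    background: (H, W, 3) uint8 image
    foreground: (h, w, 3) uint8 object crop
    mask:       (h, w) float array in [0, 1], 1 where the object is
    x, y:       top-left corner of the pasted object (assumed convention)
    scale:      size factor applied to the object before pasting
    """
    new_h = max(1, int(round(foreground.shape[0] * scale)))
    new_w = max(1, int(round(foreground.shape[1] * scale)))
    fg = resize_nearest(foreground, new_h, new_w).astype(np.float32)
    m = resize_nearest(mask, new_h, new_w).astype(np.float32)[..., None]

    H, W = background.shape[:2]
    # Clip the paste rectangle to the background bounds.
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + new_w, W), min(y + new_h, H)
    if x0 >= x1 or y0 >= y1:
        return background.copy()  # object lies entirely outside the image

    out = background.astype(np.float32)
    fg_crop = fg[y0 - y:y1 - y, x0 - x:x1 - x]
    m_crop = m[y0 - y:y1 - y, x0 - x:x1 - x]
    # Alpha-blend the object over the background inside the rectangle.
    out[y0:y1, x0:x1] = m_crop * fg_crop + (1.0 - m_crop) * out[y0:y1, x0:x1]
    return out.round().astype(np.uint8)
```

Varying `x`, `y`, and `scale` in such a routine yields composites with different placements; an assessment model would then judge each resulting image as plausible or not.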

Making Images Real Again: A Comprehensive Survey on Deep Image Composition

Jun 28, 2021

Li Niu, Wenyan Cong, Liu Liu, Yan Hong, Bo Zhang, Jing Liang, Liqing Zhang

Abstract: As a common image editing operation, image composition aims to cut the foreground from one image and paste it on another image, resulting in a composite image. However, many issues can make composite images unrealistic. These issues can be summarized as the inconsistency between foreground and background, which includes appearance inconsistency (e.g., incompatible color and illumination) and geometry inconsistency (e.g., unreasonable size and location). Previous works on image composition target one or more of these issues. Since each individual issue is a complicated problem, some research directions (e.g., image harmonization, object placement) focus on only one issue. By putting all the efforts together, we can acquire realistic composite images. Sometimes, we expect the composite images to be not only realistic but also aesthetic, in which case aesthetic evaluation needs to be considered. In this survey, we summarize the datasets and methods for the above research directions. We also discuss the limitations and potential directions to facilitate future research on image composition. Finally, as a double-edged sword, image composition may also have negative effects on our lives (e.g., fake news), and thus it is imperative to develop algorithms to fight against composite images. Datasets and codes for image composition are summarized at https://github.com/bcmi/Awesome-Image-Composition.

End-to-End Video Object Detection with Spatial-Temporal Transformers

May 23, 2021

Lu He, Qianyu Zhou, Xiangtai Li, Li Niu, Guangliang Cheng, Xiao Li, Wenxuan Liu, Yunhai Tong, Lizhuang Ma, Liqing Zhang

Abstract: Recently, DETR and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while demonstrating performance comparable to previous complex hand-crafted detectors. However, their performance on Video Object Detection (VOD) has not been well explored. In this paper, we present TransVOD, an end-to-end video object detection model based on a spatial-temporal Transformer architecture. The goal of this paper is to streamline the pipeline of VOD, effectively removing the need for many hand-crafted components for feature aggregation, e.g., optical flow, recurrent neural networks, relation networks. Besides, benefiting from the object query design in DETR, our method does not need complicated post-processing methods such as Seq-NMS or Tubelet rescoring, which keeps the pipeline simple and clean. In particular, we present a temporal Transformer to aggregate both the spatial object queries and the feature memories of each frame. Our temporal Transformer consists of three components: a Temporal Deformable Transformer Encoder (TDTE) to encode the multi-frame spatial details, a Temporal Query Encoder (TQE) to fuse object queries, and a Temporal Deformable Transformer Decoder to obtain the current-frame detection results. These designs boost the strong Deformable DETR baseline by a significant margin (3%-4% mAP) on the ImageNet VID dataset. TransVOD yields comparable performance on the ImageNet VID benchmark. We hope our TransVOD can provide a new perspective for video object detection. Code will be made publicly available at https://github.com/SJTU-LuHe/TransVOD.

Shadow Generation for Composite Image in Real-world Scenes

Apr 21, 2021

Yan Hong, Li Niu, Jianfu Zhang, Liqing Zhang

Abstract: Image composition aims to insert a foreground object into a background image. Most previous image composition methods focus on adjusting the foreground to make it compatible with the background while ignoring the shadow that the foreground casts on the background. In this work, we focus on generating a plausible shadow for the foreground object in the composite image. First, we contribute a real-world shadow generation dataset, DESOBA, by generating synthetic composite images based on paired real images and deshadowed images. Then, we propose a novel shadow generation network, SGRNet, which consists of a shadow mask prediction stage and a shadow filling stage. In the shadow mask prediction stage, foreground and background information interact thoroughly to generate the foreground shadow mask. In the shadow filling stage, shadow parameters are predicted to fill the shadow area. Extensive experiments on our DESOBA dataset and real composite images demonstrate the effectiveness of our proposed method.

Inharmonious Region Localization

Apr 19, 2021

Jing Liang, Li Niu, Liqing Zhang

Abstract: The advance of image editing techniques allows users to create artistic works, but the manipulated regions may be incompatible with the background. Localizing the inharmonious region is an appealing yet challenging task. Realizing that this task requires effective aggregation of multi-scale contextual information and suppression of redundant information, we design a novel Bi-directional Feature Integration (BFI) block and a Global-context Guided Decoder (GGD) block to fuse multi-scale features in the encoder and decoder, respectively. We also employ a Mask-guided Dual Attention (MDA) block between the encoder and decoder to suppress the redundant information. Experiments on the image harmonization dataset demonstrate that our method achieves competitive performance for inharmonious region localization. The source code is available at https://github.com/bcmi/DIRL.

* 12 pages, 7 figures

Image Composition Assessment with Saliency-augmented Multi-pattern Pooling

Apr 07, 2021

Bo Zhang, Li Niu, Liqing Zhang

Abstract: Image composition assessment, which aims to assess the overall composition quality of a given image, is crucial in aesthetic assessment. However, to the best of our knowledge, there is neither a dataset nor a method specifically proposed for this task. In this paper, we contribute the first composition assessment dataset, CADB, with composition scores for each image provided by multiple professional raters. Besides, we propose a composition assessment network, SAMP-Net, with a novel Saliency-Augmented Multi-pattern Pooling (SAMP) module, which analyses visual layout from the perspectives of multiple composition patterns. We also leverage composition-relevant attributes to further boost the performance, and extend the Earth Mover's Distance (EMD) loss to a weighted EMD loss to eliminate the content bias. The experimental results show that our SAMP-Net can perform more favorably than previous aesthetic assessment approaches and offer constructive composition suggestions.
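For readers unfamiliar with the EMD loss mentioned above, here is a minimal sketch of the commonly used normalized form for two discrete distributions over ordered score bins: the r-norm of the difference between their cumulative distributions. This is the plain, unweighted loss only; the weighting scheme the authors add is not described in the abstract and is not reproduced here. The function name `emd_loss` and the default `r=2` are assumptions for illustration.

```python
def emd_loss(pred, target, r=2):
    """Normalized EMD between two discrete distributions over ordered bins.

    pred, target: sequences of probabilities over the same ordered score
    bins (each assumed to sum to 1). Returns
    (mean_k |CDF_pred(k) - CDF_target(k)|^r)^(1/r).
    """
    if len(pred) != len(target):
        raise ValueError("distributions must have the same number of bins")
    cdf_p = 0.0
    cdf_t = 0.0
    total = 0.0
    for p, t in zip(pred, target):
        cdf_p += p  # running cumulative sum of the prediction
        cdf_t += t  # running cumulative sum of the target
        total += abs(cdf_p - cdf_t) ** r
    return (total / len(pred)) ** (1.0 / r)
```

Because it compares cumulative distributions, this loss penalizes probability mass placed far from the true score bin more than mass placed in a neighbouring bin, which is why it suits ordered rating scales.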

Deep Image Harmonization by Bridging the Reality Gap

Mar 31, 2021

Wenyan Cong, Junyan Cao, Li Niu, Jianfu Zhang, Xuesong Gao, Zhiwei Tang, Liqing Zhang

Abstract: Image harmonization has been significantly advanced by large-scale harmonization datasets. However, the current way of building such datasets is still labor-intensive, which adversely affects their extensibility. To address this problem, we propose to construct a large-scale rendered harmonization dataset, RHHarmony, with less human effort to augment the existing real-world dataset. To leverage both real-world images and rendered images, we propose a cross-domain harmonization network, CharmNet, to bridge the gap between the two domains. We also employ well-designed style classifiers and losses to facilitate cross-domain knowledge transfer. Extensive experiments demonstrate the potential of using rendered images for image harmonization and the effectiveness of our proposed network. Our dataset and code are available at https://github.com/bcmi/Rendered_Image_Harmonization_Datasets.

* 17 pages with supplementary

Disentangled Information Bottleneck

Dec 22, 2020

Ziqi Pan, Li Niu, Jianfu Zhang, Liqing Zhang

Abstract: The information bottleneck (IB) method is a technique for extracting information that is relevant for predicting the target random variable from the source random variable. It is typically implemented by optimizing the IB Lagrangian, which balances the compression and prediction terms. However, the IB Lagrangian is hard to optimize, and multiple trials are required to tune the value of the Lagrangian multiplier. Moreover, we show that the prediction performance strictly decreases as the compression gets stronger while optimizing the IB Lagrangian. In this paper, we implement the IB method from the perspective of supervised disentangling. Specifically, we introduce the Disentangled Information Bottleneck (DisenIB), which is consistent on compressing the source maximally without loss of target prediction performance (maximum compression). Theoretical and experimental results demonstrate that our method is consistent on maximum compression, and performs well in terms of generalization, robustness to adversarial attack, out-of-distribution detection, and supervised disentangling.
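For reference, the IB Lagrangian mentioned above is usually written as follows. The notation is an assumption, since the abstract gives none: X is the source, Y the target, T the learned representation, I(·;·) mutual information, and β the Lagrangian multiplier. Some papers place β on the compression term instead, so the convention should be checked against the paper itself.

```latex
\min_{p(t \mid x)} \; \mathcal{L}_{\mathrm{IB}} \;=\; I(X; T) \;-\; \beta \, I(T; Y), \qquad \beta > 0
```

The first term rewards compressing the source and the second rewards retaining information about the target; β sets the trade-off, which is why it typically has to be tuned over multiple runs.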

* Revised mathematical proof

From Pixel to Patch: Synthesize Context-aware Features for Zero-shot Semantic Segmentation

Sep 29, 2020

Zhangxuan Gu, Siyuan Zhou, Li Niu, Zihan Zhao, Liqing Zhang

Abstract: Zero-shot learning has been actively studied for the image classification task to relieve the burden of annotating image labels. Interestingly, the semantic segmentation task requires more labor-intensive pixel-wise annotation, but zero-shot semantic segmentation has attracted only limited research interest. Thus, we focus on zero-shot semantic segmentation, which aims to segment unseen objects with only category-level semantic representations provided for unseen categories. In this paper, we propose a novel Context-aware feature Generation Network (CaGNet), which can synthesize context-aware pixel-wise visual features for unseen categories based on category-level semantic representations and pixel-wise contextual information. The synthesized features are used to finetune the classifier to enable segmenting unseen objects. Furthermore, we extend pixel-wise feature generation and finetuning to patch-wise feature generation and finetuning, which additionally considers inter-pixel relationships. Experimental results on Pascal-VOC, Pascal-Context, and COCO-stuff show that our method significantly outperforms the existing zero-shot semantic segmentation methods. Code is available at https://github.com/bcmi/CaGNetv2-Zero-Shot-Semantic-Segmentation.

* Submitted to TIP

Weak-shot Fine-grained Classification via Similarity Transfer

Sep 19, 2020

Junjie Chen, Li Niu, Liu Liu, Liqing Zhang

Abstract: Recognizing fine-grained categories remains a challenging task due to the subtle distinctions among different subordinate categories, which results in the need for abundant annotated samples. To alleviate the data-hungry problem, we consider the problem of learning novel categories from web data with the support of a clean set of base categories, which is referred to as weak-shot learning. Under this setting, we propose to transfer pairwise semantic similarity from base categories to novel categories, because this similarity is highly transferable and beneficial for learning from web data. Specifically, we first train a similarity net on clean data, and then employ two simple yet effective strategies to leverage the transferred similarity to denoise web training data. In addition, we apply an adversarial loss to the similarity net to enhance the transferability of similarity. Comprehensive experiments on three fine-grained datasets demonstrate that a clean set can dramatically facilitate webly supervised learning and that similarity transfer is effective under this setting.
