Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Li Niu

CcHarmony: Color-checker based Image Harmonization Dataset

Jun 01, 2022

Haoxu Huang, Li Niu

Figure 1 for CcHarmony: Color-checker based Image Harmonization Dataset

Figure 2 for CcHarmony: Color-checker based Image Harmonization Dataset

Figure 3 for CcHarmony: Color-checker based Image Harmonization Dataset

Abstract:Image harmonization targets at adjusting the foreground in a composite image to make it compatible with the background, producing a more realistic and harmonious image. Training deep image harmonization network requires abundant training data, but it is extremely difficult to acquire training pairs of composite images and ground-truth harmonious images. Therefore, existing works turn to adjust the foreground appearance in a real image to create a synthetic composite image. However, such adjustment may not faithfully reflect the natural illumination change of foreground. In this work, we explore a novel transitive way to construct image harmonization dataset. Specifically, based on the existing datasets with recorded illumination information, we first convert the foreground in a real image to the standard illumination condition, and then convert it to another illumination condition, which is combined with the original background to form a synthetic composite image. In this manner, we construct an image harmonization dataset called ccHarmony, which is named after color checker (cc). The dataset is available at https://github.com/bcmi/Image-Harmonization-Dataset-ccHarmony.

Via

Access Paper or Ask Questions

From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering

May 30, 2022

Jiangtong Li, Li Niu, Liqing Zhang

Figure 1 for From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering

Figure 2 for From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering

Figure 3 for From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering

Figure 4 for From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering

Abstract:Video understanding has achieved great success in representation learning, such as video caption, video object grounding, and video descriptive question-answer. However, current methods still struggle on video reasoning, including evidence reasoning and commonsense reasoning. To facilitate deeper video understanding towards video reasoning, we present the task of Causal-VidQA, which includes four types of questions ranging from scene description (description) to evidence reasoning (explanation) and commonsense reasoning (prediction and counterfactual). For commonsense reasoning, we set up a two-step solution by answering the question and providing a proper reason. Through extensive experiments on existing VideoQA methods, we find that the state-of-the-art methods are strong in descriptions but weak in reasoning. We hope that Causal-VidQA can guide the research of video understanding from representation learning to deeper reasoning. The dataset and related resources are available at \url{https://github.com/bcmi/Causal-VidQA.git}.

* To appear in CVPR 2022

Via

Access Paper or Ask Questions

Fast Object Placement Assessment

May 28, 2022

Li Niu, Qingyang Liu, Zhenchen Liu, Jiangtong Li

Figure 1 for Fast Object Placement Assessment

Figure 2 for Fast Object Placement Assessment

Figure 3 for Fast Object Placement Assessment

Figure 4 for Fast Object Placement Assessment

Abstract:Object placement assessment (OPA) aims to predict the rationality score of a composite image in terms of the placement (e.g., scale, location) of inserted foreground object. However, given a pair of scaled foreground and background, to enumerate all the reasonable locations, existing OPA model needs to place the foreground at each location on the background and pass the obtained composite image through the model one at a time, which is very time-consuming. In this work, we investigate a new task named as fast OPA. Specifically, provided with a scaled foreground and a background, we only pass them through the model once and predict the rationality scores for all locations. To accomplish this task, we propose a pioneering fast OPA model with several innovations (i.e., foreground dynamic filter, background prior transfer, and composite feature mimicking) to bridge the performance gap between slow OPA model and fast OPA model. Extensive experiments on OPA dataset show that our proposed fast OPA model performs on par with slow OPA model but runs significantly faster.

Via

Access Paper or Ask Questions

Deep Video Harmonization with Color Mapping Consistency

May 02, 2022

Xinyuan Lu, Shengyuan Huang, Li Niu, Wenyan Cong, Liqing Zhang

Figure 1 for Deep Video Harmonization with Color Mapping Consistency

Figure 2 for Deep Video Harmonization with Color Mapping Consistency

Figure 3 for Deep Video Harmonization with Color Mapping Consistency

Figure 4 for Deep Video Harmonization with Color Mapping Consistency

Abstract:Video harmonization aims to adjust the foreground of a composite video to make it compatible with the background. So far, video harmonization has only received limited attention and there is no public dataset for video harmonization. In this work, we construct a new video harmonization dataset HYouTube by adjusting the foreground of real videos to create synthetic composite videos. Moreover, we consider the temporal consistency in video harmonization task. Unlike previous works which establish the spatial correspondence, we design a novel framework based on the assumption of color mapping consistency, which leverages the color mapping of neighboring frames to refine the current frame. Extensive experiments on our HYouTube dataset prove the effectiveness of our proposed framework. Our dataset and code are available at https://github.com/bcmi/Video-Harmonization-Dataset-HYouTube.

Via

Access Paper or Ask Questions

Mixed Supervised Object Detection by Transferring Mask Prior and Semantic Similarity

Oct 27, 2021

Yan Liu, Zhijie Zhang, Li Niu, Junjie Chen, Liqing Zhang

Figure 1 for Mixed Supervised Object Detection by Transferring Mask Prior and Semantic Similarity

Figure 2 for Mixed Supervised Object Detection by Transferring Mask Prior and Semantic Similarity

Figure 3 for Mixed Supervised Object Detection by Transferring Mask Prior and Semantic Similarity

Figure 4 for Mixed Supervised Object Detection by Transferring Mask Prior and Semantic Similarity

Abstract:Object detection has achieved promising success, but requires large-scale fully-annotated data, which is time-consuming and labor-extensive. Therefore, we consider object detection with mixed supervision, which learns novel object categories using weak annotations with the help of full annotations of existing base object categories. Previous works using mixed supervision mainly learn the class-agnostic objectness from fully-annotated categories, which can be transferred to upgrade the weak annotations to pseudo full annotations for novel categories. In this paper, we further transfer mask prior and semantic similarity to bridge the gap between novel categories and base categories. Specifically, the ability of using mask prior to help detect objects is learned from base categories and transferred to novel categories. Moreover, the semantic similarity between objects learned from base categories is transferred to denoise the pseudo full annotations for novel categories. Experimental results on three benchmark datasets demonstrate the effectiveness of our method over existing methods. Codes are available at https://github.com/bcmi/TraMaS-Weak-Shot-Object-Detection.

* accepted by NeurIPS2021

Via

Access Paper or Ask Questions

Weak Novel Categories without Tears: A Survey on Weak-Shot Learning

Oct 16, 2021

Li Niu

Figure 1 for Weak Novel Categories without Tears: A Survey on Weak-Shot Learning

Abstract:Deep learning is a data-hungry approach, which requires massive training data. However, it is time-consuming and labor-intensive to collect abundant fully-annotated training data for all categories. Assuming the existence of base categories with adequate fully-annotated training samples, different paradigms requiring fewer training samples or weaker annotations for novel categories have attracted growing research interest. Among them, zero-shot (resp., few-shot) learning explores using zero (resp., a few) training samples for novel categories, which lowers the quantity requirement for novel categories. Instead, weak-shot learning lowers the quality requirement for novel categories. Specifically, sufficient training samples are collected for novel categories but they only have weak annotations. In different tasks, weak annotations are presented in different forms (e.g., noisy labels for image classification, image labels for object detection, bounding boxes for segmentation), similar to the definitions in weakly supervised learning. Therefore, weak-shot learning can also be treated as weakly supervised learning with auxiliary fully supervised categories. In this paper, we discuss the existing weak-shot learning methodologies in different tasks and summarize the codes at https://github.com/bcmi/Awesome-Weak-Shot-Learning.

Via

Access Paper or Ask Questions

Weak-shot Semantic Segmentation by Transferring Semantic Affinity and Boundary

Oct 04, 2021

Siyuan Zhou, Li Niu, Jianlou Si, Chen Qian, Liqing Zhang

Figure 1 for Weak-shot Semantic Segmentation by Transferring Semantic Affinity and Boundary

Figure 2 for Weak-shot Semantic Segmentation by Transferring Semantic Affinity and Boundary

Figure 3 for Weak-shot Semantic Segmentation by Transferring Semantic Affinity and Boundary

Figure 4 for Weak-shot Semantic Segmentation by Transferring Semantic Affinity and Boundary

Abstract:Weakly-supervised semantic segmentation (WSSS) with image-level labels has been widely studied to relieve the annotation burden of the traditional segmentation task. In this paper, we show that existing fully-annotated base categories can help segment objects of novel categories with only image-level labels, even if base and novel categories have no overlap. We refer to this task as weak-shot semantic segmentation, which could also be treated as WSSS with auxiliary fully-annotated categories. Recent advanced WSSS methods usually obtain class activation maps (CAMs) and refine them by affinity propagation. Based on the observation that semantic affinity and boundary are class-agnostic, we propose a method under the WSSS framework to transfer semantic affinity and boundary from base categories to novel ones. As a result, we find that pixel-level annotation of base categories can facilitate affinity learning and propagation, leading to higher-quality CAMs of novel categories. Extensive experiments on PASCAL VOC 2012 dataset demonstrate that our method significantly outperforms WSSS baselines on novel categories.

* 16 pages, 8 figures

Via

Access Paper or Ask Questions

HYouTube: Video Harmonization Dataset

Sep 18, 2021

Xinyuan Lu, Shengyuan Huang, Li Niu, Wenyan Cong, Liqing Zhang

Figure 1 for HYouTube: Video Harmonization Dataset

Figure 2 for HYouTube: Video Harmonization Dataset

Figure 3 for HYouTube: Video Harmonization Dataset

Abstract:Video composition aims to generate a composite video by combining the foreground of one video with the background of another video, but the inserted foreground may be incompatible with the background in terms of color and illumination. Video harmonization aims to adjust the foreground of a composite video to make it compatible with the background. So far, video harmonization has only received limited attention and there is no public dataset for video harmonization. In this work, we construct a new video harmonization dataset HYouTube by adjusting the foreground of real videos to create synthetic composite videos. Considering the domain gap between real composite videos and synthetic composite videos, we additionally create 100 real composite videos via copy-and-paste. Datasets are available at https://github.com/bcmi/Video-Harmonization-Dataset-HYouTube.

Via

Access Paper or Ask Questions

High-Resolution Image Harmonization via Collaborative Dual Transformations

Sep 14, 2021

Wenyan Cong, Xinhao Tao, Li Niu, Jing Liang, Xuesong Gao, Qihao Sun, Liqing Zhang

Figure 1 for High-Resolution Image Harmonization via Collaborative Dual Transformations

Figure 2 for High-Resolution Image Harmonization via Collaborative Dual Transformations

Figure 3 for High-Resolution Image Harmonization via Collaborative Dual Transformations

Figure 4 for High-Resolution Image Harmonization via Collaborative Dual Transformations

Abstract:Given a composite image, image harmonization aims to adjust the foreground to make it compatible with the background. High-resolution image harmonization is in high demand, but still remains unexplored. Conventional image harmonization methods learn global RGB-to-RGB transformation which could effortlessly scale to high resolution, but ignore diverse local context. Recent deep learning methods learn the dense pixel-to-pixel transformation which could generate harmonious outputs, but are highly constrained in low resolution. In this work, we propose a high-resolution image harmonization network with Collaborative Dual Transformation (CDTNet) to combine pixel-to-pixel transformation and RGB-to-RGB transformation coherently in an end-to-end framework. Our CDTNet consists of a low-resolution generator for pixel-to-pixel transformation, a color mapping module for RGB-to-RGB transformation, and a refinement module to take advantage of both. Extensive experiments on high-resolution image harmonization dataset demonstrate that our CDTNet strikes a good balance between efficiency and effectiveness.

Via

Access Paper or Ask Questions

Visible Watermark Removal via Self-calibrated Localization and Background Refinement

Aug 08, 2021

Jing Liang, Li Niu, Fengjun Guo, Teng Long, Liqing Zhang

Figure 1 for Visible Watermark Removal via Self-calibrated Localization and Background Refinement

Figure 2 for Visible Watermark Removal via Self-calibrated Localization and Background Refinement

Figure 3 for Visible Watermark Removal via Self-calibrated Localization and Background Refinement

Figure 4 for Visible Watermark Removal via Self-calibrated Localization and Background Refinement

Abstract:Superimposing visible watermarks on images provides a powerful weapon to cope with the copyright issue. Watermark removal techniques, which can strengthen the robustness of visible watermarks in an adversarial way, have attracted increasing research interest. Modern watermark removal methods perform watermark localization and background restoration simultaneously, which could be viewed as a multi-task learning problem. However, existing approaches suffer from incomplete detected watermark and degraded texture quality of restored background. Therefore, we design a two-stage multi-task network to address the above issues. The coarse stage consists of a watermark branch and a background branch, in which the watermark branch self-calibrates the roughly estimated mask and passes the calibrated mask to background branch to reconstruct the watermarked area. In the refinement stage, we integrate multi-level features to improve the texture quality of watermarked area. Extensive experiments on two datasets demonstrate the effectiveness of our proposed method.

Via

Access Paper or Ask Questions