Shiyin Wang

DiffI2I: Efficient Diffusion Model for Image-to-Image Translation

Aug 26, 2023
Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xinglong Wu, Yapeng Tian, Wenming Yang, Radu Timofte, Luc Van Gool

The Diffusion Model (DM) has emerged as the SOTA approach for image synthesis. However, existing DMs do not perform well on some image-to-image translation (I2I) tasks. Unlike image synthesis, some I2I tasks, such as super-resolution, require generating results in accordance with GT images. Traditional DMs for image synthesis require extensive iterations and large denoising models to estimate entire images, which gives them strong generative ability but also leads to artifacts and inefficiency for I2I. To tackle this challenge, we propose a simple, efficient, and powerful DM framework for I2I, called DiffI2I. Specifically, DiffI2I comprises three key components: a compact I2I prior extraction network (CPEN), a dynamic I2I transformer (DI2Iformer), and a denoising network. We train DiffI2I in two stages: pretraining and DM training. For pretraining, GT and input images are fed into CPEN$_{S1}$ to capture a compact I2I prior representation (IPR) guiding DI2Iformer. In the second stage, the DM is trained to estimate the same IPR as CPEN$_{S1}$ using only the input images. Compared to traditional DMs, the compact IPR enables DiffI2I to obtain more accurate outcomes and employ a lighter denoising network and fewer iterations. Through extensive experiments on various I2I tasks, we demonstrate that DiffI2I achieves SOTA performance while significantly reducing computational burdens.
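
A hedged PyTorch-style sketch of this two-stage scheme follows: in stage one, CPEN$_{S1}$ sees GT and input images and its compact IPR guides the transformer; in stage two, a denoising network learns to estimate the same IPR from the input alone. Module internals, channel sizes, the di2iformer interface, and the losses are assumptions for illustration; only the data flow follows the abstract.

import torch
import torch.nn as nn

class CPEN(nn.Module):
    """Maps conditioning images to a compact I2I prior representation (IPR)."""
    def __init__(self, in_ch, ipr_dim=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.LeakyReLU(),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.LeakyReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, ipr_dim),
        )

    def forward(self, x):
        return self.body(x)

def stage1_step(cpen_s1, di2iformer, inp, gt, l1=nn.L1Loss()):
    # Pretraining: the IPR is extracted from GT + input and modulates the transformer.
    ipr = cpen_s1(torch.cat([inp, gt], dim=1))
    pred = di2iformer(inp, ipr)          # assumed interface: (image, prior) -> image
    return l1(pred, gt)

def stage2_step(cpen_s1, denoiser, inp, gt, mse=nn.MSELoss()):
    # DM training: recover the same compact IPR from the input alone, so no GT
    # is needed at inference (diffusion sampling is collapsed to one call here).
    with torch.no_grad():
        target_ipr = cpen_s1(torch.cat([inp, gt], dim=1))
    return mse(denoiser(inp), target_ipr)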

DiffIR: Efficient Diffusion Model for Image Restoration

Mar 16, 2023
Bin Xia, Yulun Zhang, Shiyin Wang, Yitong Wang, Xinglong Wu, Yapeng Tian, Wenming Yang, Luc Van Gool

Diffusion models (DMs) have achieved SOTA performance by modeling image synthesis as a sequential application of a denoising network. However, unlike image synthesis, which generates every pixel from scratch, most pixels in image restoration (IR) are given. Thus, for IR, it is inefficient for traditional DMs to run massive iterations on a large model to estimate whole images or feature maps. To address this issue, we propose an efficient DM for IR (DiffIR), which consists of a compact IR prior extraction network (CPEN), a dynamic IR transformer (DIRformer), and a denoising network. Specifically, DiffIR has two training stages: pretraining and DM training. In pretraining, we input ground-truth images into CPEN$_{S1}$ to capture a compact IR prior representation (IPR) to guide DIRformer. In the second stage, we train the DM to directly estimate the same IPR as the pretrained CPEN$_{S1}$ using only LQ images. We observe that, since the IPR is only a compact vector, DiffIR can use fewer iterations than traditional DMs to obtain accurate estimations and generate more stable and realistic results. Because the iterations are few, DiffIR can adopt a joint optimization of CPEN$_{S2}$, DIRformer, and the denoising network, which further reduces the influence of estimation error. We conduct extensive experiments on several IR tasks and achieve SOTA performance at a lower computational cost.
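
Because the IPR is a short vector rather than a full image or feature map, the reverse diffusion can run in very few steps. The sketch below makes this concrete with a DDPM-style reverse loop over the compact vector; the step count, noise schedule, and denoiser interface are illustrative assumptions, not the paper's exact settings.

import torch

T = 4                                    # a handful of steps suffices for a compact vector
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample_ipr(denoiser, cond, batch, ipr_dim=64):
    """DDPM-style reverse process over the compact IPR vector, conditioned on
    features of the LQ input (cond)."""
    z = torch.randn(batch, ipr_dim)
    for t in reversed(range(T)):
        eps = denoiser(z, cond, t)       # assumed interface: predicts the added noise
        mean = (z - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        z = mean + betas[t].sqrt() * torch.randn_like(z) if t > 0 else mean
    return z                             # estimated IPR, then fed to DIRformer

At inference the loop runs without gradients as above; for the joint optimization described in the abstract, a loop this short is cheap enough to unroll with gradients enabled.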

Human De-occlusion: Invisible Perception and Recovery for Humans

Mar 22, 2021
Qiang Zhou, Shiyin Wang, Yitong Wang, Zilong Huang, Xinggang Wang

In this paper, we tackle the problem of human de-occlusion, which reasons about the occluded segmentation masks and invisible appearance content of humans. In particular, a two-stage framework is proposed to estimate the invisible portions and recover their content. For the mask completion stage, a stacked network structure is devised to refine inaccurate masks from a general instance segmentation model and to predict integrated masks simultaneously. Additionally, guidance from human parsing and typical pose masks is leveraged to provide prior information. For the content recovery stage, a novel parsing-guided attention module is applied to isolate body parts and capture context information across multiple scales. Besides, an Amodal Human Perception (AHP) dataset is collected to address the task of human de-occlusion. AHP has the advantages of providing annotations from real-world scenes and containing comparatively more humans than other amodal perception datasets. On this dataset, experiments demonstrate that our method outperforms state-of-the-art techniques in both mask completion and content recovery. Our AHP dataset is available at \url{https://sydney0zq.github.io/ahp/}.
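
The following minimal sketch, under placeholder module bodies, shows the two-stage data flow (mask completion, then parsing-guided content recovery); only the pipeline structure follows the abstract, and all layer choices and interfaces are assumptions.

import torch
import torch.nn as nn

class MaskCompletion(nn.Module):
    """Stacked structure: refine the visible mask, then predict the amodal mask."""
    def __init__(self):
        super().__init__()
        self.refine = nn.Conv2d(4, 1, 3, padding=1)   # RGB + coarse mask -> refined mask
        self.amodal = nn.Conv2d(4, 1, 3, padding=1)   # RGB + refined mask -> integrated mask

    def forward(self, img, coarse_mask):
        refined = torch.sigmoid(self.refine(torch.cat([img, coarse_mask], 1)))
        full = torch.sigmoid(self.amodal(torch.cat([img, refined], 1)))
        return refined, full

def de_occlude(img, coarse_mask, completion, recovery, parser):
    # Stage 1: complete the mask; the occluded region is what the integrated
    # mask covers but the refined visible mask does not.
    refined, full = completion(img, coarse_mask)
    invisible = (full > 0.5).float() * (1 - (refined > 0.5).float())
    # Stage 2: recover content inside the invisible region, guided by parsing.
    return recovery(img, invisible, parser(img))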

* 11 pages, 6 figures, conference 

Partially-Typed NER Datasets Integration: Connecting Practice to Theory

May 01, 2020
Shi Zhi, Liyuan Liu, Yu Zhang, Shiyin Wang, Qi Li, Chao Zhang, Jiawei Han

While typical named entity recognition (NER) models require the training set to be annotated with all target types, each available dataset may cover only a subset of them. Instead of relying on fully-typed NER datasets, many efforts have been made to leverage multiple partially-typed ones for training and to allow the resulting model to cover the full type set. However, there is neither a guarantee on the quality of the integrated datasets nor guidance on the design of training algorithms. Here, we conduct a systematic analysis and comparison between partially-typed NER datasets and fully-typed ones, in both a theoretical and an empirical manner. First, we derive a bound establishing that models trained with partially-typed annotations can reach performance similar to those trained with fully-typed annotations, which also provides guidance on algorithm design. Moreover, we conduct controlled experiments, which show that partially-typed datasets lead to performance similar to that of models trained with the same amount of fully-typed annotations.
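
As one concrete, hypothetical way to train a single model on several partially-typed datasets, the sketch below masks the logits of entity types that a given source dataset does not annotate, so the model is never penalized on types that dataset cannot supervise. This is an illustrative baseline consistent with the setting, not the paper's algorithm.

import torch
import torch.nn.functional as F

def partially_typed_loss(logits, labels, annotated_types):
    """logits: (N, num_types) token scores; labels: (N,) gold indices from the
    source dataset; annotated_types: (num_types,) bool mask, True for the types
    (plus the O tag) that this dataset annotates."""
    masked = logits.masked_fill(~annotated_types, float("-inf"))
    return F.nll_loss(F.log_softmax(masked, dim=-1), labels)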

* Work in progress 

Fast Top-k Area Topics Extraction with Knowledge Base

Dec 04, 2017
Fang Zhang, Xiaochen Wang, Jingfei Han, Jie Tang, Shiyin Wang, Marie-Francine Moens

What are the most popular research topics in Artificial Intelligence (AI)? We formulate the problem as extracting the top-$k$ topics that can best represent a given area with the help of a knowledge base. We theoretically prove that the problem is NP-hard and propose an optimization model, FastKATE, to address it by combining both explicit and latent representations for each topic. We leverage a large-scale knowledge base (Wikipedia) to generate topic embeddings using neural networks and use these representations to help capture the representativeness of topics for given areas. We develop a fast heuristic algorithm to solve the problem efficiently with a provable error bound. We evaluate the proposed model on three real-world datasets. Experimental results demonstrate our model's effectiveness, robustness, real-time performance (returning results in $<1$s), and its superiority over several alternative methods.
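
A small sketch of the ranking step, assuming precomputed topic embeddings (e.g., learned from Wikipedia) and cosine similarity as the representativeness score; the paper's actual objective and heuristic may differ.

import heapq
import numpy as np

def top_k_topics(area_vec, topic_vecs, topic_names, k=10):
    """Rank candidate topics against an area embedding and keep the k best."""
    area = area_vec / np.linalg.norm(area_vec)
    scores = topic_vecs @ area / np.linalg.norm(topic_vecs, axis=1)  # cosine scores
    best = heapq.nlargest(k, zip(scores, topic_names))               # O(n log k)
    return [(name, float(s)) for s, name in best]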
