Yingbin Zheng

MultiColor: Image Colorization by Learning from Multiple Color Spaces

Aug 08, 2024

Xiangcheng Du, Zhao Zhou, Yanlong Wang, Zhuoyao Wang, Yingbin Zheng, Cheng Jin

Abstract: Deep networks have shown impressive performance in image restoration tasks such as image colorization. However, we find that previous approaches rely on the digital representation from a single color model with a specific mapping function, a.k.a. a color space, throughout the colorization pipeline. In this paper, we first investigate the modeling of different color spaces and find that each exhibits distinctive characteristics with a unique distribution of colors. The complementarity among multiple color spaces benefits the image colorization task. We present MultiColor, a new learning-based approach to automatically colorize grayscale images that combines clues from multiple color spaces. Specifically, we employ a set of dedicated colorization modules, one for each color space. Within each module, a transformer decoder first refines color query embeddings, and then a color mapper produces color channel predictions using the embeddings and semantic features. With these predicted color channels representing various color spaces, a complementary network is designed to exploit their complementarity and generate pleasing and reasonable colorized images. We conduct extensive experiments on real-world datasets, and the results demonstrate superior performance over state-of-the-art methods.
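The abstract does not spell out how the complementary network combines the per-space predictions. As a minimal sketch, assuming each color-space module has already produced a prediction for one pixel, the Python below maps every prediction back to RGB with the standard-library `colorsys` module and blends them with fixed weights. The function names, the choice of RGB/HSV/HLS/YIQ (`colorsys` has no Lab conversion) and the hand-set weights are illustrative assumptions; MultiColor learns this fusion rather than averaging.

```python
import colorsys


def to_rgb(space, values):
    """Map a 3-tuple color from the named color space back to RGB in [0, 1]."""
    if space == "rgb":
        return tuple(values)
    if space == "hsv":
        return colorsys.hsv_to_rgb(*values)
    if space == "hls":
        return colorsys.hls_to_rgb(*values)
    if space == "yiq":
        return colorsys.yiq_to_rgb(*values)
    raise ValueError(f"unsupported color space: {space}")


def fuse_predictions(preds, weights):
    """Weighted average of per-space color predictions, computed in RGB.

    A hand-weighted stand-in for a learned complementary network: each
    color space contributes its prediction in proportion to its weight.
    """
    total = sum(weights[space] for space in preds)
    fused = [0.0, 0.0, 0.0]
    for space, values in preds.items():
        rgb = to_rgb(space, values)
        share = weights[space] / total
        for ch in range(3):
            fused[ch] += share * rgb[ch]
    # Guard against out-of-range raw "rgb" predictions.
    return tuple(min(1.0, max(0.0, v)) for v in fused)


# Usage: the same target color predicted in three spaces fuses back to itself.
target = (0.8, 0.4, 0.2)
preds = {
    "rgb": target,
    "hsv": colorsys.rgb_to_hsv(*target),
    "yiq": colorsys.rgb_to_yiq(*target),
}
print(fuse_predictions(preds, {"rgb": 1.0, "hsv": 1.0, "yiq": 2.0}))
```

When the modules disagree, the weights decide which color space dominates; a learned network can make that choice per pixel and per image instead.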

Minutes to Seconds: Speeded-up DDPM-based Image Inpainting with Coarse-to-Fine Sampling

Jul 08, 2024

Lintao Zhang, Xiangcheng Du, LeoWu TomyEnrique, Yiqun Wang, Yingbin Zheng, Cheng Jin

Abstract: For image inpainting, the existing Denoising Diffusion Probabilistic Model (DDPM)-based method RePaint can produce high-quality images for any inpainting form. It uses a pre-trained DDPM as a prior and generates inpainting results by conditioning the reverse diffusion process, namely the denoising process. However, this process is significantly time-consuming. In this paper, we propose an efficient DDPM-based image inpainting method that includes three speed-up strategies. First, we use a pre-trained Light-Weight Diffusion Model (LWDM) to reduce the number of parameters. Second, we introduce the skip-step sampling scheme of Denoising Diffusion Implicit Models (DDIM) for the denoising process. Finally, we propose Coarse-to-Fine Sampling (CFS), which speeds up inference by reducing image resolution in the coarse stage and decreasing the number of denoising timesteps in the refinement stage. We conduct extensive experiments on both face and general-purpose image inpainting tasks, and our method achieves competitive performance with an approximately 60-fold speedup.

* The code is available at: https://github.com/linghuyuhangyuan/M2S
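Two of the speed-up strategies reduce to simple bookkeeping: DDIM-style skip-step sampling visits only a strided subset of the training timesteps, and the coarse stage of CFS runs at reduced resolution. The sketch below is a cost model, not the authors' implementation. It assumes denoiser cost per step scales with pixel count, and the step counts and scales in the usage lines are hypothetical, not the paper's settings (the 1000-step line is a plain full-resolution baseline, not RePaint's actual schedule).

```python
def ddim_skip_timesteps(num_train_steps, num_sample_steps):
    """Evenly strided timesteps for skip-step (DDIM-style) sampling, largest first."""
    if not 0 < num_sample_steps <= num_train_steps:
        raise ValueError("need 0 < num_sample_steps <= num_train_steps")
    stride = num_train_steps // num_sample_steps
    return list(range(0, num_train_steps, stride))[:num_sample_steps][::-1]


def relative_cost(stages):
    """Denoiser cost in units of one full-resolution network evaluation.

    Each stage is (num_steps, scale), where scale is the side-length ratio
    to full resolution. Cost per step is assumed proportional to pixel count.
    """
    return sum(steps * scale ** 2 for steps, scale in stages)


# Usage with hypothetical settings (not the paper's configuration):
print(ddim_skip_timesteps(1000, 5))           # [800, 600, 400, 200, 0]
print(relative_cost([(1000, 1.0)]))           # 1000.0
print(relative_cost([(25, 0.5), (10, 1.0)]))  # 16.25
```

Halving the side length cuts each coarse step to a quarter of the cost, which is why a coarse stage plus a short full-resolution refinement can be much cheaper than sampling everything at full resolution.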

Fine-Grained Scene Image Classification with Modality-Agnostic Adapter

Jul 03, 2024

Yiqun Wang, Zhao Zhou, Xiangcheng Du, Xingjiao Wu, Yingbin Zheng, Cheng Jin

Abstract: When dealing with fine-grained scene image classification, most previous works place heavy emphasis on global visual features during multi-modal feature fusion. In other words, models are deliberately designed based on prior intuitions about the importance of different modalities. In this paper, we present a new multi-modal feature fusion approach named MAA (Modality-Agnostic Adapter), which lets the model adaptively learn the importance of different modalities in different cases, without a prior setting in the model architecture. More specifically, we eliminate the distributional differences between modalities and then use a modality-agnostic Transformer encoder for semantic-level feature fusion. Our experiments demonstrate that MAA achieves state-of-the-art results on benchmarks using the same modalities as previous methods. Moreover, new modalities can easily be added to MAA to further boost performance. Code is available at https://github.com/quniLcs/MAA.
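A minimal sketch of the fusion idea, assuming every modality has already been projected to the same feature width: each modality is standardized with its own statistics (one possible reading of "eliminate the distributional differences"), and all tokens then share a single attention layer that does not know which modality a token came from. The identity Q/K/V projections, the single head and the helper names are simplifying assumptions; MAA itself uses a learned Transformer encoder.

```python
import math


def standardize_modality(tokens, eps=1e-9):
    """Rescale all tokens of one modality to zero mean and unit variance.

    Statistics are pooled over every value of this modality, so modalities
    with very different scales end up in a comparable range.
    """
    values = [x for t in tokens for x in t]
    mean = sum(values) / len(values)
    var = sum((x - mean) ** 2 for x in values) / len(values)
    std = math.sqrt(var + eps)
    return [[(x - mean) / std for x in t] for t in tokens]


def self_attention(tokens):
    """Single-head scaled dot-product attention with identity Q/K/V projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) / z for j in range(d)])
    return out


def fuse_modalities(modalities):
    """Normalize each modality separately, then attend over the joint token sequence."""
    sequence = []
    for tokens in modalities.values():
        sequence.extend(standardize_modality(tokens))
    return self_attention(sequence)


# Usage: visual tokens on a large scale, text tokens on a tiny one (toy values).
fused = fuse_modalities({
    "visual": [[100.0, 200.0, 300.0], [150.0, 250.0, 350.0]],
    "text": [[0.01, 0.02, 0.03]],
})
print(len(fused), len(fused[0]))  # 3 3
```

Because the attention layer sees one undifferentiated sequence, adding a new modality only means appending its standardized tokens; the attention weights, not the architecture, decide how much each modality matters.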

DDT: Dual-branch Deformable Transformer for Image Denoising

Apr 13, 2023

Kangliang Liu, Xiangcheng Du, Sijie Liu, Yingbin Zheng, Xingjiao Wu, Cheng Jin

Abstract: Transformers are beneficial for image denoising because they can model long-range dependencies, overcoming the limitations of convolutional inductive biases. However, directly applying the Transformer structure to remove noise is challenging because its complexity grows quadratically with the spatial resolution. In this paper, we propose an efficient Dual-branch Deformable Transformer (DDT) denoising network that captures both local and global interactions in parallel. We divide features with a fixed patch size in the local branch and with a fixed number of patches in the global branch. In addition, we apply a deformable attention operation in both branches, which helps the network focus on more important regions and further reduces computational complexity. We conduct extensive experiments on real-world and synthetic denoising tasks, and the proposed DDT achieves state-of-the-art performance at a significantly lower computational cost.

* The code is available at: https://github.com/Merenguelkl/DDT
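The two partitioning schemes can be compared by counting query-key pairs. In the sketch below, the local branch uses fixed 4x4 windows and the global branch a fixed 4x4 grid of patches. For the global branch we assume a grid-style grouping in which tokens at the same offset inside their patches attend to each other; the abstract does not confirm this. Deformable attention is not modeled, and the helper names are hypothetical.

```python
def local_groups(h, w, patch):
    """Fixed patch size: non-overlapping patch x patch windows; count grows with h*w."""
    if h % patch or w % patch:
        raise ValueError("h and w must be divisible by patch")
    return [
        [(r + i, c + j) for i in range(patch) for j in range(patch)]
        for r in range(0, h, patch)
        for c in range(0, w, patch)
    ]


def global_groups(h, w, num):
    """Fixed number of patches (num x num); patch size grows with resolution.

    Assumed grid-style grouping: tokens sharing the same offset inside their
    patch form one group, so each group spans the whole image.
    """
    if h % num or w % num:
        raise ValueError("h and w must be divisible by num")
    ph, pw = h // num, w // num
    return [
        [(a * ph + di, b * pw + dj) for a in range(num) for b in range(num)]
        for di in range(ph)
        for dj in range(pw)
    ]


def attention_pairs(groups):
    """Query-key pairs when attention is computed only within each group."""
    return sum(len(g) ** 2 for g in groups)


for size in (16, 32):
    full = (size * size) ** 2
    local = attention_pairs(local_groups(size, size, patch=4))
    glob = attention_pairs(global_groups(size, size, num=4))
    print(size, full, local, glob)
# 16 65536 4096 4096
# 32 1048576 16384 16384
```

Under these assumptions, doubling the side length multiplies both branch costs by 4 (linear in pixel count), while full self-attention grows by 16.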

Aggregated Text Transformer for Scene Text Detection

Nov 25, 2022

Zhao Zhou, Xiangcheng Du, Yingbin Zheng, Cheng Jin

Abstract: This paper explores a multi-scale aggregation strategy for scene text detection in natural images. We present the Aggregated Text TRansformer (ATTR), which is designed to represent texts in scene images with a multi-scale self-attention mechanism. Starting from an image pyramid with multiple resolutions, features are first extracted at different scales with shared weights and then fed into a Transformer encoder-decoder architecture. The multi-scale image representations are robust and contain rich information on text content of various sizes. The text Transformer aggregates these features to learn the interaction across different scales and improve text representation. The proposed method detects scene texts by representing each text instance as an individual binary mask, which is tolerant of curved texts and regions with dense instances. Extensive experiments on public scene text detection datasets demonstrate the effectiveness of the proposed framework.
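The multi-scale input stage can be sketched as follows: build an image pyramid, apply one feature extractor with the same weights at every level, and flatten all levels into one token sequence tagged with its scale, ready for a Transformer. The 2x average pooling and the affine "extractor" are placeholders assumed for illustration, not ATTR's backbone.

```python
def downsample2x(img):
    """2x average pooling of a 2D list; height and width are assumed even."""
    return [
        [
            (img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
            for c in range(0, len(img[0]), 2)
        ]
        for r in range(0, len(img), 2)
    ]


def image_pyramid(img, levels):
    """Full-resolution image followed by successively halved copies."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid


def shared_extractor(img, weight=2.0, bias=1.0):
    """Hypothetical stand-in for the backbone: the SAME weight/bias at every scale."""
    return [[weight * v + bias for v in row] for row in img]


def multiscale_tokens(img, levels):
    """Flatten every scale's features into one sequence of (scale_index, feature) tokens."""
    tokens = []
    for scale, level in enumerate(image_pyramid(img, levels)):
        feats = shared_extractor(level)
        tokens.extend((scale, f) for row in feats for f in row)
    return tokens


img = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
tokens = multiscale_tokens(img, levels=3)
print(len(tokens))  # 84 = 8*8 + 4*4 + 2*2
```

Sharing the extractor's weights keeps the parameter count independent of the number of pyramid levels; only the sequence the Transformer attends over grows.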

Progressive Scene Text Erasing with Self-Supervision

Jul 23, 2022

Xiangcheng Du, Zhao Zhou, Yingbin Zheng, Xingjiao Wu, Tianlong Ma, Cheng Jin

Abstract: Scene text erasing seeks to erase text content from scene images, and current state-of-the-art text erasing models are trained on large-scale synthetic data. Although data synthesis engines can provide vast amounts of annotated training samples, there are differences between synthetic and real-world data. In this paper, we employ self-supervision for feature representation on unlabeled real-world scene text images. A novel pretext task is designed to keep text stroke masks consistent across image variants. We design the Progressive Erasing Network to remove residual texts. The scene text is erased progressively by leveraging intermediate generated results, which provide the foundation for subsequent higher-quality results. Experiments show that our method significantly improves the generalization of the text erasing task and achieves state-of-the-art performance on public benchmarks.
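The abstract does not specify which image variants or which consistency measure the pretext task uses. Assuming horizontal flips and an L1 penalty, a consistency loss on predicted stroke masks could look like the sketch below. `toy_stroke_mask` and `position_biased` are hypothetical predictors used only to show a consistent and an inconsistent case.

```python
def hflip(img):
    """Horizontal flip of a 2D list."""
    return [row[::-1] for row in img]


def toy_stroke_mask(img, threshold=0.5):
    """Hypothetical per-pixel stroke detector: bright pixels count as text strokes."""
    return [[1.0 if v > threshold else 0.0 for v in row] for row in img]


def flip_consistency_loss(img, predict_mask):
    """Mean absolute difference between mask(flip(x)) and flip(mask(x))."""
    a = predict_mask(hflip(img))
    b = hflip(predict_mask(img))
    n = sum(len(row) for row in a)
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def position_biased(img):
    """A deliberately inconsistent predictor whose output depends on the column index."""
    return [[v + j for j, v in enumerate(row)] for row in img]


img = [[0.0, 0.9, 0.2], [0.7, 0.1, 0.6]]
print(flip_consistency_loss(img, toy_stroke_mask))           # 0.0
print(flip_consistency_loss([[0.0, 1.0]], position_biased))  # 1.0
```

No labels are needed: the loss only compares the model with itself across variants, which is what makes it usable on unlabeled real-world images.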

Cross-Domain Document Layout Analysis via Unsupervised Document Style Guide

Jan 24, 2022

Xingjiao Wu, Luwei Xiao, Xiangcheng Du, Yingbin Zheng, Xin Li, Tianlong Ma, Liang He

Abstract: Document layout analysis (DLA) aims to decompose document images into high-level semantic areas (i.e., figures, tables, texts, and background). Creating a DLA framework with strong generalization capabilities is challenging because document objects are diverse in layout, size, aspect ratio, texture, etc. Many researchers have addressed this challenge by synthesizing data to build large training sets. However, the synthetic training data has varying styles and erratic quality. Moreover, there is a large gap between the source data and the target data. In this paper, we propose an unsupervised cross-domain DLA framework based on document style guidance. We integrate document quality assessment and cross-domain document analysis into a unified framework. Our framework is composed of three components: the Document Layout Generator (GLD), the Document Elements Decorator (GED), and the Document Style Discriminator (DSD). The GLD generates document layouts, the GED fills in document layout elements, and the DSD performs document quality assessment and cross-domain guidance. First, we apply the GLD to predict the element positions of the generated document. Then, we design a novel algorithm based on aesthetic guidance to fill these positions. Finally, we use contrastive learning to assess the quality of the document. In addition, we design a new strategy that turns the document quality assessment component into a document cross-domain style guide component. Our framework is an unsupervised document layout analysis framework. Numerous experiments demonstrate that our proposed method achieves remarkable performance.
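The abstract states that contrastive learning is used for quality assessment but not which loss. InfoNCE is a common contrastive objective, so the sketch below assumes it: an anchor embedding should score higher against its positive than against negatives. Which documents serve as positives and negatives (e.g., target-domain versus synthetic pages) is an assumption, as are the toy two-dimensional embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: -log softmax score of the positive among positive + negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # log-sum-exp with max subtraction for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


# Toy embeddings: the anchor matches the positive, not the negative.
print(info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]]))  # close to 0
print(info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]]))  # close to 10
```

A low loss means the anchor is already closer to its positive than to the negatives; used as a quality score, a generated page that lands near real target-domain pages would be judged better.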

Cascaded Detail-Preserving Networks for Super-Resolution of Document Images

Nov 25, 2019

Zhichao Fu, Yu Kong, Yingbin Zheng, Hao Ye, Wenxin Hu, Jing Yang, Liang He

Abstract: OCR accuracy is usually affected by the quality of the input document image, and various kinds of degraded document images hamper OCR results. Among these scenarios, the low-resolution image is a common and challenging case. In this paper, we propose cascaded networks for document image super-resolution. Our model is composed of Detail-Preserving Networks with small magnification. The loss function with perceptual terms is designed to simultaneously preserve the original patterns and enhance the edges of the characters. These networks share the same architecture but have different parameters, and they are then assembled into a pipeline model with a larger magnification. Low-resolution images are upscaled gradually by passing through each Detail-Preserving Network until the final high-resolution images are produced. Through extensive experiments on two scanned document image datasets, we demonstrate that the proposed approach outperforms recent state-of-the-art image super-resolution methods, and that combining it with a standard OCR system leads to significant improvements in the recognition results.
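The cascade composes several small-magnification networks into one larger magnification. In the sketch below, nearest-neighbour 2x upscaling stands in for each learned Detail-Preserving Network; the per-stage factor of 2 is an assumption, since the abstract does not state it. Two stages therefore give 4x overall.

```python
def upsample2x(img):
    """Nearest-neighbour 2x upscaling: placeholder for one learned small-magnification network."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))  # copy so the two output rows are independent lists
    return out


def cascade(img, stages):
    """Run the stages in order; total magnification is the product of per-stage factors."""
    for stage in stages:
        img = stage(img)
    return img


low_res = [[1.0, 2.0], [3.0, 4.0]]
high_res = cascade(low_res, [upsample2x, upsample2x])  # 2x then 2x = 4x overall
print(len(high_res), len(high_res[0]))  # 8 8
```

In the described model each stage would be a separately trained network with the same architecture, so `stages` would hold different parameter sets rather than the same function twice.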

Scene Text Recognition with Temporal Convolutional Encoder

Nov 04, 2019

Xiangcheng Du, Tianlong Ma, Yingbin Zheng, Hao Ye, Xingjiao Wu, Liang He

Abstract: Texts in scene images typically consist of several characters and exhibit a characteristic sequence structure. Existing methods capture this structure with sequence-to-sequence models, using an encoder to obtain visual representations and a decoder to translate the features into the label sequence. In this paper, we study a text recognition framework that considers long-term temporal dependencies in the encoder stage. We demonstrate that the proposed Temporal Convolutional Encoder with increased sequential extents improves the accuracy of text recognition. We also study the impact of different attention modules in convolutional blocks for learning accurate text representations. We conduct comparisons on seven datasets, and the experiments demonstrate the effectiveness of our proposed approach.
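One common way to extend the temporal reach of a convolutional encoder is to stack dilated 1D convolutions, as in temporal convolutional networks; whether the proposed encoder uses exactly this design is an assumption here, as is the causal padding. The sketch computes the receptive field of such a stack and checks it against the impulse response of a plain linear implementation.

```python
def causal_dilated_conv1d(seq, kernel, dilation):
    """1D convolution whose output at t sees seq[t - (k-1)*dilation .. t] (zero left-padding)."""
    k = len(kernel)
    pad = (k - 1) * dilation
    padded = [0.0] * pad + list(seq)
    return [
        sum(kernel[i] * padded[t + i * dilation] for i in range(k))
        for t in range(len(seq))
    ]


def tcn_stack(seq, kernel, dilations):
    """Apply one dilated convolution per layer (no nonlinearity, for illustration)."""
    for d in dilations:
        seq = causal_dilated_conv1d(seq, kernel, d)
    return seq


def receptive_field(kernel_size, dilations):
    """Number of input steps that can influence one output step."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


dilations = [1, 2, 4, 8]
impulse = [1.0] + [0.0] * 39
response = tcn_stack(impulse, [1.0, 1.0, 1.0], dilations)
print(receptive_field(3, dilations))       # 31
print(sum(1 for v in response if v != 0))  # 31
```

Doubling the dilation per layer makes the sequential extent grow exponentially with depth while each layer still uses only three weights, which is one way an encoder can cover a whole word.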

Edge-Aware Deep Image Deblurring

Jul 04, 2019

Zhichao Fu, Yingbin Zheng, Hao Ye, Yu Kong, Jing Yang, Liang He

Abstract: Image deblurring is a fundamental and challenging low-level vision problem. Previous vision research indicates that edge structure in natural scenes is one of the most important factors in estimating the abilities of human visual perception. In this paper, we draw on the human visual demand for sharp edges and propose a two-phase edge-aware deep network to improve deep image deblurring. An edge detection convolutional subnet is designed in the first phase, and a residual fully convolutional deblur subnet is then used to generate the deblurred results. The edge-aware network gives our model the specific capacity to enhance images with sharp edges. We apply our framework to standard benchmarks, and our proposed deblur model achieves promising results.
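The two-phase design feeds edge information to the deblurring stage. As a rough sketch, a classical Sobel operator stands in for the learned edge-detection subnet below, and its output is paired with the blurred image as a second input channel; both the Sobel substitute and the channel-stacking interface are assumptions, not the paper's architecture.

```python
import math


def sobel_magnitude(img):
    """Gradient magnitude from 3x3 Sobel kernels; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = (img[r - 1][c + 1] + 2 * img[r][c + 1] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r][c - 1] - img[r + 1][c - 1])
            gy = (img[r + 1][c - 1] + 2 * img[r + 1][c] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r - 1][c] - img[r - 1][c + 1])
            out[r][c] = math.hypot(gx, gy)
    return out


def deblur_input(blurred):
    """Pair each blurred pixel with its edge strength as a 2-channel input."""
    edges = sobel_magnitude(blurred)
    return [list(zip(row_b, row_e)) for row_b, row_e in zip(blurred, edges)]


# A vertical step edge: dark left half, bright right half.
step = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
print(sobel_magnitude(step)[2])  # [0.0, 0.0, 4.0, 4.0, 0.0, 0.0]
```

The edge channel is large only along the step, so a downstream deblur subnet receiving it can tell where sharpening matters most.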
