Zhiming Wang

SRMAE: Masked Image Modeling for Scale-Invariant Deep Representations

Aug 17, 2023
Zhiming Wang, Lin Gu, Feng Lu

Due to the prevalence of scale variance in natural images, we propose to use image scale as a self-supervised signal for Masked Image Modeling (MIM). Our method selects random patches from the input image and downsamples them to a low-resolution format. Our framework leverages recent advances in super-resolution (SR) to design the prediction head, which reconstructs the input from the low-resolution clues and the remaining patches. After 400 epochs of pre-training, our Super Resolution Masked Autoencoders (SRMAE) achieve an accuracy of 82.1% on ImageNet-1K. The image-scale signal also allows SRMAE to capture scale-invariant representations. On the very low resolution (VLR) recognition task, our model achieves the best performance, surpassing DeriveNet by 1.3%. Our method also reaches an accuracy of 74.84% on low-resolution facial expression recognition, surpassing the state-of-the-art FMD by 9.48%.
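
As a rough sketch of the patch-degradation step (not the authors' released code), the following PyTorch snippet selects a random subset of patches and bicubically downsamples them into the low-resolution clues; the patch size, masking ratio, and scale factor are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def degrade_random_patches(img, patch=16, ratio=0.5, factor=4):
    """Downsample a random subset of patches as the self-supervised signal.

    img: (N, C, H, W) with H and W divisible by `patch`.
    Returns the low-resolution patch clues and the chosen patch indices.
    """
    n, c, _, _ = img.shape
    p = img.unfold(2, patch, patch).unfold(3, patch, patch)  # (N, C, gh, gw, ph, pw)
    gh, gw = p.shape[2], p.shape[3]
    p = p.permute(0, 2, 3, 1, 4, 5).reshape(n, gh * gw, c, patch, patch)
    idx = torch.randperm(gh * gw)[: int(ratio * gh * gw)]    # random patch subset
    sel = p[:, idx].reshape(-1, c, patch, patch)
    lr = F.interpolate(sel, scale_factor=1 / factor,
                       mode='bicubic', align_corners=False)  # low-res "clues"
    return lr, idx

lr_patches, positions = degrade_random_patches(torch.rand(2, 3, 224, 224))
```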

Generalized Expectation Maximization Framework for Blind Image Super Resolution

May 23, 2023
Yuxiao Li, Zhiming Wang, Yuan Shen

Learning-based methods for blind single image super resolution (SISR) perform restoration via a learned mapping between high-resolution (HR) images and their low-resolution (LR) counterparts degraded with arbitrary blur kernels. However, these methods mostly require an independent step to estimate the blur kernel, leading to error accumulation between steps. We propose an end-to-end learning framework for blind SISR that enables image restoration within a unified Bayesian framework with either full or semi-supervision. The proposed method, SREMN, integrates learning techniques into the generalized expectation-maximization (GEM) algorithm and infers HR images via maximum likelihood estimation (MLE). Extensive experiments show the superiority of the proposed method over existing work and its novelty in semi-supervised learning.
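
The alternating estimation at the heart of a GEM scheme can be sketched as below; this toy gradient loop, which alternately refines a blur-kernel estimate and an HR image estimate under a Gaussian likelihood, is an illustrative assumption and not the SREMN network itself.

```python
import torch
import torch.nn.functional as F

scale = 2
hr = torch.rand(1, 1, 32, 32)                     # toy ground-truth HR image
true_k = torch.ones(1, 1, 5, 5) / 25.0            # unknown blur kernel
lr = F.conv2d(hr, true_k, padding=2)[..., ::scale, ::scale]  # observed LR image

x = torch.rand(1, 1, 32, 32, requires_grad=True)             # latent HR estimate
k = torch.full((1, 1, 5, 5), 1 / 25.0, requires_grad=True)   # kernel estimate
opt_x = torch.optim.Adam([x], lr=1e-2)
opt_k = torch.optim.Adam([k], lr=1e-3)

def neg_log_likelihood():
    pred = F.conv2d(x, k, padding=2)[..., ::scale, ::scale]
    return F.mse_loss(pred, lr)  # Gaussian noise: MSE is the NLL up to scale

for step in range(500):
    # E-like step: refine the kernel estimate
    opt_k.zero_grad(); neg_log_likelihood().backward(); opt_k.step()
    # M-like step: maximize the likelihood over the HR image
    opt_x.zero_grad(); neg_log_likelihood().backward(); opt_x.step()
```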

Learned Smartphone ISP on Mobile GPUs with Deep Learning, Mobile AI & AIM 2022 Challenge: Report

Nov 07, 2022
Andrey Ignatov, Radu Timofte, Shuai Liu, Chaoyu Feng, Furui Bai, Xiaotao Wang, Lei Lei, Ziyao Yi, Yan Xiang, Zibin Liu, Shaoqing Li, Keming Shi, Dehui Kong, Ke Xu, Minsu Kwon, Yaqi Wu, Jiesi Zheng, Zhihao Fan, Xun Wu, Feng Zhang, Albert No, Minhyeok Cho, Zewen Chen, Xiaze Zhang, Ran Li, Juan Wang, Zhiming Wang, Marcos V. Conde, Ui-Jin Choi, Georgy Perevozchikov, Egor Ershov, Zheng Hui, Mengchuan Dong, Xin Lou, Wei Zhou, Cong Pang, Haina Qin, Mingxuan Cai

The role of mobile cameras has increased dramatically over the past few years, leading to more and more research in automatic image quality enhancement and RAW photo processing. In this Mobile AI challenge, the target was to develop an efficient end-to-end AI-based image signal processing (ISP) pipeline that replaces the standard mobile ISP and can run on modern smartphone GPUs using TensorFlow Lite. The participants were provided with a large-scale Fujifilm UltraISP dataset consisting of thousands of paired photos captured with a normal mobile camera sensor and a professional 102MP medium-format Fujifilm GFX100 camera. The runtime of the resulting models was evaluated on the Snapdragon 8 Gen 1 GPU, which provides excellent acceleration for the majority of common deep learning ops. The proposed solutions are compatible with all recent mobile GPUs and can process Full HD photos in 20-50 milliseconds while achieving high-fidelity results. A detailed description of all models developed in this challenge is provided in this paper.
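
For context on the evaluation pipeline, a TensorFlow Lite model's latency can be benchmarked roughly as follows; the model path and random input are placeholders, and the challenge's on-device runs additionally used the TFLite GPU delegate.

```python
import time

import numpy as np
import tensorflow as tf

# Load a converted ISP model; "isp_model.tflite" is a placeholder path.
interpreter = tf.lite.Interpreter(model_path="isp_model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

raw = np.random.rand(*inp["shape"]).astype(np.float32)  # stand-in for a RAW frame
interpreter.set_tensor(inp["index"], raw)
interpreter.invoke()                                    # warm-up run

runs = 10
start = time.perf_counter()
for _ in range(runs):
    interpreter.invoke()
ms = (time.perf_counter() - start) / runs * 1000
print(f"avg latency: {ms:.1f} ms, output shape: {out['shape']}")
```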

Get away from Style: Category-Guided Domain Adaptation for Semantic Segmentation

Mar 29, 2021
Yantian Luo, Zhiming Wang, Danlan Huang, Ning Ge, Jianhua Lu

Unsupervised domain adaptation (UDA) has become increasingly popular for tackling real-world problems without ground truth in the target domain. Although it removes the need for a mass of tedious annotation work, UDA unavoidably faces the problem of narrowing the domain discrepancy to boost transfer performance. In this paper, we focus on UDA for the semantic segmentation task. First, we propose a style-independent content feature extraction mechanism that keeps the style information of extracted features in a similar space, since style plays an extremely slight role in semantic segmentation compared with content. Second, to keep the pseudo labels balanced across categories, we propose a category-guided threshold mechanism that selects category-wise pseudo labels for self-supervised learning. The experiments use GTA5 as the source domain and Cityscapes as the target domain. The results show that our model outperforms the state of the art with a noticeable gain on cross-domain adaptation tasks.
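
A minimal sketch of a category-guided thresholding rule in the spirit of the above (the paper's exact rule may differ): each predicted class gets its own confidence cutoff, so rare categories are not drowned out by a single global threshold.

```python
import torch

def category_guided_pseudo_labels(logits, base_tau=0.9):
    """logits: (N, C, H, W) target-domain predictions. Returns pseudo labels,
    with -1 marking pixels ignored during self-supervised training."""
    probs = logits.softmax(dim=1)
    conf, pred = probs.max(dim=1)          # per-pixel confidence and class
    pseudo = torch.full_like(pred, -1)
    for c in range(logits.shape[1]):
        mask = pred == c
        if mask.any():
            # Class-wise threshold: never stricter than the class's own
            # median confidence, so rare classes still contribute labels.
            tau_c = min(base_tau, conf[mask].median().item())
            pseudo[mask & (conf >= tau_c)] = c
    return pseudo

labels = category_guided_pseudo_labels(torch.randn(2, 19, 64, 128))  # 19 classes
```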

deGraphCS: Embedding Variable-based Flow Graph for Neural Code Search

Mar 24, 2021
Chen Zeng, Yue Yu, Shanshan Li, Xin Xia, Zhiming Wang, Mingyang Geng, Bailin Xiao, Wei Dong, Xiangke Liao

With the rapid increase in the number of public code repositories, developers have a great desire to retrieve precise code snippets using natural language. Although existing deep learning based approaches (e.g., DeepCS and MMAN) provide end-to-end solutions (i.e., accepting natural language queries and returning related code fragments retrieved directly from the code corpus), the accuracy of code search over large-scale repositories is still limited by the code representation (e.g., AST) and the modeling (e.g., directly fusing the features in the attention stage). In this paper, we propose a novel learnable deep Graph for Code Search (called deGraphCS) that transfers source code into variable-based flow graphs based on the intermediate representation technique, which models code semantics more precisely than processing the code as text directly or using the syntactic tree representation. Furthermore, we propose a well-designed graph optimization mechanism to refine the code representation, and apply an improved gated graph neural network to model the variable-based flow graphs. To evaluate the effectiveness of deGraphCS, we collect a large-scale dataset from GitHub containing 41,152 code snippets written in C, and reproduce several typical deep code search methods for comparison. In addition, we design a qualitative user study to verify the practical value of our approach. The experimental results show that deGraphCS achieves state-of-the-art performance and accurately retrieves code snippets satisfying the needs of users.
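
To make the modeling step concrete, here is a minimal gated graph layer of the kind used to embed a variable-based flow graph into a single code vector; the dense adjacency matrix and mean pooling are simplifying assumptions, and retrieval would then compare this vector against a query embedding (e.g., by cosine similarity).

```python
import torch
import torch.nn as nn

class MiniGGNN(nn.Module):
    """One gated graph layer embedding a variable-based flow graph into a vector."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(dim, dim)   # message transform
        self.gru = nn.GRUCell(dim, dim)  # gated node-state update

    def forward(self, h, adj, steps=4):
        # h: (num_nodes, dim) node states; adj: (num_nodes, num_nodes) adjacency
        for _ in range(steps):
            m = adj @ self.msg(h)        # aggregate messages along flow edges
            h = self.gru(m, h)
        return h.mean(dim=0)             # graph-level code embedding

nodes, dim = 7, 64
code_vec = MiniGGNN(dim)(torch.randn(nodes, dim), torch.eye(nodes))
```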

* 32 pages 

G-DARTS-A: Groups of Channel Parallel Sampling with Attention

Oct 16, 2020
Zhaowen Wang, Wei Zhang, Zhiming Wang

Differentiable Architecture Search (DARTS) provides a gradient-based baseline for searching effective network architectures, but it is accompanied by huge computational overhead in searching and training the network architecture. Recently, many works have improved DARTS; in particular, Partially-Connected DARTS (PC-DARTS) proposed a partial channel sampling technique that achieved good results. In this work, we find that the backbone provided by DARTS is prone to overfitting. To mitigate this problem, we propose Group-DARTS with Attention (G-DARTS-A), which uses multiple groups of channels for searching. Inspired by the partial sampling strategy of PC-DARTS, we use channel groups to sample the super-network, performing a more efficient search while maintaining the relative integrity of the network information. To relieve the competition between channel groups and keep the channels balanced, we follow the attention mechanism of the Squeeze-and-Excitation Network. Each group of channels shares defined weights, so each group can provide a different suggestion for the search. The searched architecture is more powerful and better adapted to different deployments. Specifically, by only adding the attention module to DARTS we achieve an error rate of 2.82%/16.36% on CIFAR10/100 with 0.3 GPU-days for the search on CIFAR10. Applying G-DARTS-A to DARTS/PC-DARTS yields an error rate of 2.57%/2.61% on CIFAR10 with 0.5/0.4 GPU-days.
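
A minimal sketch of the group-wise channel handling with Squeeze-and-Excitation attention described above; the group count and reduction ratio are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class GroupChannelAttention(nn.Module):
    """Squeeze-and-Excitation reweighting, then a split into channel groups."""
    def __init__(self, channels, groups=4, reduction=4):
        super().__init__()
        self.groups = groups
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))      # squeeze: global average pooling
        x = x * w.view(n, c, 1, 1)           # excite: per-channel reweighting
        return x.chunk(self.groups, dim=1)   # each group is searched in parallel

groups = GroupChannelAttention(64)(torch.randn(2, 64, 8, 8))  # 4 x 16-channel maps
```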

Small-footprint Keyword Spotting Using Deep Neural Network and Connectionist Temporal Classifier

Sep 12, 2017
Zhiming Wang, Xiaolong Li, Jun Zhou

Mainly to address the lack of keyword-specific data, we propose a Keyword Spotting (KWS) system using a Deep Neural Network (DNN) and a Connectionist Temporal Classifier (CTC) for power-constrained, small-footprint mobile devices, taking full advantage of the large amount of general corpus data available from continuous speech recognition. The DNN directly predicts the posteriors of the phoneme units of any personally customized key-phrase, and the CTC produces a confidence score for the given phoneme sequence as the responsive decision-making mechanism. The CTC-KWS achieves competitive performance compared with purely DNN based keyword-specific KWS, without increasing computational complexity.
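
The CTC confidence score described above can be sketched with the standard CTC forward score: the probability that the frame-level phoneme posteriors emit the key-phrase sequence. The snippet below uses PyTorch's built-in ctc_loss as an illustration; a deployed system would compute this in a streaming fashion.

```python
import torch
import torch.nn.functional as F

def keyword_confidence(frame_logits, phoneme_ids, blank=0):
    """frame_logits: (T, V) DNN outputs; phoneme_ids: key-phrase phoneme IDs."""
    log_probs = F.log_softmax(frame_logits, dim=-1).unsqueeze(1)  # (T, 1, V)
    T = log_probs.shape[0]
    targets = phoneme_ids.unsqueeze(0)                            # (1, S)
    nll = F.ctc_loss(log_probs, targets,
                     input_lengths=torch.tensor([T]),
                     target_lengths=torch.tensor([targets.shape[1]]),
                     blank=blank, reduction='sum')
    return torch.exp(-nll)  # CTC sequence probability as the confidence score

# Example: 50 frames over 40 phoneme units, a key-phrase of 6 phonemes.
score = keyword_confidence(torch.randn(50, 40), torch.randint(1, 40, (6,)))
```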
