Tong Zheng

EIT: Enhanced Interactive Transformer

Dec 20, 2022
Tong Zheng, Bei Li, Huiwen Bao, Tong Xiao, Jingbo Zhu

In this paper, we propose a novel architecture, the Enhanced Interactive Transformer (EIT), to address the issue of head degradation in self-attention mechanisms. Our approach replaces the traditional multi-head self-attention mechanism with the Enhanced Multi-Head Attention (EMHA) mechanism, which relaxes the one-to-one mapping constraint between queries and keys, allowing each query to attend to multiple keys. Furthermore, we introduce two interaction models, Inner-Subspace Interaction and Cross-Subspace Interaction, to fully utilize the many-to-many mapping capabilities of EMHA. Extensive experiments on a wide range of tasks (e.g. machine translation, abstractive summarization, grammar correction, language modelling and automatic brain disease diagnosis) show its superiority with a very modest increase in model size.

* 25 pages, 21 figures 
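
To make the abstract's many-to-many idea concrete, here is a minimal PyTorch sketch in which every query subspace attends to every key/value subspace instead of only its own. It is an illustration under assumed design choices (the ManyToManyAttention name, averaging over key subspaces, a single output projection), not the authors' EMHA, and the Inner-Subspace and Cross-Subspace Interaction models are omitted.

# A minimal sketch of the many-to-many attention idea, not the paper's EMHA.
import math
import torch
import torch.nn as nn

class ManyToManyAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.h = n_heads
        self.d_k = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # The h x h head-pair outputs are averaged over key subspaces before
        # the usual output projection (an assumption, not the paper's choice).
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        # Per-subspace queries, keys and values: [b, h, t, d_k]
        q = self.q_proj(x).view(b, t, self.h, self.d_k).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.h, self.d_k).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.h, self.d_k).transpose(1, 2)
        # Relax the one-to-one head mapping: query subspace i attends to
        # key/value subspace j for every (i, j) pair -> [b, h_q, h_k, t, t].
        scores = torch.einsum("bitd,bjsd->bijts", q, k) / math.sqrt(self.d_k)
        attn = scores.softmax(dim=-1)
        ctx = torch.einsum("bijts,bjsd->bijtd", attn, v)
        # Collapse the key-subspace axis (mean) and restore [b, t, d_model].
        ctx = ctx.mean(dim=2).transpose(1, 2).reshape(b, t, self.h * self.d_k)
        return self.out_proj(ctx)

# Example: ManyToManyAttention(512, 8)(torch.randn(2, 16, 512)).shape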

Learning Multiscale Transformer Models for Sequence Generation

Jun 19, 2022
Bei Li, Tong Zheng, Yi Jing, Chengbo Jiao, Tong Xiao, Jingbo Zhu

Multiscale feature hierarchies have proven successful in computer vision. This has motivated researchers to design multiscale Transformers for natural language processing, mostly based on the self-attention mechanism, for example by restricting the receptive field across heads or extracting local fine-grained features via convolutions. However, most existing works model local features directly but ignore word-boundary information, which results in redundant and ambiguous attention distributions and a lack of interpretability. In this work, we define the scales in terms of linguistic units, including sub-words, words and phrases. We build a multiscale Transformer model by establishing relationships among scales based on word-boundary information and phrase-level prior knowledge. The proposed Universal MultiScale Transformer, UMST, was evaluated on two sequence generation tasks. Notably, it yielded consistent performance gains over the strong baseline on several test sets without sacrificing efficiency.

* Accepted by ICML 2022
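
One ingredient the abstract relies on, building a coarser word-level scale from sub-word features via boundary information, can be sketched as follows. The masked mean pooling, the word_ids layout and the function name are illustrative assumptions, not UMST's actual construction of scales.

# A sketch of pooling sub-word features into a word-level scale using
# word-boundary information; not UMST's actual multiscale construction.
import torch

def pool_subwords_to_words(subword_feats: torch.Tensor,
                           word_ids: torch.Tensor,
                           n_words: int) -> torch.Tensor:
    """subword_feats: [t, d]; word_ids: [t], the word index of each sub-word
    token; returns word-level features [n_words, d] by mean pooling."""
    t, d = subword_feats.shape
    sums = torch.zeros(n_words, d, dtype=subword_feats.dtype)
    counts = torch.zeros(n_words, 1, dtype=subword_feats.dtype)
    sums.index_add_(0, word_ids, subword_feats)
    counts.index_add_(0, word_ids, torch.ones(t, 1, dtype=subword_feats.dtype))
    return sums / counts.clamp(min=1.0)

# Example: sub-words "trans former s model" -> words [0, 0, 0, 1]
feats = torch.randn(4, 8)
word_ids = torch.tensor([0, 0, 0, 1])
word_feats = pool_subwords_to_words(feats, word_ids, n_words=2)  # [2, 8]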

COVID-19 Infection Segmentation from Chest CT Images Based on Scale Uncertainty

Jan 09, 2022
Masahiro Oda, Tong Zheng, Yuichiro Hayashi, Yoshito Otake, Masahiro Hashimoto, Toshiaki Akashi, Shigeki Aoki, Kensaku Mori

This paper proposes a method for segmenting infection regions in the lung from CT volumes of COVID-19 patients. COVID-19 has spread worldwide, causing many infections and deaths. CT image-based diagnosis of COVID-19 can provide quick and accurate results, and an automated segmentation method for infection regions in the lung provides a quantitative criterion for diagnosis. Previous methods process whole 2D images or 3D volumes. However, infection regions vary considerably in size, and such processes easily miss small ones. Patch-based processing is effective for segmenting small targets, but selecting an appropriate patch size is difficult in infection region segmentation. We utilize the scale uncertainty among various receptive field sizes of a segmentation FCN to obtain infection regions. The receptive field size is determined by the patch size and the resolution of the volumes from which patches are clipped. This paper proposes an infection segmentation network (ISNet) that performs patch-based segmentation and a scale uncertainty-aware prediction aggregation method that refines the segmentation result. We design ISNet to segment infection regions with various intensity values; it has multiple encoding paths to process patch volumes normalized by multiple intensity ranges. We collect the prediction results generated by ISNets with various receptive field sizes, and the prediction aggregation method extracts the scale uncertainty among them. An aggregation FCN then generates a refined segmentation result that takes this scale uncertainty into account. In our experiments using 199 chest CT volumes of COVID-19 cases, the prediction aggregation method improved the Dice similarity score from 47.6% to 62.1%.

* DCL 2021, PPML 2021, LL-COVID19 2021, CLIP 2021, Lecture Notes in Computer Science (LNCS) 12969, pp.88-97  
* Accepted as an oral presentation at CLIP 2021, the 10th MICCAI CLIP Workshop
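
The scale-uncertainty aggregation described in the abstract can be pictured with the short sketch below: predictions from networks with different receptive field sizes are stacked, their per-voxel disagreement is measured, and both are handed to an aggregation network. Treating the variance across scales as the uncertainty and the toy 3D convolutional aggregator are assumptions for illustration, not the paper's ISNet or aggregation FCN.

# Illustrative sketch of aggregating multi-scale predictions with a
# per-voxel uncertainty map; the paper's networks are not reproduced here.
import torch
import torch.nn as nn

def scale_uncertainty(preds: torch.Tensor) -> torch.Tensor:
    """preds: [n_scales, 1, D, H, W] probabilities from differently sized
    receptive fields; returns per-voxel variance across scales [1, D, H, W]."""
    return preds.var(dim=0, unbiased=False)

class AggregationNet(nn.Module):
    """Toy stand-in for an aggregation FCN: refines the segmentation from the
    stacked multi-scale predictions plus their uncertainty map."""
    def __init__(self, n_scales: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(n_scales + 1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 1, 1), nn.Sigmoid(),
        )

    def forward(self, preds: torch.Tensor) -> torch.Tensor:
        unc = scale_uncertainty(preds)                  # [1, D, H, W]
        x = torch.cat([preds.squeeze(1), unc], dim=0)   # [n_scales+1, D, H, W]
        return self.net(x.unsqueeze(0))                 # [1, 1, D, H, W]

# Example with 3 scales on a 32^3 volume:
preds = torch.rand(3, 1, 32, 32, 32)
refined = AggregationNet(n_scales=3)(preds)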

Micro CT Image-Assisted Cross Modality Super-Resolution of Clinical CT Images Utilizing Synthesized Training Dataset

Oct 20, 2020
Tong Zheng, Hirohisa Oda, Masahiro Oda, Shota Nakamura, Masaki Mori, Hirotsugu Takabatake, Hiroshi Natori, Kensaku Mori

This paper proposes a novel, unsupervised super-resolution (SR) approach for performing SR of clinical CT to the resolution level of micro CT ($\mu$CT). Precise non-invasive diagnosis of lung cancer typically relies on clinical CT data. Due to the resolution limitations of clinical CT (about $0.5 \times 0.5 \times 0.5$ mm$^3$), it is difficult to obtain sufficient pathological information, such as the invasion area at the alveolar level. On the other hand, $\mu$CT scanning allows the acquisition of volumes of lung specimens at much higher resolution ($50 \times 50 \times 50 \mu {\rm m}^3$ or better). Thus, super-resolution of clinical CT volumes may be helpful for the diagnosis of lung cancer. Typical SR methods require aligned pairs of low-resolution (LR) and high-resolution (HR) images for training, but obtaining paired clinical CT and $\mu$CT volumes of human lung tissue is infeasible, so unsupervised SR methods that do not need paired LR and HR images are required. In this paper, we create corresponding clinical CT-$\mu$CT pairs by simulating clinical CT images from $\mu$CT images with a modified CycleGAN. We then use the simulated clinical CT-$\mu$CT image pairs to train an SR network based on SRGAN, and finally apply the trained SR network to clinical CT images. We compare our proposed method with another unsupervised SR method for clinical CT images, SR-CycleGAN. Experimental results demonstrate that the proposed method can successfully perform SR of clinical CT images of lung cancer patients at $\mu$CT level resolution, and it outperformed the conventional method (SR-CycleGAN) both quantitatively and qualitatively, improving the SSIM (structural similarity) from 0.40 to 0.51.
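
The two-stage pipeline in the abstract, simulating clinical-CT-like inputs from $\mu$CT with a CycleGAN-style generator and then training an SR network on the resulting synthetic pairs, is sketched below. The tiny placeholder networks, the plain L1 objective and the shared image grid are stand-ins for illustration; the paper's modified CycleGAN and SRGAN-based network are not reproduced here.

# Schematic sketch: stage 1 simulates clinical-CT-like images from micro-CT,
# stage 2 trains an SR network on the synthetic LR / real HR pairs.
import torch
import torch.nn as nn

g_micro_to_clinical = nn.Sequential(   # stage-1 translator (placeholder)
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 1, 3, padding=1))
sr_net = nn.Sequential(                # stage-2 SR network (placeholder)
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 1, 3, padding=1))
opt = torch.optim.Adam(sr_net.parameters(), lr=1e-4)

def train_step(micro_ct: torch.Tensor) -> torch.Tensor:
    """micro_ct: [b, 1, H, W] high-resolution micro-CT slices. In practice the
    simulated clinical CT would also be on a coarser grid and the SR network
    would upsample; both are kept on one grid here for brevity."""
    with torch.no_grad():
        simulated_clinical = g_micro_to_clinical(micro_ct)  # synthetic LR input
    sr = sr_net(simulated_clinical)                         # attempted SR
    loss = nn.functional.l1_loss(sr, micro_ct)              # target: real micro-CT
    opt.zero_grad(); loss.backward(); opt.step()
    return loss

loss = train_step(torch.randn(2, 1, 64, 64))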

Multi-modality super-resolution loss for GAN-based super-resolution of clinical CT images using micro CT image database

Dec 30, 2019
Tong Zheng, Hirohisa Oda, Takayasu Moriya, Shota Nakamura, Masahiro Oda, Masaki Mori, Hirotsugu Takabatake, Hiroshi Natori, Kensaku Mori

This paper introduces a new multi-modality loss function for GAN-based super-resolution that can maintain image structure and intensity when training on an unpaired dataset of clinical CT and micro CT volumes. Precise non-invasive diagnosis of lung cancer mainly relies on 3D multidetector computed tomography (CT) data. On the other hand, micro CT images of resected lung specimens can be acquired at 50 micrometer or higher resolution; however, micro CT scanning cannot be applied to imaging of living humans. To obtain highly detailed information, such as the cancer invasion area, from pre-operative clinical CT volumes of lung cancer patients, super-resolution (SR) of clinical CT volumes to the $\mu$CT level may be a substitute solution. While most SR methods require paired low- and high-resolution images for training, it is infeasible to obtain precisely paired clinical CT and micro CT volumes. We therefore propose unpaired SR approaches for clinical CT using micro CT images, based on unpaired image translation methods such as CycleGAN and UNIT. Since clinical CT and micro CT differ greatly in structure and intensity, the direct application of GAN-based unpaired image translation methods to super-resolution tends to generate arbitrary images. To solve this problem, we propose a new loss function, called the multi-modality loss, that maintains the similarity between input images and the corresponding output images in the super-resolution task. Experimental results demonstrate that the proposed loss function enables CycleGAN and UNIT to successfully perform SR of clinical CT images of lung cancer patients to micro CT level resolution, whereas the original CycleGAN and UNIT fail.

* 6 pages, 2 figures 
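
The abstract does not spell out the form of the multi-modality loss, so the following is only a hedged guess at what such a consistency term could look like: the super-resolved output is brought back to the clinical-CT grid and compared with the input in both intensity and image gradients. The terms, the weights and the name multimodality_consistency_loss are assumptions, not the paper's definition.

# A speculative sketch of an intensity + structure consistency term; the
# paper's actual multi-modality loss is not specified here.
import torch
import torch.nn.functional as F

def image_gradients(x: torch.Tensor):
    dx = x[..., :, 1:] - x[..., :, :-1]   # horizontal differences
    dy = x[..., 1:, :] - x[..., :-1, :]   # vertical differences
    return dx, dy

def multimodality_consistency_loss(sr_out: torch.Tensor,
                                   clinical_in: torch.Tensor,
                                   w_intensity: float = 1.0,
                                   w_structure: float = 1.0) -> torch.Tensor:
    """sr_out: [b, 1, H, W] SR result; clinical_in: [b, 1, h, w] input CT."""
    down = F.interpolate(sr_out, size=clinical_in.shape[-2:],
                         mode="bilinear", align_corners=False)
    intensity = F.l1_loss(down, clinical_in)
    (dx_o, dy_o), (dx_i, dy_i) = image_gradients(down), image_gradients(clinical_in)
    structure = F.l1_loss(dx_o, dx_i) + F.l1_loss(dy_o, dy_i)
    return w_intensity * intensity + w_structure * structure

loss = multimodality_consistency_loss(torch.randn(1, 1, 256, 256),
                                      torch.randn(1, 1, 64, 64))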

DeepIlluminance: Contextual Illuminance Estimation via Deep Neural Networks

May 12, 2019
Jun Zhang, Tong Zheng, Shengping Zhang, Meng Wang

Computational color constancy refers to estimating the scene illumination so that perceived colors remain relatively stable under varying illumination. In the past few years, deep Convolutional Neural Networks (CNNs) have delivered superior performance in illuminant estimation. Several representative methods formulate it as a multi-label prediction problem by learning the local appearance of image patches using CNNs. However, these approaches inevitably make incorrect estimates for ambiguous patches affected by their neighborhood contexts, and inaccurate local estimates are likely to degrade performance when combined into a global prediction. To address these issues, we propose a contextual deep network for patch-based illuminant estimation equipped with refinement. First, the contextual net, with a center-surround architecture, extracts local contextual features from image patches and generates initial illuminant estimates and the corresponding color-corrected patches. The patches are sampled based on the observation that pixels with large color differences describe the illumination well. Then, the refinement net integrates the input patches with the corrected patches, in conjunction with intermediate features, to improve performance. To train such a network with numerous parameters, we propose a stage-wise training strategy in which the features and the predicted illuminant from previous stages are provided to the next learning stage, with finer estimates recovered. Experiments show that our approach obtains competitive performance on two illuminant estimation benchmarks.

* 11 pages, 7 figures 
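
The patch-sampling heuristic mentioned in the abstract, preferring patches whose pixels show large color differences, can be sketched as below. Scoring patches by the spread of their per-pixel chromaticities and the name sample_informative_patches are illustrative assumptions rather than the paper's exact criterion.

# A sketch of selecting illumination-informative patches by color spread;
# not the paper's exact sampling rule.
import torch

def sample_informative_patches(img: torch.Tensor, patch: int = 32,
                               top_k: int = 8) -> torch.Tensor:
    """img: [3, H, W] in [0, 1]; returns [top_k, 3, patch, patch] patches."""
    c, h, w = img.shape
    patches = img.unfold(1, patch, patch).unfold(2, patch, patch)  # [3, nh, nw, p, p]
    patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, c, patch, patch)
    # Per-pixel chromaticity, then its spread inside each patch as the score.
    chroma = patches / patches.sum(dim=1, keepdim=True).clamp(min=1e-6)
    scores = chroma.flatten(2).std(dim=2).mean(dim=1)              # [n_patches]
    idx = scores.topk(min(top_k, scores.numel())).indices
    return patches[idx]

patches = sample_informative_patches(torch.rand(3, 256, 256))  # [8, 3, 32, 32]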