Zi Yang

Quantization-Aware and Tensor-Compressed Training of Transformers for Natural Language Understanding

Jun 01, 2023
Zi Yang, Samridhi Choudhary, Siegfried Kunzmann, Zheng Zhang

Fine-tuned transformer models have shown superior performance in many natural language tasks. However, the large model size prohibits deploying high-performance transformer models on resource-constrained devices. This paper proposes a quantization-aware tensor-compressed training approach to reduce the model size, arithmetic operations, and ultimately runtime latency of transformer-based models. We compress the embedding and linear layers of transformers into small low-rank tensor cores, which significantly reduces the number of model parameters. Quantization-aware training with learnable scale factors is used to further obtain low-precision representations of the tensor-compressed models. The developed approach can be used for both end-to-end training and distillation-based training. To improve convergence, a layer-by-layer distillation is applied to distill a quantized and tensor-compressed student model from a pre-trained transformer. The performance is demonstrated on two natural language understanding tasks, showing up to a $63\times$ compression ratio, little accuracy loss, and remarkable inference and training speedups.
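
As a rough illustration of the two ingredients described above, the sketch below combines a tensor-train (TT) factorized linear layer with fake quantization driven by a learnable scale factor. The factor shapes, TT rank, and straight-through estimator are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn


def fake_quant(x, scale, num_bits=8):
    """Symmetric fake quantization with a straight-through rounding estimator."""
    qmax = 2 ** (num_bits - 1) - 1
    x_s = x / scale
    # Forward pass rounds; backward pass lets gradients reach x and scale.
    x_q = torch.clamp(x_s + (torch.round(x_s) - x_s).detach(), -qmax - 1, qmax)
    return x_q * scale


class TTQuantLinear(nn.Module):
    """Linear layer whose weight lives in tensor-train (TT) cores.

    The full (out, in) weight matrix is never stored; it is contracted from
    low-rank cores on the fly and fake-quantized before the matmul.
    """

    def __init__(self, in_factors=(16, 16), out_factors=(16, 16), rank=8):
        super().__init__()
        ranks = (1, rank, 1)
        self.cores = nn.ParameterList(
            nn.Parameter(0.1 * torch.randn(ranks[k], out_factors[k],
                                           in_factors[k], ranks[k + 1]))
            for k in range(len(in_factors))
        )
        self.scale = nn.Parameter(torch.tensor([0.05]))  # learnable quant step

    def full_weight(self):
        # Contract the TT cores into the dense weight (for clarity, not speed).
        w = self.cores[0]
        for core in list(self.cores)[1:]:
            w = torch.einsum("aijb,bklc->aikjlc", w, core)
            a, i, k, j, l, c = w.shape
            w = w.reshape(a, i * k, j * l, c)
        return w.squeeze(0).squeeze(-1)  # (out_features, in_features)

    def forward(self, x):
        w = fake_quant(self.full_weight(), self.scale)
        return x @ w.t()


layer = TTQuantLinear()                 # stands in for a dense 256x256 layer
y = layer(torch.randn(4, 256))
print(y.shape, sum(p.numel() for p in layer.parameters()))  # ~4K params vs ~66K
```

Storing two small cores instead of the dense 256x256 weight cuts this layer from roughly 66K parameters to about 4K, which is the kind of saving the paper scales up across all embedding and linear layers.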

Leveraging Global Binary Masks for Structure Segmentation in Medical Images

May 13, 2022
Mahdieh Kazemimoghadam, Zi Yang, Lin Ma, Mingli Chen, Weiguo Lu, Xuejun Gu

Deep learning (DL) models for medical image segmentation are highly influenced by intensity variations of input images and lack generalization because they primarily use pixels' intensity information for inference. Acquiring sufficient training data is another challenge limiting models' applications. We proposed to leverage the consistency of organs' anatomical shape and position information in medical images. We introduced a framework that leverages recurring anatomical patterns through global binary masks for organ segmentation. Two scenarios were studied: 1) global binary masks were the model's (i.e., U-Net) only input, forcing it to encode exclusively the organs' position and shape information for segmentation/localization; 2) global binary masks were incorporated as an additional channel, functioning as position/shape clues to mitigate training-data scarcity. Two datasets of brain and heart CT images with their ground truth were split 26:10:10 and 12:3:5 into training, validation, and test sets, respectively. Training exclusively on global binary masks led to Dice scores of 0.77 (0.06) and 0.85 (0.04), with average Euclidean distances of 3.12 (1.43) mm and 2.5 (0.93) mm relative to the center of mass of the ground truth for the brain and heart structures, respectively. The outcomes indicate that a surprising degree of position and shape information is encoded through global binary masks. Incorporating global binary masks led to significantly higher accuracy than the model trained on CT images alone when training data were scarce; performance improved by 4.3-125.3% and 1.3-48.1% for 1-8 training cases of the brain and heart datasets, respectively. The findings imply the advantages of utilizing global binary masks for building generalizable models and for compensating for training-data scarcity.
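
A minimal sketch of scenario 2 above: stacking a global binary mask as a second input channel next to the CT slice, so any segmentation backbone only needs its first convolution widened to two channels. The thresholding used to form the mask here is a simple assumption for illustration; the paper's mask construction may differ.

```python
import torch
import torch.nn as nn


def make_input(ct_slice: torch.Tensor, threshold: float = -500.0) -> torch.Tensor:
    """Stack a CT slice (HU values) with a crude global binary body mask.

    ct_slice: (H, W) tensor. Returns a (2, H, W) tensor: [intensity, mask].
    """
    mask = (ct_slice > threshold).float()  # simple HU threshold as the mask source
    return torch.stack([ct_slice, mask], dim=0)


# Any segmentation backbone just needs in_channels=2 instead of 1, e.g. the
# first convolution of a U-Net encoder:
stem = nn.Conv2d(in_channels=2, out_channels=64, kernel_size=3, padding=1)

x = make_input(torch.randn(256, 256) * 400.0)   # fake CT slice
print(stem(x.unsqueeze(0)).shape)               # torch.Size([1, 64, 256, 256])
```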

Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior

Oct 05, 2020
Zi Lin, Jeremiah Zhe Liu, Zi Yang, Nan Hua, Dan Roth

Traditional (unstructured) pruning methods for a Transformer model focus on regularizing the individual weights by penalizing them toward zero. In this work, we explore spectral-normalized identity priors (SNIP), a structured pruning approach that penalizes an entire residual module in a Transformer model toward an identity mapping. Our method identifies and discards unimportant non-linear mappings in the residual connections by applying a thresholding operator on the function norm. It is applicable to any structured module, including a single attention head, an entire attention block, or a feed-forward subnetwork. Furthermore, we introduce spectral normalization to stabilize the distribution of the post-activation values of the Transformer layers, further improving the pruning effectiveness of the proposed methodology. We conduct experiments with BERT on 5 GLUE benchmark tasks to demonstrate that SNIP achieves effective pruning results while maintaining comparable performance. Specifically, we improve the performance over the state of the art by 0.5 to 1.0% on average at a 50% compression ratio.

* Findings of EMNLP 2020 
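
The sketch below illustrates the pruning rule described in the abstract: estimate the function norm of each residual branch f on calibration data and collapse branches whose norm falls below a threshold, so the block computes x + 0 instead of x + f(x). The norm estimator and threshold are assumptions for illustration, not the paper's exact criterion; the spectral-normalization component (cf. torch.nn.utils.spectral_norm) is omitted for brevity.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def branch_norm(branch: nn.Module, calib: torch.Tensor) -> float:
    """Monte-Carlo estimate of the function norm ||f|| on a calibration batch."""
    return branch(calib).norm(dim=-1).mean().item()


def prune_residual_branches(branches, calib, tau=1e-2):
    """Keep branches whose estimated norm exceeds tau; prune the rest."""
    kept = []
    for name, f in branches:
        score = branch_norm(f, calib)
        if score < tau:
            print(f"pruning {name} (norm {score:.4f}) -> identity residual block")
        else:
            kept.append((name, f))
    return kept


# Toy example: one useful feed-forward branch, one branch that is nearly the
# zero function (so its residual block is effectively an identity mapping).
ffn = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64))
dead = nn.Linear(64, 64)
nn.init.zeros_(dead.weight)
nn.init.zeros_(dead.bias)

calib = torch.randn(32, 64)
survivors = prune_residual_branches([("ffn", ffn), ("dead", dead)], calib)
print([name for name, _ in survivors])  # ['ffn']
```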

Towards a Human-like Open-Domain Chatbot

Feb 27, 2020
Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, Quoc V. Le

We present Meena, a multi-turn open-domain chatbot trained end-to-end on data mined and filtered from public domain social media conversations. This 2.6B parameter neural network is simply trained to minimize the perplexity of the next token. We also propose a human evaluation metric called Sensibleness and Specificity Average (SSA), which captures key elements of a human-like multi-turn conversation. Our experiments show a strong correlation between perplexity and SSA. The fact that the best-perplexity, end-to-end trained Meena scores high on SSA (72% on multi-turn evaluation) suggests that a human-level SSA of 86% is potentially within reach if we can better optimize perplexity. Additionally, the full version of Meena (with a filtering mechanism and tuned decoding) scores 79% SSA, 23% higher in absolute SSA than the existing chatbots we evaluated.

* 38 pages, 12 figures 
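
For concreteness, SSA as described above is the plain average of two per-response crowd labels, sensibleness and specificity, with a response judged not sensible also counted as not specific. A minimal sketch, with the label layout assumed:

```python
def ssa(labels):
    """Sensibleness and Specificity Average.

    labels: list of (sensible, specific) booleans, one pair per chatbot
    response; a response judged not sensible is also counted as not specific.
    """
    sensible = sum(s for s, _ in labels) / len(labels)
    specific = sum(s and p for s, p in labels) / len(labels)
    return (sensible + specific) / 2


print(ssa([(True, True), (True, False), (False, False)]))  # 0.5
```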

Breast Ultrasound Computer-Aided Diagnosis Using Structure-Aware Triplet Path Networks

Aug 09, 2019
Erlei Zhang, Zi Yang, Stephen Seiler, Mingli Chen, Weiguo Lu, Xuejun Gu

Breast ultrasound (US) is an effective imaging modality for breast cancer detection and diagnosis. The structural characteristics of a breast lesion play an important role in Computer-Aided Diagnosis (CAD). In this paper, a novel structure-aware triplet path network (SATPN) was designed to integrate classification and two image reconstruction tasks to achieve accurate diagnosis on US images with a small training dataset. Specifically, we enhance clinically approved breast lesion structure characteristics through converting original breast US images to BIRADS-oriented feature maps (BFMs) with a distance-transformation coupled Gaussian filter. The converted BFMs were then used as the inputs of the SATPN, which performed the lesion classification task and two unsupervised stacked convolutional auto-encoder (SCAE) networks for benign and malignant image reconstruction tasks, independently. We trained the SATPN with an alternative learning strategy, balancing image reconstruction error and classification label prediction error. At the test stage, the lesion label was determined by weighted voting over the reconstruction errors and the label prediction error. We compared the performance of the SATPN with a TPN using the original images as input and with our previously developed semi-supervised deep learning methods using BFMs as inputs. Experimental results on two breast US datasets showed that the SATPN ranked best among the three networks, with classification accuracy around 93.5%. These findings indicate that the SATPN is promising for effective breast US lesion CAD using small datasets.

* arXiv admin note: substantial text overlap with arXiv:1904.01076 
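
A minimal sketch of the BFM conversion described above: a distance transform of the lesion boundary coupled with a Gaussian filter, turning a binary lesion mask into a soft map that emphasizes lesion structure. The exact BFM definition and parameters in the paper may differ; this only illustrates the idea.

```python
import numpy as np
from scipy import ndimage


def bfm(lesion_mask: np.ndarray, sigma: float = 3.0) -> np.ndarray:
    """Convert a binary (H, W) lesion mask into a soft boundary-emphasizing map."""
    # Signed-style distance from the lesion boundary: distance to background
    # inside the lesion, distance to the lesion outside it.
    inside = ndimage.distance_transform_edt(lesion_mask)
    outside = ndimage.distance_transform_edt(1 - lesion_mask)
    dist = inside - outside
    # Gaussian weighting of the distance concentrates response near the boundary.
    return ndimage.gaussian_filter(np.exp(-dist ** 2 / (2 * sigma ** 2)), sigma=1.0)


mask = np.zeros((64, 64))
mask[24:40, 20:44] = 1.0
print(bfm(mask).shape, bfm(mask).max())  # (64, 64), peak near the lesion boundary
```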

Best arm identification in multi-armed bandits with delayed feedback

Mar 29, 2018
Aditya Grover, Todor Markov, Peter Attia, Norman Jin, Nicholas Perkins, Bryan Cheong, Michael Chen, Zi Yang, Stephen Harris, William Chueh, Stefano Ermon

We propose a generalization of the best arm identification problem in stochastic multi-armed bandits (MAB) to the setting where every pull of an arm is associated with delayed feedback. The delay in feedback increases the effective sample complexity of standard algorithms, but can be offset if we have access to partial feedback received before a pull is completed. We propose a general framework to model the relationship between partial and delayed feedback, and as a special case we introduce efficient algorithms for settings where the partial feedback is a biased or unbiased estimator of the delayed feedback. Additionally, we propose a novel extension of the algorithms to the parallel MAB setting, where an agent can control a batch of arms. Our experiments in real-world settings, involving policy search and hyperparameter optimization in computational sustainability domains for fast charging of batteries and wildlife corridor construction, demonstrate that exploiting the structure of partial feedback can lead to significant improvements over baselines in both sequential and parallel MAB.

* AISTATS 2018 
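
The sketch below illustrates the core idea under simple assumptions: a successive-elimination loop for best arm identification in which each pull immediately contributes an unbiased partial-feedback estimate of the delayed reward, so arms can be eliminated before any final rewards arrive. The confidence radius and elimination rule are standard textbook choices, not the paper's exact algorithm.

```python
import math
import random


def confidence_radius(n, delta=0.05):
    """Hoeffding-style radius; shrinks as an arm accumulates feedback."""
    return math.sqrt(math.log(4 * n * n / delta) / (2 * n))


# Hidden arm means; 'partial' feedback is an immediate, unbiased, noisy
# preview of the delayed final reward.
true_means = [0.3, 0.5, 0.7]
means, counts = [0.0] * 3, [0] * 3
active = {0, 1, 2}

for t in range(1, 3000):
    arm = random.choice(sorted(active))
    partial = true_means[arm] + random.gauss(0.0, 0.5)
    means[arm] = (means[arm] * counts[arm] + partial) / (counts[arm] + 1)
    counts[arm] += 1
    if all(counts):
        # Eliminate arms whose upper confidence bound falls below the best
        # lower confidence bound among the remaining arms.
        best_lb = max(means[a] - confidence_radius(counts[a]) for a in active)
        active = {a for a in active
                  if means[a] + confidence_radius(counts[a]) >= best_lb}
    if len(active) == 1:
        break

print("surviving arm(s):", active)
```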

Structural Embedding of Syntactic Trees for Machine Comprehension

Aug 31, 2017
Rui Liu, Junjie Hu, Wei Wei, Zi Yang, Eric Nyberg

Deep neural networks for machine comprehension typically utilize only word or character embeddings, without explicitly taking advantage of structured linguistic information such as constituency trees and dependency trees. In this paper, we propose structural embedding of syntactic trees (SEST), an algorithmic framework that utilizes structured information and encodes it into vector representations to boost the performance of machine comprehension algorithms. We evaluate our approach using a state-of-the-art neural attention model on the SQuAD dataset. Experimental results demonstrate that our model can accurately identify the syntactic boundaries of sentences and extract answers that are more syntactically coherent than those of the baseline methods.
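
A minimal sketch of the structural-embedding idea: pair each token with the sequence of syntactic labels on its root-to-leaf path in the parse tree, embed and pool those labels, and concatenate the result to the word embedding. The pooling choice and tag vocabulary below are illustrative assumptions; the paper studies specific encoders for constituency and dependency trees.

```python
import torch
import torch.nn as nn

TAGS = {"S": 0, "NP": 1, "VP": 2, "DT": 3, "NN": 4, "VBZ": 5}


class StructuralEmbedding(nn.Module):
    def __init__(self, vocab_size=1000, word_dim=50, tag_dim=16):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.tag_emb = nn.Embedding(len(TAGS), tag_dim)

    def forward(self, word_ids, tag_paths):
        """word_ids: (T,) token ids; tag_paths: T root-to-leaf tag-id lists."""
        words = self.word_emb(word_ids)                        # (T, word_dim)
        paths = torch.stack([
            self.tag_emb(torch.tensor(p)).mean(dim=0)          # pool one path
            for p in tag_paths
        ])                                                     # (T, tag_dim)
        return torch.cat([words, paths], dim=-1)               # (T, word+tag)


# "The dog runs": each token's path through the constituency tree.
enc = StructuralEmbedding()
out = enc(torch.tensor([1, 2, 3]),
          [[TAGS["S"], TAGS["NP"], TAGS["DT"]],
           [TAGS["S"], TAGS["NP"], TAGS["NN"]],
           [TAGS["S"], TAGS["VP"], TAGS["VBZ"]]])
print(out.shape)  # torch.Size([3, 66])
```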
