Zeyu Li

Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models

Nov 07, 2023
Longteng Zhang, Xiang Liu, Zeyu Li, Xinglin Pan, Peijie Dong, Ruibo Fan, Rui Guo, Xin Wang, Qiong Luo, Shaohuai Shi, Xiaowen Chu

Large Language Models (LLMs) have seen great advances in both academia and industry, and their popularity has produced numerous open-source frameworks and techniques for accelerating LLM pre-training, fine-tuning, and inference. Training and deploying LLMs is expensive because it requires considerable computing resources and memory, so many efficient approaches have been developed to improve system pipelines as well as operators. However, runtime performance can vary significantly across hardware and software stacks, which makes it difficult to choose the best configuration. In this work, we benchmark performance from both macro and micro perspectives. First, we benchmark the end-to-end performance of pre-training, fine-tuning, and serving LLMs of different sizes, i.e., 7, 13, and 70 billion parameters (7B, 13B, and 70B), on three 8-GPU platforms with and without individual optimization techniques, including ZeRO, quantization, recomputation, and FlashAttention. Then, we dive deeper to provide a detailed runtime analysis of the sub-modules, including the computing and communication operators in LLMs. For end users, our benchmark and findings help better understand different optimization techniques, training and inference frameworks, and hardware platforms when choosing configurations for deploying LLMs. For researchers, our in-depth module-wise analyses reveal potential opportunities for future work to further optimize the runtime performance of LLMs.

CluCDD: Contrastive Dialogue Disentanglement via Clustering

Feb 16, 2023
Jingsheng Gao, Zeyu Li, Suncheng Xiang, Ting Liu, Yuzhuo Fu

A huge number of multi-participant dialogues happen online every day, which makes it difficult for both humans and machines to understand the nature of dialogue dynamics. Dialogue disentanglement aims at separating an entangled dialogue into detached sessions, thus increasing the readability of long, disordered dialogues. Previous studies mainly focus on message-pair classification and clustering in two-step methods, which cannot guarantee the overall clustering performance on a dialogue. To address this challenge, we propose a simple yet effective model named CluCDD, which aggregates utterances by contrastive learning. More specifically, our model pulls utterances in the same session together and pushes away utterances in different ones. Then a clustering method is adopted to generate predicted clustering labels. Comprehensive experiments conducted on the Movie Dialogue dataset and the IRC dataset demonstrate that our model achieves a new state-of-the-art result.
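The pull-together, push-apart idea in this abstract can be illustrated with a generic supervised contrastive loss over utterance embeddings. The following is a minimal sketch under stated assumptions, not the CluCDD objective itself: the function names, the cosine similarity, the `temperature` value, and the toy 2-D embeddings are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def supervised_contrastive_loss(embeddings, session_ids, temperature=0.1):
    """Generic supervised contrastive loss (illustrative, not the paper's exact loss).

    For each anchor utterance, utterances from the same session are positives
    and all other utterances act as negatives in a softmax over similarities.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and session_ids[j] == session_ids[i]]
        if not positives:
            continue  # an utterance alone in its session has no positive pair
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        # Average negative log-probability of picking a same-session utterance.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy embeddings: utterances 0 and 1 point one way, 2 and 3 another.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
```

The loss is small when session labels agree with the geometry of the embeddings and large when they do not, which is the signal an encoder would be trained on before a separate clustering step assigns session labels.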

* 5 pages 

Recommend for a Reason: Unlocking the Power of Unsupervised Aspect-Sentiment Co-Extraction

Sep 07, 2021
Zeyu Li, Wei Cheng, Reema Kshetramade, John Houser, Haifeng Chen, Wei Wang

Compliments and concerns in reviews are valuable for understanding users' shopping interests and their opinions with respect to specific aspects of certain items. Existing review-based recommenders favor large and complex language encoders that can only learn latent and uninterpretable text representations. They lack explicit modeling of user attention and item properties, which could nonetheless provide valuable information beyond the ability to recommend items. Therefore, we propose a tightly coupled two-stage approach, including an Aspect-Sentiment Pair Extractor (ASPE) and an Attention-Property-aware Rating Estimator (APRE). The unsupervised ASPE mines Aspect-Sentiment pairs (AS-pairs), and APRE predicts ratings using AS-pairs as concrete aspect-level evidence. Extensive experiments on seven real-world Amazon Review Datasets demonstrate that ASPE can effectively extract AS-pairs, which enable APRE to deliver superior accuracy over the leading baselines.

* 16 pages; Accepted to Findings of EMNLP-2021 

Powering Comparative Classification with Sentiment Analysis via Domain Adaptive Knowledge Transfer

Sep 07, 2021
Zeyu Li, Yilong Qin, Zihan Liu, Wei Wang

We study Comparative Preference Classification (CPC), which aims at predicting whether a preference comparison exists between two entities in a given sentence and, if so, which entity is preferred over the other. High-quality CPC models can significantly benefit applications such as comparative question answering and review-based recommendations. Among the existing approaches, non-deep learning methods suffer from inferior performance. The state-of-the-art graph neural network-based ED-GAT (Ma et al., 2020) only considers syntactic information while ignoring the critical semantic relations and the sentiments toward the compared entities. We propose Sentiment Analysis Enhanced COmparative Network (SAECON), which improves CPC accuracy with a sentiment analyzer that learns sentiments toward individual entities via domain adaptive knowledge transfer. Experiments on the CompSent-19 (Panchenko et al., 2019) dataset show a significant improvement in F1 scores over the best existing CPC approaches.

* 13 pages; EMNLP-2021 Main Conference 

Towards Visual Explainable Active Learning for Zero-Shot Classification

Aug 15, 2021
Shichao Jia, Zeyu Li, Nuo Chen, Jiawan Zhang

Zero-shot classification is a promising paradigm for problems in which the training classes and test classes are disjoint. Achieving this usually requires experts to externalize their domain knowledge by manually specifying a class-attribute matrix that defines which classes have which attributes. Designing a suitable class-attribute matrix is the key to the subsequent procedure, but this design process is tedious and proceeds by trial and error with no guidance. This paper proposes a visual explainable active learning approach, with a design and implementation called semantic navigator, to solve the above problems. This approach promotes human-AI teaming with four actions (ask, explain, recommend, respond) in each interaction loop. The machine asks contrastive questions to guide humans in thinking about attributes. A novel visualization called semantic map explains the current status of the machine, so analysts can better understand why the machine misclassifies objects. Moreover, the machine recommends the labels of classes for each attribute to ease the labeling burden. Finally, humans can steer the model by modifying the labels interactively, and the machine adjusts its recommendations. The visual explainable active learning approach improves humans' efficiency in building zero-shot classification models interactively, compared with the method without guidance. We justify our results with user studies using the standard benchmarks for zero-shot classification.

Learning Gender-Neutral Word Embeddings

Aug 29, 2018
Jieyu Zhao, Yichao Zhou, Zeyu Li, Wei Wang, Kai-Wei Chang

Word embedding models have become a fundamental component in a wide range of Natural Language Processing (NLP) applications. However, embeddings trained on human-generated corpora have been demonstrated to inherit strong gender stereotypes that reflect social constructs. To address this concern, in this paper, we propose a novel training procedure for learning gender-neutral word embeddings. Our approach aims to preserve gender information in certain dimensions of word vectors while compelling other dimensions to be free of gender influence. Based on the proposed method, we generate a Gender-Neutral variant of GloVe (GN-GloVe). Quantitative and qualitative experiments demonstrate that GN-GloVe successfully isolates gender information without sacrificing the functionality of the embedding model.
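The idea of reserving some dimensions for gender information while keeping the rest free of it can be made concrete with a small penalty function. This is a rough sketch under stated assumptions and not the GN-GloVe training objective: the choice of the last `k` dimensions as the reserved part, the "he minus she" direction, the function names, and the toy 3-D vectors are all hypothetical.

```python
def split_embedding(vec, k=1):
    """Split a vector into (neutral part, reserved gender part).

    Assumption for illustration: the last k dimensions are reserved.
    """
    return vec[:-k], vec[-k:]


def neutral_gender_leak(word_vec, he_vec, she_vec, k=1):
    """Squared projection of a word's neutral part onto a gender direction.

    The direction is the difference of the neutral parts of two gendered
    anchor words. A training procedure could penalize this value for words
    that should be gender-neutral (illustrative penalty only).
    """
    word_neutral, _ = split_embedding(word_vec, k)
    he_neutral, _ = split_embedding(he_vec, k)
    she_neutral, _ = split_embedding(she_vec, k)
    direction = [h - s for h, s in zip(he_neutral, she_neutral)]
    projection = sum(w * d for w, d in zip(word_neutral, direction))
    return projection ** 2


# Toy vectors: the last dimension is the reserved gender dimension.
he = [0.2, 0.1, 1.0]
she = [-0.2, 0.1, -1.0]
nurse = [-0.5, 0.3, 0.0]   # neutral part still leans along the gender direction
doctor = [0.0, 0.7, 0.0]   # neutral part is orthogonal to the gender direction

leak_nurse = neutral_gender_leak(nurse, he, she)
leak_doctor = neutral_gender_leak(doctor, he, she)
```

In this toy setup the penalty is positive for `nurse` and zero for `doctor`, so minimizing it would push residual gender signal out of the neutral dimensions while leaving the reserved dimension untouched.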

* EMNLP 2018 

Peeking the Impact of Points of Interests on Didi

Apr 06, 2018
Yonghong Tian, Zeyu Li, Zhiwei Xu, Xuying Meng, Bing Zheng

Recently, the online car-hailing service Didi has emerged as a leader in the sharing economy. Because it is used extensively by passengers and drivers, it is increasingly important for car-hailing service providers to minimize passenger waiting time and optimize vehicle utilization, and thus improve the overall user experience. Supply-demand estimation is therefore an indispensable ingredient of an efficient online car-hailing service. To improve the accuracy of the estimation results, we analyze the implicit relationships between Points of Interest (POIs) and the supply-demand gap in this paper. Because different categories of POIs have positive or negative effects on the estimation, we propose a POI selection scheme and incorporate it into XGBoost [1] to achieve more accurate estimation results. Our experiment demonstrates that our method provides more accurate and more stable estimation results than the existing methods.
