Chonglin Sun

High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models

Apr 15, 2021
Dheevatsa Mudigere, Yuchen Hao, Jianyu Huang, Andrew Tulloch, Srinivas Sridharan, Xing Liu, Mustafa Ozdal, Jade Nie, Jongsoo Park, Liang Luo, Jie Amy Yang, Leon Gao, Dmytro Ivchenko, Aarti Basant, Yuxi Hu, Jiyan Yang, Ehsan K. Ardestani, Xiaodong Wang, Rakesh Komuravelli, Ching-Hsiang Chu, Serhat Yilmaz, Huayu Li, Jiyuan Qian, Zhuobo Feng, Yinbin Ma, Junjie Yang, Ellie Wen, Hong Li, Lin Yang, Chonglin Sun, Whitney Zhao, Dimitry Melts, Krishna Dhulipala, KR Kishore, Tyler Graf, Assaf Eisenman, Kiran Kumar Matam, Adi Gangidi, Guoqiang Jerry Chen, Manoj Krishnan, Avinash Nayak, Krishnakumar Nair, Bharath Muthiah, Mahmoud khorashadi, Pallab Bhattacharya, Petr Lapukhov, Maxim Naumov, Lin Qiao, Mikhail Smelyanskiy, Bill Jia, Vijay Rao

Deep learning recommendation models (DLRMs) are used across many business-critical services at Facebook and are the single largest AI application in terms of infrastructure demand in its data centers. In this paper we discuss the SW/HW co-designed solution for high-performance distributed training of large-scale DLRMs. We introduce a high-performance scalable software stack based on PyTorch and pair it with the new evolution of the Zion platform, namely ZionEX. We demonstrate the capability to train very large DLRMs with up to 12 trillion parameters and show that we can attain a 40X speedup in terms of time to solution over previous systems. We achieve this by (i) designing the ZionEX platform with a dedicated scale-out network, provisioned with high bandwidth, optimal topology and efficient transport; (ii) implementing an optimized PyTorch-based training stack supporting both model and data parallelism; (iii) developing sharding algorithms capable of hierarchical partitioning of the embedding tables along row and column dimensions and load balancing them across multiple workers; (iv) adding high-performance core operators while retaining flexibility to support optimizers with fully deterministic updates; and (v) leveraging reduced-precision communications, a multi-level memory hierarchy (HBM+DDR+SSD) and pipelining. Furthermore, we develop and briefly comment on distributed data ingestion and other supporting services that are required for robust and efficient end-to-end training in production environments.
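
The sharding idea in item (iii) can be illustrated with a small sketch. This is not the paper's algorithm: it only splits tables column-wise and places shards with a simple greedy heuristic (largest shard first, onto the least-loaded worker). The table names, sizes, the `max_dim` limit and the cost proxy (rows times columns) are all hypothetical, chosen for illustration.

```python
import heapq

def column_shards(name, rows, dim, max_dim):
    """Split one embedding table column-wise into shards of at most max_dim columns."""
    shards = []
    start = 0
    while start < dim:
        width = min(max_dim, dim - start)
        # Cost proxy (an assumption): number of parameters in the shard.
        shards.append((f"{name}[:, {start}:{start + width}]", rows * width))
        start += width
    return shards

def greedy_balance(shards, num_workers):
    """Assign the largest remaining shard to the currently least-loaded worker."""
    heap = [(0, worker) for worker in range(num_workers)]
    heapq.heapify(heap)
    placement = {worker: [] for worker in range(num_workers)}
    for label, cost in sorted(shards, key=lambda s: s[1], reverse=True):
        load, worker = heapq.heappop(heap)
        placement[worker].append(label)
        heapq.heappush(heap, (load + cost, worker))
    loads = {worker: load for load, worker in heap}
    return placement, loads

# Hypothetical tables: name -> (number of rows, embedding dimension).
tables = {"user_id": (1000, 64), "item_id": (4000, 128), "category": (50, 16)}

shards = []
for name, (rows, dim) in tables.items():
    shards.extend(column_shards(name, rows, dim, max_dim=64))

placement, loads = greedy_balance(shards, num_workers=2)
for worker in sorted(placement):
    print(worker, loads[worker], placement[worker])
```

A production sharder would presumably also weigh lookup frequency, memory tier and communication cost; the parameter count used here is only a stand-in for such a cost model.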

Time-based Sequence Model for Personalization and Recommendation Systems

Aug 27, 2020
Tigran Ishkhanov, Maxim Naumov, Xianjie Chen, Yan Zhu, Yuan Zhong, Alisson Gusatti Azzolini, Chonglin Sun, Frank Jiang, Andrey Malevich, Liang Xiong

In this paper we develop a novel recommendation model that explicitly incorporates time information. The model relies on an embedding layer and a TSL attention-like mechanism with inner products in different vector spaces, which can be thought of as a modification of multi-headed attention. This mechanism allows the model to efficiently treat sequences of user behavior of different lengths. We study the properties of our state-of-the-art model on a statistically designed data set. We also show that it outperforms more complex models with longer sequence lengths on the Taobao User Behavior dataset.

* 17 pages, 7 figures 
Category Enhanced Word Embedding

Nov 30, 2015
Chunting Zhou, Chonglin Sun, Zhiyuan Liu, Francis C. M. Lau

Distributed word representations have been demonstrated to be effective in capturing semantic and syntactic regularities. Unsupervised representation learning from large unlabeled corpora can learn similar representations for words that present similar co-occurrence statistics. Besides local occurrence statistics, global topical information is also important knowledge that may help discriminate one word from another. In this paper, we incorporate category information of documents in the learning of word representations and learn the proposed models in a document-wise manner. Our models outperform several state-of-the-art models in word analogy and word similarity tasks. Moreover, we evaluate the learned word vectors on sentiment analysis and text classification tasks, and the results show the superiority of our learned word vectors. We also learn high-quality category embeddings that reflect topical meanings.

A C-LSTM Neural Network for Text Classification

Nov 30, 2015
Chunting Zhou, Chonglin Sun, Zhiyuan Liu, Francis C. M. Lau

Neural network models have been demonstrated to be capable of achieving remarkable performance in sentence and document modeling. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are two mainstream architectures for such modeling tasks, and they adopt totally different ways of understanding natural language. In this work, we combine the strengths of both architectures and propose a novel and unified model called C-LSTM for sentence representation and text classification. C-LSTM utilizes a CNN to extract a sequence of higher-level phrase representations, which are fed into a long short-term memory recurrent neural network (LSTM) to obtain the sentence representation. C-LSTM is able to capture both local features of phrases and global and temporal sentence semantics. We evaluate the proposed architecture on sentiment classification and question classification tasks. The experimental results show that C-LSTM outperforms both CNN and LSTM and can achieve excellent performance on these tasks.
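
The data flow described above (a convolution producing an ordered sequence of phrase vectors, which an LSTM then reduces to one sentence vector) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the weights are random and untrained, biases and the final classifier are omitted, and all sizes and function names are hypothetical.

```python
import math
import random

def conv_features(word_vectors, filters, width):
    """Apply each filter to every window of `width` consecutive word vectors.

    A sentence of n words yields n - width + 1 phrase vectors, in sentence order.
    """
    features = []
    for start in range(len(word_vectors) - width + 1):
        # Flatten the word vectors in this window into one vector.
        window = [x for vec in word_vectors[start:start + width] for x in vec]
        # ReLU of the dot product with each filter.
        features.append([max(0.0, sum(w * x for w, x in zip(f, window)))
                         for f in filters])
    return features

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_final_state(sequence, weights, hidden_size):
    """Run a bias-free LSTM over `sequence` and return the last hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        xh = x + h  # concatenate the input with the previous hidden state
        pre = {name: [sum(w * v for w, v in zip(row, xh)) for row in weights[name]]
               for name in ("i", "f", "o", "g")}
        i_gate = [sigmoid(z) for z in pre["i"]]
        f_gate = [sigmoid(z) for z in pre["f"]]
        o_gate = [sigmoid(z) for z in pre["o"]]
        cand = [math.tanh(z) for z in pre["g"]]
        c = [f * c_prev + i * g for f, c_prev, i, g in zip(f_gate, c, i_gate, cand)]
        h = [o * math.tanh(c_new) for o, c_new in zip(o_gate, c)]
    return h

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes, chosen only to keep the example small.
embed_dim, width, num_filters, hidden_size, num_words = 4, 3, 5, 6, 7
sentence = rand_matrix(num_words, embed_dim)            # stand-in word embeddings
filters = rand_matrix(num_filters, width * embed_dim)   # convolution filters
weights = {name: rand_matrix(hidden_size, num_filters + hidden_size)
           for name in ("i", "f", "o", "g")}

phrases = conv_features(sentence, filters, width)
sentence_vector = lstm_final_state(phrases, weights, hidden_size)
print(len(phrases), len(sentence_vector))
```

In this sketch the convolution output is passed to the LSTM window by window, which preserves the word order that the recurrent layer relies on.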
