Zheng Chai

LONGER: Scaling Up Long Sequence Modeling in Industrial Recommenders

May 07, 2025

Zheng Chai, Qin Ren, Xijun Xiao, Huizhi Yang, Bo Han, Sijun Zhang, Di Chen, Hui Lu, Wenlin Zhao, Lele Yu(+7 more)

Abstract: Modeling ultra-long user behavior sequences is critical for capturing both long- and short-term preferences in industrial recommender systems. Existing solutions typically rely on two-stage retrieval or indirect modeling paradigms, incurring upstream-downstream inconsistency and computational inefficiency. In this paper, we present LONGER, a Long-sequence Optimized traNsformer for GPU-Efficient Recommenders. LONGER incorporates (i) a global token mechanism for stabilizing attention over long contexts, (ii) a token merge module with lightweight InnerTransformers and a hybrid attention strategy to reduce quadratic complexity, and (iii) a series of engineering optimizations, including training with mixed precision and activation recomputation, KV cache serving, and a fully synchronous model training and serving framework for unified GPU-based dense and sparse parameter updates. LONGER consistently outperforms strong baselines in both offline metrics and online A/B testing in both advertising and e-commerce services at ByteDance, validating its consistent effectiveness and industrial-level scaling laws. Currently, LONGER has been fully deployed in more than 10 influential scenarios at ByteDance, serving a billion-scale user base.
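
As a rough illustration of why merging tokens eases the quadratic cost mentioned in item (ii), the sketch below groups adjacent tokens and counts query-key pairs before and after. The concatenation-based merge, the group size of 4, and the names `merge_tokens` and `attention_pairs` are assumptions made for illustration; the abstract does not say how LONGER's token merge module is implemented.

```python
from typing import List


def merge_tokens(seq: List[List[float]], group: int) -> List[List[float]]:
    """Concatenate every `group` adjacent token vectors into one wider token.

    Assumption: merging is done by concatenation, and a trailing partial
    group is zero-padded so that all merged tokens share the same width.
    """
    if group < 1:
        raise ValueError("group must be >= 1")
    dim = len(seq[0]) if seq else 0
    merged = []
    for start in range(0, len(seq), group):
        chunk = seq[start:start + group]
        flat = [x for tok in chunk for x in tok]
        flat += [0.0] * (group * dim - len(flat))  # pad the last group
        merged.append(flat)
    return merged


def attention_pairs(num_tokens: int) -> int:
    """Number of query-key pairs in full self-attention."""
    return num_tokens * num_tokens


# Hypothetical behavior sequence: 2000 tokens of width 2.
seq = [[float(i), float(i) + 0.5] for i in range(2000)]
merged = merge_tokens(seq, group=4)

print(len(seq), attention_pairs(len(seq)))        # 2000 4000000
print(len(merged), attention_pairs(len(merged)))  # 500 250000
```

Under this concatenation assumption the pair count falls by the square of the group size, while each merged token is wider by the group size, so the dot-product work for the attention scores falls by roughly a factor of the group size rather than its square.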

Large Memory Network for Recommendation

Feb 08, 2025

Hui Lu, Zheng Chai, Yuchao Zheng, Zhe Chen, Deping Xie, Peng Xu, Xun Zhou

Abstract: Modeling user behavior sequences in recommender systems is essential for understanding user preferences over time, enabling personalized and accurate recommendations that improve user retention and business value. Despite its significance, current sequential modeling approaches face two challenges. In the spatial dimension, it is difficult to mutually perceive similar users' interests for a generalized intention understanding; in the temporal dimension, current methods are generally prone to forgetting long-term interests because of the fixed-length input sequence. In this paper, we present the Large Memory Network (LMN), which compresses and stores user history behavior information in a large-scale memory block. With an elaborated online deployment strategy, the memory block can be easily scaled up to million-scale in industry. Extensive offline comparison experiments, memory scaling-up experiments, and an online A/B test on Douyin E-Commerce Search (ECS) are performed, validating the superior performance of LMN. Currently, LMN has been fully deployed in Douyin ECS, serving millions of users each day.

* WWW 2025

Adaptive Domain Scaling for Personalized Sequential Modeling in Recommenders

Feb 08, 2025

Zheng Chai, Hui Lu, Di Chen, Qin Ren, Xun Zhou

Abstract: Users generally exhibit complex behavioral patterns and diverse intentions in multiple business scenarios of super applications like Douyin, presenting great challenges to current industrial multi-domain recommenders. To mitigate the discrepancies across diverse domains, research and industrial practice generally emphasize sophisticated network structures to accommodate diverse data distributions, while neglecting the inherent understanding of user behavioral sequences from the multi-domain perspective. In this paper, we present the Adaptive Domain Scaling (ADS) model, which comprehensively enhances the personalization capability in target-aware sequence modeling across multiple domains. Specifically, ADS comprises two major modules: personalized sequence representation generation (PSRG) and personalized candidate representation generation (PCRG). The modules contribute to tailored multi-domain learning by dynamically learning both the user behavioral sequence item representation and the candidate target item representation under different domains, facilitating adaptive user intention understanding. Experiments are performed on both a public dataset and two billion-scale industrial datasets, and the extensive results verify the high effectiveness and compatibility of ADS. In addition, we conduct online experiments on two influential business scenarios, the Douyin Advertisement Platform and the Douyin E-commerce Service Platform, both of which show substantial business improvements. Currently, ADS has been fully deployed in many recommendation services at ByteDance, serving billions of users.

Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models

Jan 04, 2024

Guangji Bai, Zheng Chai, Chen Ling, Shiyu Wang, Jiaying Lu, Nan Zhang, Tingwei Shi, Ziyang Yu, Mengdan Zhu, Yifei Zhang(+3 more)

Abstract: The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated models like OpenAI's ChatGPT, represents a significant advancement in artificial intelligence. These models, however, bring forth substantial challenges in the high consumption of computational, memory, energy, and financial resources, especially in environments with limited resource capabilities. This survey aims to systematically address these challenges by reviewing a broad spectrum of techniques designed to enhance the resource efficiency of LLMs. We categorize methods based on their optimization focus (computational, memory, energy, financial, and network resources) and their applicability across various stages of an LLM's lifecycle, including architecture design, pretraining, finetuning, and system design. Additionally, the survey introduces a nuanced categorization of resource efficiency techniques by their specific resource types, which uncovers the intricate relationships and mappings between various resources and corresponding optimization techniques. A standardized set of evaluation metrics and datasets is also presented to facilitate consistent and fair comparisons across different models and techniques. By offering a comprehensive overview of the current state of the art and identifying open research avenues, this survey serves as a foundational reference for researchers and practitioners, aiding them in developing more sustainable and efficient LLMs in a rapidly evolving landscape.

* Preprint. GitHub repo: https://github.com/tiingweii-shii/Awesome-Resource-Efficient-LLM-Papers

Staleness-Alleviated Distributed GNN Training via Online Dynamic-Embedding Prediction

Aug 25, 2023

Guangji Bai, Ziyang Yu, Zheng Chai, Yue Cheng, Liang Zhao

Abstract: Despite the recent success of Graph Neural Networks (GNNs), it remains challenging to train GNNs on large-scale graphs due to neighbor explosion. As a remedy, distributed computing is a promising solution that leverages abundant computing resources (e.g., GPUs). However, the node dependency of graph data increases the difficulty of achieving high concurrency in distributed GNN training, which suffers from massive communication overhead. To address this, historical value approximation is deemed a promising class of distributed training techniques. It utilizes an offline memory to cache historical information (e.g., node embeddings) as an affordable approximation of the exact value and achieves high concurrency. However, such benefits come at the cost of involving dated training information, leading to staleness, imprecision, and convergence issues. To overcome these challenges, this paper proposes SAT (Staleness-Alleviated Training), a novel and scalable distributed GNN training framework that reduces the embedding staleness adaptively. The key idea of SAT is to model the GNN's embedding evolution as a temporal graph and build a model upon it to predict future embeddings, which effectively alleviates the staleness of the cached historical embeddings. We propose an online algorithm to train the embedding predictor and the distributed GNN alternately and further provide a convergence analysis. Empirically, we demonstrate that SAT can effectively reduce embedding staleness and thus achieve better performance and convergence speed on multiple large-scale graph datasets.

* Preprint. Do not distribute. arXiv admin note: text overlap with arXiv:2206.00057
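
To make the idea of predicting a future embedding from cached history concrete, here is a minimal sketch. It is not SAT's method: SAT trains a predictor over a temporal graph, whereas this stand-in simply extrapolates linearly from the two most recent cached snapshots. The class `ExtrapolatingCache` and its methods are hypothetical names introduced for illustration.

```python
from typing import Dict, List, Tuple

Vector = List[float]


class ExtrapolatingCache:
    """Caches the two latest embeddings per node and extrapolates forward.

    Assumption: linear extrapolation stands in for a learned predictor.
    """

    def __init__(self) -> None:
        self._history: Dict[int, List[Tuple[int, Vector]]] = {}

    def store(self, node: int, step: int, emb: Vector) -> None:
        hist = self._history.setdefault(node, [])
        hist.append((step, list(emb)))
        del hist[:-2]  # keep only the two most recent snapshots

    def stale(self, node: int) -> Vector:
        """Plain historical value approximation: return the last cached value."""
        return list(self._history[node][-1][1])

    def predict(self, node: int, step: int) -> Vector:
        """Estimate the embedding at `step` from the cached trend."""
        hist = self._history[node]
        if len(hist) < 2:
            return self.stale(node)
        (t0, e0), (t1, e1) = hist
        if t1 == t0:
            return list(e1)
        rate = (step - t1) / (t1 - t0)
        return [b + rate * (b - a) for a, b in zip(e0, e1)]


cache = ExtrapolatingCache()
cache.store(node=7, step=0, emb=[0.0, 1.0])
cache.store(node=7, step=10, emb=[1.0, 3.0])

print(cache.stale(7))             # [1.0, 3.0]
print(cache.predict(7, step=15))  # [1.5, 4.0]
```

The stale lookup returns the embedding as of step 10, while the prediction moves it along the observed trend to step 15; a learned predictor would replace the extrapolation line.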

Distributed Graph Neural Network Training with Periodic Historical Embedding Synchronization

May 31, 2022

Zheng Chai, Guangji Bai, Liang Zhao, Yue Cheng

Abstract: Despite the recent success of Graph Neural Networks (GNNs), it remains challenging to train a GNN on large graphs, which are prevalent in various applications such as social networks, recommender systems, and knowledge graphs. Traditional sampling-based methods accelerate GNN training by dropping edges and nodes, which impairs graph integrity and model performance. In contrast, distributed GNN algorithms, which accelerate GNN training by utilizing multiple computing devices, can be classified into two types: "partition-based" methods enjoy low communication costs but suffer from information loss due to dropped edges, while "propagation-based" methods avoid information loss but suffer from prohibitive communication overhead. To jointly address these problems, this paper proposes DIstributed Graph Embedding SynchronizaTion (DIGEST), a novel distributed GNN training framework that synergizes the complementary strengths of both categories of existing methods. During subgraph parallel training, we propose to let each device store the historical embeddings of its neighbors in other subgraphs. Therefore, our method does not discard any neighbors in other subgraphs, nor does it update them intensively. This effectively avoids (1) intensive computation on explosively increasing neighbors and (2) excessive communication across different devices. We prove that the approximation error induced by the staleness of historical embeddings can be upper bounded and that it does NOT affect the GNN model's expressiveness. More importantly, our convergence analysis demonstrates that DIGEST enjoys a state-of-the-art convergence rate. Extensive experimental evaluation on large, real-world graph datasets shows that DIGEST achieves up to $21.82\times$ speedup without compromising performance compared to state-of-the-art distributed GNN training frameworks.

* Preprint: 18 pages, 7 figures
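
The pattern described in the abstract, where each device keeps historical embeddings of out-of-subgraph neighbors and refreshes them only periodically, can be sketched as follows. This is a toy under stated assumptions, not DIGEST's implementation: the averaging update replaces real GNN training, the shared dictionary stands in for cross-device communication, and `Device`, `train`, and `sync_every` are hypothetical names.

```python
from typing import Dict, List

Vector = List[float]


class Device:
    """Owns one subgraph and caches stale embeddings of remote neighbors."""

    def __init__(self, own: Dict[int, Vector], remote_neighbors: List[int]) -> None:
        self.own = {n: list(e) for n, e in own.items()}
        self.remote_neighbors = list(remote_neighbors)
        self.cache: Dict[int, Vector] = {}  # historical embeddings

    def push(self, store: Dict[int, Vector]) -> None:
        for node, emb in self.own.items():
            store[node] = list(emb)

    def pull(self, store: Dict[int, Vector]) -> None:
        for node in self.remote_neighbors:
            self.cache[node] = list(store[node])

    def local_step(self) -> None:
        # Toy update (assumption): average each owned embedding with the
        # mean of the cached, possibly stale, remote-neighbor embeddings.
        if not self.remote_neighbors:
            return
        dim = len(next(iter(self.own.values())))
        count = len(self.remote_neighbors)
        mean = [sum(self.cache[n][k] for n in self.remote_neighbors) / count
                for k in range(dim)]
        for node, emb in self.own.items():
            self.own[node] = [0.5 * (x + m) for x, m in zip(emb, mean)]


def train(devices: List[Device], steps: int, sync_every: int) -> int:
    """Run local steps, synchronizing caches every `sync_every` steps."""
    store: Dict[int, Vector] = {}
    for d in devices:
        d.push(store)
    for d in devices:
        d.pull(store)
    syncs = 0
    for step in range(1, steps + 1):
        for d in devices:
            d.local_step()  # uses cached embeddings, no communication
        if step % sync_every == 0:
            for d in devices:
                d.push(store)
            for d in devices:
                d.pull(store)
            syncs += 1
    return syncs


a = Device(own={0: [0.0]}, remote_neighbors=[1])
b = Device(own={1: [4.0]}, remote_neighbors=[0])
syncs = train([a, b], steps=4, sync_every=2)
print(syncs, a.own[0], b.own[1])  # 2 [1.5] [2.5]
```

No remote neighbor is dropped, yet only two synchronization rounds occur over four steps; between rounds each device computes against stale cached values, which is the source of the approximation error the abstract bounds.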

LOF: Structure-Aware Line Tracking based on Optical Flow

Sep 17, 2021

Meixiang Quan, Zheng Chai, Xiao Liu

Abstract: Lines provide significantly richer geometric structural information about the environment than points, so lines are widely used in recent Visual Odometry (VO) works. Since VO with lines uses line tracking results for localization and mapping, line tracking is a crucial component in VO. Although state-of-the-art line tracking methods have made great progress, they are still heavily dependent on line detection or predicted line segments. To relieve these dependencies and track line segments completely, accurately, and robustly at higher computational efficiency, we propose a structure-aware Line tracking algorithm based entirely on Optical Flow (LOF). First, we propose a gradient-based strategy to sample pixels on lines that are suitable for line optical flow calculation. Then, to align the lines by fully using the structural relationship between the sampled points on them and effectively removing the influence of sampled points occluded by other objects, we propose a two-step structure-aware line segment alignment method. Furthermore, we propose a line refinement method to refine the orientation, position, and endpoints of the aligned line segments. Extensive experimental results demonstrate that the proposed LOF outperforms state-of-the-art methods in line tracking accuracy, robustness, and efficiency, and also improves the localization accuracy and robustness of VO systems with lines.

* 7 pages, 4 figures

Asynchronous Federated Learning for Sensor Data with Concept Drift

Sep 01, 2021

Yujing Chen, Zheng Chai, Yue Cheng, Huzefa Rangwala

Abstract: Federated learning (FL) involves multiple distributed devices jointly training a shared model without any of the participants having to reveal their local data to a centralized server. Most previous FL approaches assume that data on devices are fixed and stationary during the training process. However, this assumption is unrealistic because these devices usually have varying sampling rates and different system configurations. In addition, the underlying distribution of the device data can change dynamically over time, which is known as concept drift. Concept drift complicates the learning process because of the inconsistency between existing and upcoming data. Traditional concept drift handling techniques, such as chunk-based and ensemble learning-based methods, are not suitable in federated learning frameworks due to the heterogeneity of local devices. We propose a novel approach, FedConD, to detect and deal with concept drift on local devices and minimize its effect on the performance of models in asynchronous FL. The drift detection strategy is based on an adaptive mechanism which uses the historical performance of the local models. The drift adaptation is realized by adjusting the regularization parameter of the objective function on each local device. Additionally, we design a communication strategy on the server side to select local updates in a prudent fashion and speed up model convergence. Experimental evaluations on three evolving data streams and two image datasets show that FedConD detects and handles concept drift, and also reduces the overall communication cost compared to other baseline methods.
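
One way to picture drift detection from historical performance paired with a regularization adjustment is the sketch below. It is an assumption-laden stand-in, not FedConD's actual rule: the windowed mean-plus-threshold test, the choice to halve the regularization weight on drift, and the names `DriftMonitor`, `threshold`, and `decay` are all hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


class DriftMonitor:
    """Flags drift when the new loss is far above the recent loss history."""

    def __init__(self, window: int = 5, threshold: float = 3.0,
                 reg: float = 0.1, reg_min: float = 0.01,
                 decay: float = 0.5) -> None:
        self.losses = deque(maxlen=window)  # historical local performance
        self.threshold = threshold
        self.reg_base = reg
        self.reg = reg
        self.reg_min = reg_min
        self.decay = decay

    def observe(self, loss: float) -> bool:
        drift = False
        if len(self.losses) == self.losses.maxlen:
            mu = mean(self.losses)
            sigma = max(pstdev(self.losses), 1e-8)
            drift = loss > mu + self.threshold * sigma
        if drift:
            # Assumption: loosen regularization so the local model can adapt.
            self.reg = max(self.reg_min, self.reg * self.decay)
            self.losses.clear()  # old history no longer describes the data
        else:
            # Relax back toward the base regularization weight.
            self.reg = min(self.reg_base, self.reg / self.decay)
        self.losses.append(loss)
        return drift


monitor = DriftMonitor()
stream = [0.50, 0.52, 0.49, 0.51, 0.50, 1.20]  # hypothetical local losses
flags = [monitor.observe(x) for x in stream]
print(flags)  # [False, False, False, False, False, True]
```

The first five losses only fill the window; the jump to 1.20 exceeds the windowed mean by far more than three standard deviations, so the monitor flags drift and lowers its regularization weight from 0.1 to 0.05.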

Method Towards CVPR 2021 Image Matching Challenge

Aug 11, 2021

Xiaopeng Bi, Yu Chen, Xinyang Liu, Dehao Zhang, Ran Yan, Zheng Chai, Haotian Zhang, Xiao Liu

Abstract: This report describes the Megvii-3D team's approach towards the CVPR 2021 Image Matching Workshop.

Method Towards CVPR 2021 SimLocMatch Challenge

Aug 11, 2021

Xiaopeng Bi, Ran Yan, Zheng Chai, Haotian Zhang, Xiao Liu

Abstract: This report describes the Megvii-3D team's approach towards the SimLocMatch Challenge @ CVPR 2021 Image Matching Workshop.
