Jiarui Fang

AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference

Jan 19, 2024
Xuanlei Zhao, Shenggan Cheng, Guangyang Lu, Jiarui Fang, Haotian Zhou, Bin Jia, Ziming Liu, Yang You

Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models

Feb 22, 2023
Yuliang Liu, Shenggui Li, Jiarui Fang, Yanjun Shao, Boyuan Yao, Yang You

MAP: Memory-aware Automated Intra-op Parallel Training For Foundation Models

Feb 06, 2023
Yuliang Liu, Shenggui Li, Jiarui Fang, Yanjun Shao, Boyuan Yao, Yang You

Elixir: Train a Large Language Model on a Small GPU Cluster

Dec 10, 2022
Haichen Huang, Jiarui Fang, Hongxin Liu, Shenggui Li, Yang You

EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models

Sep 06, 2022
Jiangsu Du, Ziming Liu, Jiarui Fang, Shenggui Li, Yongbin Li, Yutong Lu, Yang You

A Frequency-aware Software Cache for Large Recommendation System Embeddings

Aug 08, 2022
Jiarui Fang, Geng Zhang, Jiatong Han, Shenggui Li, Zhengda Bian, Yongbin Li, Jin Liu, Yang You

PatrickStar: Parallel Training of Pre-trained Models via a Chunk-based Memory Management

Aug 12, 2021
Jiarui Fang, Yang Yu, Shenggui Li, Yang You, Jie Zhou

TurboTransformers: An Efficient GPU Serving System For Transformer Models

Oct 09, 2020
Jiarui Fang, Yang Yu, Chengduo Zhao, Jie Zhou

swCaffe: a Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight

Mar 16, 2019
Jiarui Fang, Liandeng Li, Haohuan Fu, Jinlei Jiang, Wenlai Zhao, Conghui He, Xin You, Guangwen Yang

RedSync: Reducing Synchronization Traffic for Distributed Deep Learning

Aug 13, 2018
Jiarui Fang, Haohuan Fu, Guangwen Yang, Cho-Jui Hsieh
