Jiarui Fang

AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference

Jan 19, 2024
Xuanlei Zhao, Shenggan Cheng, Guangyang Lu, Jiarui Fang, Haotian Zhou, Bin Jia, Ziming Liu, Yang You

Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models

Feb 22, 2023
Yuliang Liu, Shenggui Li, Jiarui Fang, Yanjun Shao, Boyuan Yao, Yang You

MAP: Memory-aware Automated Intra-op Parallel Training For Foundation Models

Feb 06, 2023
Yuliang Liu, Shenggui Li, Jiarui Fang, Yanjun Shao, Boyuan Yao, Yang You

Elixir: Train a Large Language Model on a Small GPU Cluster

Dec 10, 2022
Haichen Huang, Jiarui Fang, Hongxin Liu, Shenggui Li, Yang You

EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models

Sep 06, 2022
Jiangsu Du, Ziming Liu, Jiarui Fang, Shenggui Li, Yongbin Li, Yutong Lu, Yang You

A Frequency-aware Software Cache for Large Recommendation System Embeddings

Aug 08, 2022
Jiarui Fang, Geng Zhang, Jiatong Han, Shenggui Li, Zhengda Bian, Yongbin Li, Jin Liu, Yang You

PatrickStar: Parallel Training of Pre-trained Models via a Chunk-based Memory Management

Aug 12, 2021
Jiarui Fang, Yang Yu, Shenggui Li, Yang You, Jie Zhou

TurboTransformers: An Efficient GPU Serving System For Transformer Models

Oct 09, 2020
Jiarui Fang, Yang Yu, Chengduo Zhao, Jie Zhou

swCaffe: a Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight

Mar 16, 2019
Jiarui Fang, Liandeng Li, Haohuan Fu, Jinlei Jiang, Wenlai Zhao, Conghui He, Xin You, Guangwen Yang

RedSync: Reducing Synchronization Traffic for Distributed Deep Learning

Aug 13, 2018
Jiarui Fang, Haohuan Fu, Guangwen Yang, Cho-Jui Hsieh
