
Youshan Miao

Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search

Nov 26, 2023

Adam Accumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training

May 31, 2023

SuperScaler: Supporting Flexible DNN Parallelization via a Unified Abstraction

Jan 21, 2023

Dense-to-Sparse Gate for Mixture-of-Experts

Dec 29, 2021

CoCoNet: Co-Optimizing Computation and Communication for Distributed Machine Learning

May 13, 2021

CrossoverScheduler: Overlapping Multiple Distributed Training Applications in a Crossover Manner

Mar 14, 2021

Architectural Implications of Graph Neural Networks

Sep 02, 2020

Towards Efficient Large-Scale Graph Neural Network Computing

Oct 19, 2018

RPC Considered Harmful: Fast Distributed Deep Learning on RDMA

May 22, 2018