
Saeed Rashidi

FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models

Jun 28, 2024

Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

May 26, 2023

ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale

Mar 24, 2023

COMET: A Comprehensive Cluster Design Methodology for Distributed Deep Learning Training

Nov 30, 2022

Impact of RoCE Congestion Control Policies on Distributed Training of DNNs

Jul 22, 2022

Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models

Oct 09, 2021

Exploring Multi-dimensional Hierarchical Network Topologies for Efficient Distributed Training of Trillion Parameter DL Models

Sep 24, 2021

Restructuring, Pruning, and Adjustment of Deep Models for Parallel Distributed Inference

Aug 19, 2020