Aamir Shafi

The Case for Co-Designing Model Architectures with Hardware

Jan 30, 2024
Quentin Anthony, Jacob Hatef, Deepak Narayanan, Stella Biderman, Stas Bekman, Junqi Yin, Aamir Shafi, Hari Subramoni, Dhabaleswar Panda

Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference

Jan 17, 2024
Jinghan Yao, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda

Flover: A Temporal Fusion Framework for Efficient Autoregressive Model Parallel Inference

May 24, 2023
Jinghan Yao, Nawras Alnaasan, Tian Chen, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda

MCR-DL: Mix-and-Match Communication Runtime for Deep Learning

Mar 15, 2023
Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar Panda

Performance Characterization of using Quantization for DNN Inference on Edge Devices: Extended Version

Mar 09, 2023
Hyunho Ahn, Tian Chen, Nawras Alnaasan, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda

OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems

Oct 20, 2021
Nawras Alnaasan, Arpan Jain, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda

Efficient MPI-based Communication for GPU-Accelerated Dask Applications

Jan 21, 2021
Aamir Shafi, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda
