Dhiraj Kalamkar

Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures

Apr 25, 2023
Via arXiv

DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks

Apr 16, 2021
Via arXiv

Efficient and Generic 1D Dilated Convolution Layer for Deep Learning

Apr 16, 2021
Via arXiv

Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning Workloads

Apr 14, 2021
Via arXiv

Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures

May 10, 2020
Via arXiv

K-TanH: Hardware Efficient Activations For Deep Learning

Oct 21, 2019
Via arXiv

High-Performance Deep Learning via a Single Building Block

Jun 18, 2019
Via arXiv

A Study of BFLOAT16 for Deep Learning Training

Jun 13, 2019
Via arXiv

Mixed Precision Training of Convolutional Neural Networks using Integer Operations

Feb 23, 2018
Via arXiv

On Scale-out Deep Learning Training for Cloud and HPC

Jan 24, 2018
Via arXiv