Evangelos Georganas

Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures

Apr 25, 2023
Evangelos Georganas, Dhiraj Kalamkar, Kirill Voronin, Antonio Noack, Hans Pabst, Alexander Breuer, Alexander Heinecke

Via arXiv

FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems

Apr 22, 2022
Rui Ma, Evangelos Georganas, Alexander Heinecke, Andrew Boutros, Eriko Nurvitadhi

Via arXiv

DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks

Apr 16, 2021
Vasimuddin Md, Sanchit Misra, Guixiang Ma, Ramanarayan Mohanty, Evangelos Georganas, Alexander Heinecke, Dhiraj Kalamkar, Nesreen K. Ahmed, Sasikanth Avancha

Via arXiv

Efficient and Generic 1D Dilated Convolution Layer for Deep Learning

Apr 16, 2021
Narendra Chaudhary, Sanchit Misra, Dhiraj Kalamkar, Alexander Heinecke, Evangelos Georganas, Barukh Ziv, Menachem Adelman, Bharat Kaul

Via arXiv

Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning Workloads

Apr 14, 2021
Evangelos Georganas, Dhiraj Kalamkar, Sasikanth Avancha, Menachem Adelman, Cristina Anderson, Alexander Breuer, Narendra Chaudhary, Abhisek Kundu, Vasimuddin Md, Sanchit Misra, Ramanarayan Mohanty, Hans Pabst, Barukh Ziv, Alexander Heinecke

Via arXiv

Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures

May 10, 2020
Dhiraj Kalamkar, Evangelos Georganas, Sudarshan Srinivasan, Jianping Chen, Mikhail Shiryaev, Alexander Heinecke

Via arXiv

High-Performance Deep Learning via a Single Building Block

Jun 18, 2019
Evangelos Georganas, Kunal Banerjee, Dhiraj Kalamkar, Sasikanth Avancha, Anand Venkat, Michael Anderson, Greg Henry, Hans Pabst, Alexander Heinecke

Via arXiv

A Study of BFLOAT16 for Deep Learning Training

Jun 13, 2019
Dhiraj Kalamkar, Dheevatsa Mudigere, Naveen Mellempudi, Dipankar Das, Kunal Banerjee, Sasikanth Avancha, Dharma Teja Vooturi, Nataraj Jammalamadaka, Jianyu Huang, Hector Yuen, Jiyan Yang, Jongsoo Park, Alexander Heinecke, Evangelos Georganas, Sudarshan Srinivasan, Abhisek Kundu, Misha Smelyanskiy, Bharat Kaul, Pradeep Dubey

Via arXiv