Picture for Alexander Heinecke

Alexander Heinecke

Towards a high-performance AI compiler with upstream MLIR

Add code
Apr 15, 2024
Viaarxiv icon

Microscaling Data Formats for Deep Learning

Add code
Oct 19, 2023
Figure 1 for Microscaling Data Formats for Deep Learning
Figure 2 for Microscaling Data Formats for Deep Learning
Figure 3 for Microscaling Data Formats for Deep Learning
Figure 4 for Microscaling Data Formats for Deep Learning
Viaarxiv icon

Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures

Apr 25, 2023
Figure 1 for Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures
Figure 2 for Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures
Figure 3 for Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures
Figure 4 for Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures
Viaarxiv icon

FP8 Formats for Deep Learning

Sep 12, 2022
Figure 1 for FP8 Formats for Deep Learning
Figure 2 for FP8 Formats for Deep Learning
Figure 3 for FP8 Formats for Deep Learning
Figure 4 for FP8 Formats for Deep Learning
Viaarxiv icon

FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems

Apr 22, 2022
Figure 1 for FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems
Figure 2 for FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems
Figure 3 for FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems
Figure 4 for FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems
Viaarxiv icon

DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks

Apr 16, 2021
Figure 1 for DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks
Figure 2 for DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks
Figure 3 for DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks
Figure 4 for DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks
Viaarxiv icon

Efficient and Generic 1D Dilated Convolution Layer for Deep Learning

Add code
Apr 16, 2021
Figure 1 for Efficient and Generic 1D Dilated Convolution Layer for Deep Learning
Figure 2 for Efficient and Generic 1D Dilated Convolution Layer for Deep Learning
Figure 3 for Efficient and Generic 1D Dilated Convolution Layer for Deep Learning
Figure 4 for Efficient and Generic 1D Dilated Convolution Layer for Deep Learning
Viaarxiv icon

Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning Workloads

Apr 14, 2021
Figure 1 for Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning Workloads
Figure 2 for Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning Workloads
Figure 3 for Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning Workloads
Figure 4 for Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning Workloads
Viaarxiv icon

PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives

Jun 02, 2020
Figure 1 for PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives
Figure 2 for PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives
Figure 3 for PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives
Figure 4 for PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives
Viaarxiv icon

Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures

Add code
May 10, 2020
Figure 1 for Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures
Figure 2 for Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures
Figure 3 for Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures
Figure 4 for Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures
Viaarxiv icon