Alexander Heinecke

Optimizing Deep Learning Recommender Systems' Training On CPU Cluster Architectures

May 10, 2020
Via arXiv

PolyScientist: Automatic Loop Transformations Combined with Microkernels for Optimization of Deep Learning Primitives

Feb 06, 2020
Via arXiv

Training Neural Machine Translation (NMT) Models using Tensor Train Decomposition on TensorFlow (T3F)

Nov 05, 2019
Via arXiv

High-Performance Deep Learning via a Single Building Block

Jun 18, 2019
Via arXiv

A Study of BFLOAT16 for Deep Learning Training

Jun 13, 2019
Via arXiv

ISA Mapper: A Compute and Hardware Agnostic Deep Learning Compiler

Oct 12, 2018
Via arXiv

Mixed Precision Training of Convolutional Neural Networks using Integer Operations

Feb 23, 2018
Via arXiv