Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

A Learned Performance Model for the Tensor Processing Unit

Aug 03, 2020
Samuel J. Kaufman, Phitchaya Mangpo Phothilimthana, Yanqi Zhou, Mike Burrows

Accurate hardware performance models are critical to efficient code generation. They can be used by compilers to make heuristic decisions, by superoptimizers as an minimization objective, or by autotuners to find an optimal configuration of a specific program. However, they are difficult to develop because contemporary processors are complex, and the recent proliferation of deep learning accelerators has increased the development burden. We demonstrate a method of learning performance models from a corpus of tensor computation graph programs for the Tensor Processing Unit (TPU). We train a neural network over kernel-level sub-graphs from the corpus and find that the learned model is competitive to a heavily-optimized analytical cost model used in the production XLA compiler.

Share this with someone who'll enjoy it:

   Access Paper Source

Share this with someone who'll enjoy it: