Hierarchical Roofline Performance Analysis for Deep Learning Applications

Sep 11, 2020
Yunsong Wang, Charlene Yang, Steven Farrell, Thorsten Kurth, Samuel Williams

This paper presents a practical methodology for collecting the performance data needed to conduct hierarchical Roofline analysis on NVIDIA GPUs. It describes an extension of the Empirical Roofline Toolkit that adds support for more data precisions and for Tensor Cores, and it introduces an Nsight Compute based method for accurately collecting application performance information. Together, these allow automated machine characterization and application characterization for Roofline analysis across the entire memory hierarchy on NVIDIA GPUs. The methodology is validated on a complex deep learning application used for climate image segmentation. We use two versions of the code, one in TensorFlow and one in PyTorch, to demonstrate the use and effectiveness of the methodology, and we highlight insights into how the application utilizes the compute and memory capabilities of the GPU and how implementation and performance differ between the two deep learning frameworks.
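As a rough illustration of the hierarchical Roofline idea, the sketch below applies the standard Roofline bound, attainable performance = min(peak compute, arithmetic intensity × bandwidth), once per memory level. It is not the paper's tooling: the names `MemoryLevel` and `hierarchical_roofline` are hypothetical, and every number in the usage example is a placeholder rather than a measured value. In practice the ceilings would come from machine characterization and the FLOP and byte counts from application profiling.

```python
from dataclasses import dataclass


@dataclass
class MemoryLevel:
    """One level of the memory hierarchy (hypothetical helper type)."""
    name: str
    bandwidth_gbs: float  # sustained bandwidth ceiling in GB/s


def hierarchical_roofline(peak_gflops, levels, flops, bytes_per_level):
    """Return {level name: (arithmetic intensity, attainable GFLOP/s)}.

    peak_gflops     -- compute ceiling in GFLOP/s
    levels          -- list of MemoryLevel objects
    flops           -- total floating-point operations of the kernel
    bytes_per_level -- {level name: bytes moved through that level}
    """
    results = {}
    for level in levels:
        bytes_moved = bytes_per_level[level.name]
        # Arithmetic intensity at this level, in FLOPs per byte.
        ai = flops / bytes_moved
        # Roofline bound: limited by compute or by this level's bandwidth.
        bound = min(peak_gflops, ai * level.bandwidth_gbs)
        results[level.name] = (ai, bound)
    return results


if __name__ == "__main__":
    # Placeholder ceilings and counts, chosen only to exercise the code.
    levels = [
        MemoryLevel("L1", 500.0),
        MemoryLevel("L2", 200.0),
        MemoryLevel("HBM", 100.0),
    ]
    bytes_moved = {"L1": 4e9, "L2": 2e9, "HBM": 5e7}
    table = hierarchical_roofline(1000.0, levels, 1e9, bytes_moved)
    for name, (ai, bound) in table.items():
        print(f"{name}: AI = {ai:.2f} FLOP/byte, bound = {bound:.1f} GFLOP/s")
```

Evaluating the bound separately at each level shows which part of the hierarchy limits a kernel: a kernel can be bandwidth bound at one level while sitting under the compute ceiling at another.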

* 8 pages 
