
Title: Serving Recurrent Neural Networks Efficiently with a Spatial Accelerator

Sep 26, 2019

Tian Zhao, Yaqi Zhang, Kunle Olukotun


Abstract: Recurrent Neural Network (RNN) applications form a major class of AI-powered, low-latency data center workloads. Most execution models for RNN acceleration break computation graphs into BLAS kernels, which leads to significant inter-kernel data movement and resource underutilization. We show that by supporting more general loop constructs that capture design parameters in accelerators, it is possible to improve resource utilization through cross-kernel optimization without sacrificing programmability. This level of abstraction enables a design space search that can lead to efficient use of on-chip resources on a spatial architecture across a range of problem sizes. We evaluate our optimization strategy at this abstraction level with DeepBench using a configurable spatial accelerator. We demonstrate that this implementation provides a geometric-mean improvement of 30x in performance, 1.6x in area, and 2x in power efficiency compared to a Tesla V100 GPU, and a geometric-mean speedup of 2x compared to the Microsoft Brainwave implementation on a Stratix 10 FPGA.
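To make the contrast between BLAS-kernel decomposition and a fused loop construct concrete, here is a minimal sketch. It assumes a simplified vanilla tanh RNN cell, h' = tanh(Wx + Uh + b). The function names `rnn_step_kernels` and `rnn_step_fused` are hypothetical and are not the authors' implementation or programming model. The kernel-by-kernel version materializes a full intermediate vector at every kernel boundary. The fused loop nest keeps only a per-unit scalar accumulator live, which is the kind of inter-kernel data movement that cross-kernel optimization aims to remove.

```python
import numpy as np


def rnn_step_kernels(W, U, b, x, h):
    """Kernel-by-kernel execution (hypothetical illustration): each BLAS-style
    call writes a full intermediate vector that the next kernel reads back."""
    wx = W @ x          # kernel 1: GEMV on the input
    uh = U @ h          # kernel 2: GEMV on the hidden state
    s = wx + uh         # kernel 3: elementwise add
    s_b = s + b         # kernel 4: bias add
    h_new = np.tanh(s_b)  # kernel 5: activation
    # Count elements written to intermediate buffers between kernels.
    buffered = wx.size + uh.size + s.size + s_b.size
    return h_new, buffered


def rnn_step_fused(W, U, b, x, h):
    """Fused loop nest (hypothetical illustration): for each hidden unit, the
    two dot products, the bias add, and the activation happen together, so
    no intermediate vector is materialized between stages."""
    n_hidden = W.shape[0]
    h_new = np.empty(n_hidden)
    for i in range(n_hidden):
        acc = b[i]
        for j in range(x.size):
            acc += W[i, j] * x[j]
        for k in range(h.size):
            acc += U[i, k] * h[k]
        h_new[i] = np.tanh(acc)
    return h_new


# Usage with small random (illustrative) sizes.
rng = np.random.default_rng(0)
n_in, n_h = 4, 3
W = rng.standard_normal((n_h, n_in))
U = rng.standard_normal((n_h, n_h))
b = rng.standard_normal(n_h)
x = rng.standard_normal(n_in)
h = rng.standard_normal(n_h)

h1, buffered = rnn_step_kernels(W, U, b, x, h)
h2 = rnn_step_fused(W, U, b, x, h)
print(np.allclose(h1, h2), buffered)  # True 12
```

Both versions compute the same result. The difference is that the kernel version writes 4 x n_h intermediate elements per step. In pure Python the fused loop is of course slower; the point is the dataflow a spatial accelerator could exploit, not interpreter speed.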
