Optimizing the convolution operation to accelerate deep neural networks on FPGA

V. Saravanan; R. Susmitha

Submitted

Apr 15, 2022

Published

Jun 30, 2019

Abstract

Because convolution accounts for most of the operations in a convolutional neural network (CNN), the convolution acceleration scheme significantly affects the efficiency and performance of a hardware CNN accelerator. Convolution consists of multiply-and-accumulate (MAC) operations nested in four levels of loops, which results in a large design space. Prior works either employ limited loop optimization techniques, e.g., loop unrolling, tiling, and interchange, or tune only some of the design variables after the accelerator architecture and dataflow are already fixed. Without a full study of convolution loop optimization before the hardware design phase, the resulting accelerator can hardly exploit data reuse or manage data movement efficiently. To overcome these barriers, this work quantitatively analyzes and optimizes the design objectives (e.g., memory access) of the CNN accelerator based on multiple design variables. The presented CNN acceleration scheme and architecture are demonstrated by implementing end-to-end CNNs, including NiN, VGG-16, and ResNet-50/ResNet-152, for inference. For the VGG-16 CNN, the overall throughput reaches 348 GOPS and 715 GOPS on Intel Stratix V and Arria 10 FPGAs, respectively.
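
To make the loop structure concrete, the following is a minimal software sketch of a convolution layer written as nested MAC loops, plus a tiled variant. It is illustrative only and is not the accelerator design from the article. The function names (`conv2d_naive`, `conv2d_tiled`), the data layout, and the way the four loop levels are counted here (kernel window, input feature maps, output pixels, output feature maps) are assumptions made for this example; the abstract does not spell them out.

```python
def conv2d_naive(inputs, weights, stride=1):
    """Direct convolution (no padding) written as nested MAC loops.

    Assumed layout (hypothetical, chosen for this sketch):
      inputs[c_in][y][x], weights[c_out][c_in][ky][kx]
    Returns outputs[c_out][oy][ox].
    """
    n_if = len(inputs)
    in_h, in_w = len(inputs[0]), len(inputs[0][0])
    n_of = len(weights)
    k_h, k_w = len(weights[0][0]), len(weights[0][0][0])
    out_h = (in_h - k_h) // stride + 1
    out_w = (in_w - k_w) // stride + 1

    outputs = [[[0] * out_w for _ in range(out_h)] for _ in range(n_of)]
    for of in range(n_of):                  # level 4: output feature maps
        for oy in range(out_h):             # level 3: output pixels (rows)
            for ox in range(out_w):         # level 3: output pixels (columns)
                acc = 0
                for c in range(n_if):       # level 2: input feature maps
                    for ky in range(k_h):   # level 1: kernel window (rows)
                        for kx in range(k_w):  # level 1: kernel window (columns)
                            # One multiply-and-accumulate (MAC) operation.
                            acc += (inputs[c][oy * stride + ky][ox * stride + kx]
                                    * weights[of][c][ky][kx])
                outputs[of][oy][ox] = acc
    return outputs


def conv2d_tiled(inputs, weights, tile_of=2, stride=1):
    """Loop tiling over the output-feature-map loop.

    The outer loop steps through tiles of `tile_of` filters; the inner
    loops (inside conv2d_naive) process one tile at a time. The result
    is identical to the untiled version; only the iteration order changes.
    """
    outputs = []
    for start in range(0, len(weights), tile_of):
        outputs.extend(conv2d_naive(inputs, weights[start:start + tile_of], stride))
    return outputs


if __name__ == "__main__":
    image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]      # 1 input map, 3x3
    filters = [[[[1, 0], [0, 1]]], [[[1, 1], [1, 1]]]]  # 2 filters, 2x2
    print(conv2d_naive(image, filters))
    print(conv2d_tiled(image, filters, tile_of=1))
```

In this sketch, tiling changes only the order in which output feature maps are visited, so both functions return the same values. On hardware, choices such as which loops to unroll, how large to make each tile, and the order of the loops determine how much data can be reused on chip and how much must be moved from external memory, which is the design space the abstract refers to.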

How to Cite

V. Saravanan, & R. Susmitha. (2019). Optimizing the convolution operation to accelerate deep neural networks on FPGA. International Journal of Intellectual Advancements and Research in Engineering Computations, 7(2), 2985–2994. Retrieved from https://ijiarec.com/ijiarec/article/view/1211
