An OpenCL Framework for Real-time Inference of Next-generation Convolutional Neural Networks on FPGAs
Author | : Sachin Kumawat |
Publisher | : |
Total Pages | : |
Release | : 2017 |
ISBN-10 | : 0355764415 |
ISBN-13 | : 9780355764413 |
Rating | : 4/5 (413 Downloads) |
Download or read book An OpenCL Framework for Real-time Inference of Next-generation Convolutional Neural Networks on FPGAs written by Sachin Kumawat and published by . This book was released on 2017 with total page pages. Available in PDF, EPUB and Kindle. Book excerpt: Modern Convolutional Neural Networks (CNNs) consist of billions of multiplications and additions which require the use of parallel computing units such as GPUs, FPGAs and other DSP processors. Consequently, General Purpose GPU (GPGPU) computing has taken this field by storm. At the same time, there has been an increased interest in FPGA based acceleration of CNN inference. In this work, we present FICaffe, a framework for FPGA-based Inference with Caffe, which provides a complete automated generation and mapping of CNN accelerators on FPGAs. We target applications with critical latency requirements and design high processing efficiency accelerators for CNNs. The architecture is structured in a highly concurrent OpenCL library, which enables High Level Synthesis tools to effectively exploit data, task and pipeline parallelism. We propose a unified memory model, that drives exploration of optimal design by matching on-chip and off-chip memory bandwidths available on FPGA platforms. We also identify origins of all clock cycle stalls and overheads inherent to CNN acceleration designs and provide a detailed model to accurately predict the runtime latency with less than 4% error against on-board tests. Furthermore, with FICaffe we provide support for cross-network synthesis, such that it is possible to processes a variety of CNNs, with reasonable efficiency, without long re-compilation hours. FICaffe is integrated with the popular deep learning framework Caffe, and is deployable to a wide variety of CNNs. FICaffe's efficacy is shown by mapping to a 28nm Stratix V GXA7 chip, and both network specific and cross-network performance are reported for AlexNet, VGG, SqueezeNet and GoogLeNet. We show a processing efficiency of 95.8% for the widely-reported VGG benchmark, which outperforms prior work. FICaffe also achieves more than 2X speedup on Stratix V GXA7 compared with the best published results on this chip, to the best of our knowledge.