A data-reuse aware accelerator for large-scale convolutional networks

M.C.J. Peemen, B. Mesman, H. Corporaal

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic



This paper presents a clustered SIMD accelerator template for Convolutional Networks. These networks significantly outperform other methods in detection and classification tasks in the vision domain. Because of their high compute and data transfer requirements, these applications benefit substantially from a dedicated accelerator. The proposed accelerator reduces memory traffic by applying loop transformations such as tiling and fusion, the latter merging successive layers. Although fusion can introduce redundant computations, it often reduces data transfer and can therefore remove performance bottlenecks. The SIMD cluster is mapped to a Xilinx Zynq FPGA, where it can achieve 6.4 Gops with a small amount of resources. Performance can be scaled by using multiple clusters.
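The trade-off between redundant computation and data transfer can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes two successive 1D "valid" convolution layers, a simple cost model in which every word moved to or from external memory counts once, and no reuse of overlapping input between tiles. The function names `conv1d_valid`, `unfused`, and `fused_tiled` are hypothetical.

```python
def conv1d_valid(x, w):
    """1D 'valid' convolution (correlation form): output length len(x) - len(w) + 1."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]


def unfused(x, w1, w2):
    """Run the two layers separately; the intermediate map goes through external memory."""
    mid = conv1d_valid(x, w1)
    out = conv1d_valid(mid, w2)
    macs = len(mid) * len(w1) + len(out) * len(w2)
    # Assumed cost model: read input, write intermediate, read it back, write output.
    transfers = len(x) + len(mid) + len(mid) + len(out)
    return out, macs, transfers


def fused_tiled(x, w1, w2, tile):
    """Fuse both layers per output tile; the intermediate tile stays on-chip."""
    k1, k2 = len(w1), len(w2)
    n_out = len(x) - k1 - k2 + 2
    out, macs, transfers = [], 0, 0
    for start in range(0, n_out, tile):
        t = min(tile, n_out - start)
        # Input window needed to produce t outputs through both layers.
        x_tile = x[start:start + t + k1 + k2 - 2]
        # Intermediate values at tile borders are recomputed by neighbouring tiles.
        mid_tile = conv1d_valid(x_tile, w1)
        out_tile = conv1d_valid(mid_tile, w2)
        macs += len(mid_tile) * k1 + len(out_tile) * k2
        # Only the input window and the output tile cross the external memory boundary.
        transfers += len(x_tile) + len(out_tile)
        out.extend(out_tile)
    return out, macs, transfers


if __name__ == "__main__":
    x = list(range(32))
    w1, w2 = [1, 2, 1], [1, -1]
    print("unfused (macs, transfers):", unfused(x, w1, w2)[1:])
    print("fused   (macs, transfers):", fused_tiled(x, w1, w2, tile=8)[1:])
```

Under these assumptions the fused version performs somewhat more multiply-accumulate operations, because intermediate values at tile borders are recomputed, but moves fewer words to and from external memory. Larger tiles reduce the recomputation at the cost of more on-chip buffer space.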
Original language: English
Title of host publication: Workshop on Neuromorphic Architectures (NeuroArch), 14 June 2014, Minneapolis, Minnesota
Publication status: Published - 2014

