A locality aware convolutional neural networks accelerator

R. Shi, Z. Xu, Z. Sun, M.C.J. Peemen, A. Li, H. Corporaal, D. Wu

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review

7 Citations (Scopus)


The advantages of Convolutional Neural Networks (CNNs) over traditional methods for visual pattern recognition have changed the field of machine vision. The main obstacle to broad adoption of this technique is the massive computing workload of CNNs, which prevents real-time implementation on low-power embedded platforms. Recently, several dedicated solutions have been proposed to improve energy efficiency and throughput; nevertheless, the huge amount of data transfer involved in the processing remains a challenge. This work proposes a new CNN accelerator that exploits a novel memory access scheme, which significantly improves data locality in CNN-related processing. With this scheme, external memory access is reduced by 50% while achieving similar or even better throughput. The accelerator is implemented in 28 nm CMOS technology. Implementation results show that the accelerator achieves a performance of 102 GOp/s at 800 MHz while occupying 0.303 mm² of silicon area. Power simulation shows that the dynamic power of the accelerator is 68 mW. Its flexibility is demonstrated by running a variety of CNN benchmarks.

Original language: English
Title of host publication: Proceedings - 18th Euromicro Conference on Digital System Design, DSD 2015
Place of Publication: Piscataway
Publisher: Institute of Electrical and Electronics Engineers
Number of pages: 8
ISBN (Electronic): 978-1-4673-8035-5
Publication status: Published - 20 Oct 2015
Event: 18th Euromicro Conference on Digital System Design (DSD 2015) - Funchal, Portugal
Duration: 26 Aug 2015 to 28 Aug 2015
Conference number: 18


Conference: 18th Euromicro Conference on Digital System Design (DSD 2015)
Abbreviated title: DSD 2015
Other: Conference co-located with the 41st Euromicro Conference on Software Engineering and Advanced Applications (SEAA 2015)


  • Buffer storage
  • Convolution
  • Feature extraction
  • Parallel processing
  • Random access memory
  • Registers
  • System-on-chip

