Abstract
The advantages of Convolutional Neural Networks (CNNs) over traditional methods for visual pattern recognition have changed the field of machine vision. The main obstacle to broad adoption of this technique is the massive computing workload of CNNs, which prevents real-time implementation on low-power embedded platforms. Recently, several dedicated solutions have been proposed to improve energy efficiency and throughput; nevertheless, the huge amount of data transfer involved in the processing remains a challenge. This work proposes a new CNN accelerator exploiting a novel memory access scheme that significantly improves data locality in CNN-related processing. With this scheme, external memory access is reduced by 50% while achieving similar or even better throughput. The accelerator is implemented in 28 nm CMOS technology. Implementation results show that the accelerator achieves a performance of 102 GOp/s at 800 MHz while occupying 0.303 mm² of silicon area. Power simulation shows that the dynamic power of the accelerator is 68 mW. Its flexibility is demonstrated by running a variety of CNN benchmarks.
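The headline figures in the abstract can be combined into a few derived metrics that are often used to compare accelerators. The following is a minimal sketch of that arithmetic only; the function and constant names are illustrative and do not come from the paper, and the efficiency figure uses the quoted dynamic power alone, so it should be read as an upper bound relative to total power.

```python
# Figures quoted in the abstract (names are illustrative, not from the paper).
THROUGHPUT_GOPS = 102.0    # GOp/s
CLOCK_MHZ = 800.0          # MHz
AREA_MM2 = 0.303           # mm^2 of silicon area
DYNAMIC_POWER_MW = 68.0    # mW, dynamic power only (assumption: static power excluded)


def ops_per_cycle(gops: float, mhz: float) -> float:
    """Average operations completed per clock cycle."""
    return (gops * 1e9) / (mhz * 1e6)


def gops_per_watt(gops: float, milliwatts: float) -> float:
    """Throughput per watt, based on the power figure supplied."""
    return gops / (milliwatts / 1000.0)


def gops_per_mm2(gops: float, mm2: float) -> float:
    """Throughput per unit of silicon area."""
    return gops / mm2


print(f"ops/cycle:   {ops_per_cycle(THROUGHPUT_GOPS, CLOCK_MHZ):.1f}")
print(f"GOp/s per W: {gops_per_watt(THROUGHPUT_GOPS, DYNAMIC_POWER_MW):.0f}")
print(f"GOp/s/mm^2:  {gops_per_mm2(THROUGHPUT_GOPS, AREA_MM2):.1f}")
```

Under these assumptions the figures work out to roughly 127.5 operations per cycle, about 1.5 TOp/s per watt of dynamic power, and about 337 GOp/s per mm².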
Original language | English |
---|---|
Title of host publication | Proceedings - 18th Euromicro Conference on Digital System Design, DSD 2015 |
Place of Publication | Piscataway |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 591-598 |
Number of pages | 8 |
ISBN (Electronic) | 978-1-4673-8035-5 |
Publication status | Published - 20 Oct 2015 |
Event | 18th Euromicro Conference on Digital System Design (DSD 2015), Funchal, Portugal. Duration: 26 Aug 2015 → 28 Aug 2015. Conference number: 18. https://paginas.fe.up.pt/~dsd-seaa-2015/dsd2015/ |
Conference
Conference | 18th Euromicro Conference on Digital System Design (DSD 2015) |
---|---|
Abbreviated title | DSD 2015 |
Country/Territory | Portugal |
City | Funchal |
Period | 26/08/15 → 28/08/15 |
Other | Conference co-located with the 41st Euromicro Conference on Software Engineering and Advanced Applications (SEAA 2015) |
Keywords
- Buffer storage
- Convolution
- Feature extraction
- Parallel processing
- Random access memory
- Registers
- System-on-chip