Abstract
Single-Instruction-Multiple-Data (SIMD) architectures, which exploit data-level parallelism (DLP), are widely used to achieve high-performance and low-power computing. In most streaming applications, such as CNN-based detection and recognition, color space conversion, and various kinds of filters, multiply-accumulate is one of the most important and expensive operations. In this paper, we propose a high-performance, low-power SIMD architecture with advanced multiply-accumulator (MAC) support (MacSim) to improve computational efficiency. In addition, a smart loop tiling scheme is proposed. To support this tiling further, the MAC unit is equipped with multiple accumulator registers. Based on a Design Space Exploration (DSE) of the proposed MAC unit, a MAC instance with four accumulator registers (MAC4reg) is selected as a good choice for the target kernels. A 64-PE (processing element) 16-bit SIMD instance without MAC support is taken as the baseline. For a head-to-head comparison, a 64-PE 16-bit SIMD with MAC4reg (MacSim4) and the baseline SIMD are both implemented in HDL and synthesized with a TSMC 40nm low-power library. Five streaming application kernels are mapped to both architectures. Our experimental results show that, with MAC4reg, runtime and energy consumption are reduced by up to 38% and 42%, respectively. In addition, a 4-layer CNN-based detection application is fully mapped onto the proposed MacSim4. Working at 950MHz, MacSim4 reaches a throughput of 62.4 GOPS, which meets the requirement of real-time (720p HD, 30fps) detection. The energy consumption per PE per operation is very low: 4.7pJ/Op excluding SRAM (Static Random Access Memory) and 4.8pJ/Op including a 2k-entry SRAM bank. As a prototype, the proposed SIMD is mapped onto an FPGA and can run all the kernels.
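The abstract does not spell out the tiling scheme, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a 1-D filter kernel and assumes the tiling runs over output samples, so that each coefficient is fetched once per tile and fed to several accumulators. The function names `conv1d_baseline` and `conv1d_mac_tiled` and the parameter `n_acc` are hypothetical. The inner list `acc` stands in for the accumulator registers of a single processing element.

```python
def conv1d_baseline(x, w):
    """Reference filter: separate multiply and add, one output at a time."""
    n_out = len(x) - len(w) + 1
    y = []
    for i in range(n_out):
        acc = 0
        for k in range(len(w)):
            prod = x[i + k] * w[k]  # multiply step
            acc = acc + prod        # separate add step
        y.append(acc)
    return y


def conv1d_mac_tiled(x, w, n_acc=4):
    """Same filter, with the output loop tiled by n_acc (assumed scheme).

    Each coefficient is read once per tile and reused by up to n_acc
    accumulators, each updated with a single fused multiply-accumulate.
    """
    n_out = len(x) - len(w) + 1
    y = [0] * n_out
    for i0 in range(0, n_out, n_acc):
        tile = min(n_acc, n_out - i0)  # last tile may be shorter
        acc = [0] * tile               # stand-in for accumulator registers
        for k in range(len(w)):
            coeff = w[k]               # loaded once, reused across the tile
            for r in range(tile):
                acc[r] += x[i0 + r + k] * coeff  # one MAC per accumulator
        y[i0:i0 + tile] = acc
    return y


if __name__ == "__main__":
    samples = [1, 2, 3, 4, 5, 6, 7]
    coeffs = [1, 0, -1]
    print(conv1d_baseline(samples, coeffs))
    print(conv1d_mac_tiled(samples, coeffs))
```

Both functions compute the same result. The tiled version only reorders the work so that a coefficient load is amortized over several outputs, which is one plausible reason why extra accumulator registers help a tiled loop.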
Original language | English |
---|---|
Title of host publication | Proceedings - 19th Euromicro Conference on Digital System Design, DSD 2016 |
Place of Publication | Piscataway |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 160-167 |
Number of pages | 8 |
ISBN (Electronic) | 978-1-5090-2817-7 |
DOIs | |
Publication status | Published - 26 Oct 2016 |
Event | 19th Euromicro Conference on Digital System Design (DSD 2016), Limassol, Cyprus. Duration: 31 Aug 2016 → 2 Sept 2016. Conference number: 19. http://dsd-seaa2016.cs.ucy.ac.cy/index.php?p=DSD2016 |
Conference
Conference | 19th Euromicro Conference on Digital System Design (DSD 2016) |
---|---|
Abbreviated title | DSD 2016 |
Country/Territory | Cyprus |
City | Limassol |
Period | 31/08/16 → 2/09/16 |
Internet address |
Keywords
- High Performance
- Loop Tiling
- Low Power
- MAC
- SIMD