Abstract
Accelerators designed for deep neural network (DNN) inference with extremely low operand widths, down to 1 bit, have become popular because they significantly reduce the energy consumed during inference. This paper introduces a compiler-programmable, flexible System-on-Chip (SoC) with mixed-precision support. The SoC is based on a Transport-Triggered Architecture (TTA), which facilitates efficient implementation of DNN workloads. By shifting the complexity of data movement from a hardware scheduler to the exposed-datapath compiler, DNN workloads can be implemented in an energy-efficient yet flexible way. The architecture is fully supported by a compiler and can be programmed in C/C++/OpenCL. The SoC is implemented in 22 nm FDX technology and achieves a peak energy efficiency of 28.6/14.9/2.47 TOPS/W for binary, ternary, and 8-bit precision, respectively, while delivering a throughput of 614/307/77 GOPS. This work achieves up to 3.3x better energy efficiency than state-of-the-art (SotA) programmable solutions.
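For intuition, the quoted peak figures imply a core power draw on the order of tens of milliwatts, since power is simply throughput divided by energy efficiency; shown here for the binary mode:

$$P = \frac{\text{throughput}}{\text{efficiency}} = \frac{614\ \text{GOPS}}{28.6\ \text{TOPS/W}} \approx 21.5\ \text{mW}$$

The same arithmetic gives roughly 20.6 mW for the ternary mode (307 GOPS / 14.9 TOPS/W) and 31.2 mW for the 8-bit mode (77 GOPS / 2.47 TOPS/W).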
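As background for the 1-bit mode, the minimal C sketch below shows the XNOR-popcount formulation commonly used for binary neural network inference. It is a generic illustration of why 1-bit operands are so cheap, not the paper's actual TTA datapath; the function name `binary_dot` and the test vectors are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Generic sketch of binary (1-bit) DNN arithmetic: +1/-1 operands are
 * packed one per bit (bit set = +1), so a 64-way dot product costs one
 * XNOR plus one popcount instead of 64 multiply-accumulates. */
static int binary_dot(uint64_t a, uint64_t w, int n)
{
    uint64_t mask  = (n >= 64) ? ~0ULL : ((1ULL << n) - 1ULL);
    uint64_t agree = ~(a ^ w) & mask;       /* XNOR: lanes where signs match */
    int pop = __builtin_popcountll(agree);  /* GCC/Clang builtin: count of +1 products */
    return 2 * pop - n;                     /* matches give +1, mismatches give -1 */
}

int main(void)
{
    uint64_t a = 0xB2;  /* two arbitrary 8-element binary vectors */
    uint64_t w = 0xD6;
    printf("dot = %d\n", binary_dot(a, w, 8));  /* prints: dot = 2 */
    return 0;
}
```

A ternary mode adds a zero value per lane (typically a second "nonzero" bit plane), and the 8-bit mode falls back to conventional multiply-accumulate, which is why the peak throughput scales roughly with operand width.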
Original language | English |
---|---|
Title of host publication | 2023 IEEE International Conference on Computer Design (ICCD) |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 78-85 |
Number of pages | 8 |
ISBN (Electronic) | 979-8-3503-4291-8 |
DOIs | |
Publication status | Published - 22 Dec 2023 |
Event | 41st IEEE International Conference on Computer Design, ICCD 2023 - Washington, United States |
Duration | 6 Nov 2023 → 8 Nov 2023 |
Conference
Conference | 41st IEEE International Conference on Computer Design, ICCD 2023 |
---|---|
Abbreviated title | ICCD 2023 |
Country/Territory | United States |
City | Washington |
Period | 6/11/23 → 8/11/23 |
Keywords
- binary neural network
- ternary neural network
- deep learning
- edge computing
- low power inference
- mixed precision accelerator
- neural network accelerator