Quantization of deep neural networks for accumulator-constrained processors

Barry de Bruin (Corresponding author), Zoran Zivkovic, Henk Corporaal

Research output: Contribution to journal › Conference article › Academic › peer-review


Abstract

We introduce an Artificial Neural Network (ANN) quantization methodology for platforms without wide accumulation registers. This enables fixed-point model deployment on embedded compute platforms that are not specifically designed for large kernel computations (i.e. accumulator-constrained processors). We formulate the quantization problem as a function of the accumulator size, and aim to maximize model accuracy by maximizing the bit widths of the input data and weights. To reduce the number of configurations to consider, only solutions that fully utilize the available accumulator bits are tested. We demonstrate that 16-bit accumulators can obtain a classification accuracy within 1% of the floating-point baselines on the CIFAR-10 and ILSVRC2012 image classification benchmarks. Additionally, a near-optimal 2× speedup is obtained on an ARM processor by exploiting 16-bit accumulators for image classification with the All-CNN-C and AlexNet networks.
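The core constraint the abstract describes — choosing input and weight bit widths that together fully use a narrow accumulator — can be sketched as follows. This is an illustrative reading of the formulation, not the paper's exact algorithm: it assumes the common worst-case bound that an a-bit × b-bit signed product needs roughly a + b − 1 bits, and that summing N such products adds about ⌈log₂ N⌉ bits of headroom. The function name and the specific bound are assumptions for illustration.

```python
import math

def accumulator_bit_budget(acc_bits: int, dot_length: int):
    """Enumerate (input_bits, weight_bits) pairs that exactly fill a
    narrow accumulator for a dot product of `dot_length` terms.

    Worst-case model (an assumption, not the paper's exact bound):
      - an a-bit x b-bit signed product needs about a + b - 1 bits;
      - summing `dot_length` products adds ceil(log2(dot_length)) bits.
    A pair (a, b) is kept when a + b - 1 + headroom == acc_bits,
    i.e. it fully utilizes the available accumulator bits.
    """
    headroom = math.ceil(math.log2(dot_length))
    pairs = []
    for a in range(2, acc_bits):
        b = acc_bits - headroom - a + 1  # solve the budget for weight bits
        if b >= 2:
            pairs.append((a, b))
    return pairs
```

For example, a 3×3×128 convolution kernel has 1152 accumulation terms, so with a 16-bit accumulator only a handful of (input, weight) bit-width pairs remain — which is how restricting the search to budget-filling solutions shrinks the configuration space.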

Original language: English
Article number: 102872
Number of pages: 11
Journal: Microprocessors and Microsystems
Volume: 72
DOIs: 10.1016/j.micpro.2019.102872
Publication status: Published - 1 Feb 2020

Keywords

  • Convolutional neural networks
  • Efficient inference
  • Fixed-point
  • Narrow accumulators
  • Quantization

Cite this

@article{de8a7c70cba94bde89e92c0fb10f1bc6,
title = "Quantization of deep neural networks for accumulator-constrained processors",
abstract = "We introduce an Artificial Neural Network (ANN) quantization methodology for platforms without wide accumulation registers. This enables fixed-point model deployment on embedded compute platforms that are not specifically designed for large kernel computations (i.e. accumulator-constrained processors). We formulate the quantization problem as a function of the accumulator size, and aim to maximize model accuracy by maximizing the bit widths of the input data and weights. To reduce the number of configurations to consider, only solutions that fully utilize the available accumulator bits are tested. We demonstrate that 16-bit accumulators can obtain a classification accuracy within 1{\%} of the floating-point baselines on the CIFAR-10 and ILSVRC2012 image classification benchmarks. Additionally, a near-optimal 2× speedup is obtained on an ARM processor by exploiting 16-bit accumulators for image classification with the All-CNN-C and AlexNet networks.",
keywords = "Convolutional neural networks, Efficient inference, Fixed-point, Narrow accumulators, Quantization",
author = "{de Bruin}, Barry and Zoran Zivkovic and Henk Corporaal",
year = "2020",
month = "2",
day = "1",
doi = "10.1016/j.micpro.2019.102872",
language = "English",
volume = "72",
journal = "Microprocessors and Microsystems",
issn = "0141-9331",
publisher = "Elsevier",
}

TY  - JOUR
T1  - Quantization of deep neural networks for accumulator-constrained processors
AU  - de Bruin, Barry
AU  - Zivkovic, Zoran
AU  - Corporaal, Henk
PY  - 2020/2/1
Y1  - 2020/2/1
AB  - We introduce an Artificial Neural Network (ANN) quantization methodology for platforms without wide accumulation registers. This enables fixed-point model deployment on embedded compute platforms that are not specifically designed for large kernel computations (i.e. accumulator-constrained processors). We formulate the quantization problem as a function of the accumulator size, and aim to maximize model accuracy by maximizing the bit widths of the input data and weights. To reduce the number of configurations to consider, only solutions that fully utilize the available accumulator bits are tested. We demonstrate that 16-bit accumulators can obtain a classification accuracy within 1% of the floating-point baselines on the CIFAR-10 and ILSVRC2012 image classification benchmarks. Additionally, a near-optimal 2× speedup is obtained on an ARM processor by exploiting 16-bit accumulators for image classification with the All-CNN-C and AlexNet networks.
KW  - Convolutional neural networks
KW  - Efficient inference
KW  - Fixed-point
KW  - Narrow accumulators
KW  - Quantization
UR  - http://www.scopus.com/inward/record.url?scp=85074257363&partnerID=8YFLogxK
DO  - 10.1016/j.micpro.2019.102872
M3  - Conference article
VL  - 72
JO  - Microprocessors and Microsystems
JF  - Microprocessors and Microsystems
SN  - 0141-9331
M1  - 102872
ER  -