POQ: Is There a Pareto-Optimal Quantization Strategy for Deep Neural Networks?

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Efficient deployment of deep learning models on resource-constrained devices requires balancing accuracy with energy consumption and/or latency. Quantization is a proven method to achieve this balance by reducing the precision of neural network weights and activations. However, simply changing the precision does not enable direct iso-accuracy and iso-energy comparisons. To address this, we combine a realistic processor energy model with a network filter multiplier that scales the number of channels, thereby enabling such comparisons. This work presents a Pareto-Optimal Quantization (POQ) methodology that maps a neural network architecture to a specific hardware platform while systematically exploring the intervening design space to identify the most effective quantization strategy. Our approach evaluates how different design choices affect the accuracy-energy trade-off. Using detailed energy modeling instead of proxy metrics, our results reveal that 8-bit integer (int8) quantization is Pareto-optimal for MobileNetV2, providing up to 2.8× energy savings or 10% higher accuracy compared to 16-bit floating-point (fp16). Furthermore, employing high-precision residuals shifts the Pareto frontier, making 4-bit integer (int4) quantization optimal and achieving up to 1.9× additional energy reduction or 2% additional accuracy gains. Moreover, our findings emphasize the role of DRAM energy in certain model configurations and highlight the importance of precise energy modeling. These results demonstrate the application of our POQ methodology to the practical deployment of energy-efficient deep learning models on constrained hardware.
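The two ingredients the abstract combines, reduced-precision weights and a filter (width) multiplier that scales channel counts, can be illustrated with a generic sketch. This is a minimal example of symmetric per-tensor int8 quantization and MobileNet-style channel scaling, not the paper's exact POQ scheme; the function names, the rounding divisor of 8, and the per-tensor scale are assumptions for illustration.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization (generic sketch,
    not the paper's exact scheme): map floats to [-127, 127]."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

def scale_channels(base_channels, width_multiplier, divisor=8):
    """Width-multiplier channel scaling (MobileNet-style),
    rounded to a hardware-friendly multiple of `divisor`."""
    c = int(round(base_channels * width_multiplier / divisor)) * divisor
    return max(divisor, c)
```

Scaling channels while lowering precision is what makes iso-accuracy and iso-energy comparisons possible: a wider int8 network can be compared against a narrower fp16 one at matched accuracy or matched energy.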

Original language: English
Article number: 10988610
Pages (from-to): 81434-81449
Number of pages: 16
Journal: IEEE Access
Volume: 13
DOIs
Publication status: Published - 2025

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 7 - Affordable and Clean Energy

Keywords

  • Deep learning
  • design space exploration
  • energy efficiency
  • hardware accelerator
  • Pareto analysis
  • quantized neural networks
  • scheduling
