Neural Network Quantization from Training to Silicon: Challenges and Opportunities for Energy-Efficiency

Abstract
During the past decade, deep learning has achieved remarkable advances, demonstrating its versatility and potential through applications in image classification, object detection, image segmentation, and generative AI. These accomplishments, which match or even surpass human capabilities, are driven by improvements in algorithms, the availability of large datasets, and advances in computational power. Concurrently, there has been an exponential increase in the number of model parameters, reflecting the growing complexity of deep learning models.
A second significant trend is the shift towards deploying these models on embedded systems (edge AI) instead of the cloud. This transition is motivated by the need for real-time processing, improved system safety and reliability, privacy concerns, and the ability to operate in environments without cloud connectivity. However, the increasing complexity of models poses significant challenges for edge systems, which are inherently limited in resources. To enable broader adoption of deep learning in such constrained environments, efficient deep learning solutions are necessary. This thesis focuses on improving energy efficiency during inference, addressing the most critical constraint for edge devices. Area efficiency is also important, as it influences cost, but it is not the main focus of this work.
There are many research directions for improving deep learning efficiency, including code transformations, compression, and hardware innovations. This thesis aims to optimize the energy efficiency of deep learning models through quantization while minimizing the impact on accuracy. Quantization reduces the precision of the data, meaning that fewer bits are used to represent each value. This reduces the required data storage and data movement, and enables smaller, simpler compute units. Hence, quantization is a promising approach to improving both energy and area efficiency. However, it poses challenges related to accuracy degradation and its application-specific nature.
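To make the bit-width reduction concrete, the following is a minimal sketch of symmetric uniform quantization in NumPy. The function names, the per-tensor scale factor, and the 8-bit default are illustrative assumptions for exposition, not the specific quantization scheme developed in the thesis:

```python
import numpy as np

def quantize_uniform(x: np.ndarray, bits: int = 8):
    """Symmetric uniform quantization: map floats to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for int8
    scale = np.max(np.abs(x)) / qmax    # one scale factor for the whole tensor
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original values."""
    return q.astype(np.float32) * scale

# Example: four float values compressed to int8 codes plus one scale factor.
x = np.array([0.02, -1.27, 0.63, 1.27], dtype=np.float32)
q, s = quantize_uniform(x, bits=8)
x_hat = dequantize(q, s)
```

Storing `q` instead of `x` needs 8 bits per value rather than 32, which is precisely the storage and data-movement saving the abstract refers to; lower bit-widths (4, 2, 1) shrink it further at the cost of larger rounding error.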
As quantization needs vary by model and task, flexible and efficient hardware support is required. To this end, this thesis introduces BrainTTA, a flexible and efficient Transport-Triggered Architecture-based System-on-Chip for neural network inference. BrainTTA enables energy-efficient deep learning through quantization by supporting multiple precision levels (1/2/4/8-bit) and a wide range of neural network layer types. Through its programmable exposed-datapath architecture, BrainTTA achieves up to 3.1x better energy efficiency than other programmable accelerators. Its chip layout is implemented in both 22nm and 28nm processes, and it has been successfully fabricated in 28nm FDSOI, demonstrating practical viability for energy-efficient deep learning on edge devices.
Although BrainTTA supports binary quantization for maximum energy efficiency, employing binary neural networks (BNNs) introduces significant accuracy challenges. As a result, BNNs require repair methods to minimize the accuracy gap with their full-precision counterparts. This thesis provides a classification and overview of the repair methods proposed in recent literature, together with an empirical review evaluating the benefits of each. The review identifies the most effective repair categories, but also reveals that an accuracy gap compared to full-precision models remains.
To shed light on the previously unknown energy costs of BNN repair methods, this thesis compares the accuracy-energy trade-off of (repaired) BNNs to int2, int4, and int8 networks using an energy model based on BrainTTA. The results demonstrate that certain high-precision repairs can significantly reduce the accuracy gap, from 29.1 to 5.4 percentage points, on the ImageNet benchmark using ResNet-18. However, closing this gap further proves increasingly costly. Furthermore, the analysis reveals great potential for the int4 network, which outperforms some repaired BNNs in both accuracy and energy efficiency, and is more energy-efficient than the int8 network, though slightly less accurate. These findings highlight the varying costs and benefits of different repair strategies, emphasizing the need for effective methods that balance energy efficiency and accuracy.
To thoroughly explore the accuracy-energy trade-off across different precisions, this thesis introduces a Pareto-Optimal Quantization methodology that systematically evaluates quantization strategies using detailed energy modeling and network width scaling to enable iso-accuracy and iso-energy comparisons. The results show that int8 quantization with an SP computational graph (in which convolutions, activations, and residuals are uniformly quantized, e.g., to int8) achieves Pareto optimality, offering up to 2.8x energy savings or 10% higher accuracy compared to fp16. When incorporating high-precision residuals (HPRs), which keep residuals and select outputs at higher precision, int4 becomes Pareto-optimal, providing an additional 1.9x energy savings or 2% accuracy improvement over int8. Furthermore, the analysis quantifies the dominance of DRAM energy and demonstrates the limitations of proxy metrics, emphasizing the need for detailed energy modeling and flexible quantization strategies for edge deployments.
In conclusion, this thesis advances the understanding of energy-efficient deep learning by addressing the key challenges of quantization: accuracy loss and application specificity. It presents BrainTTA, a flexible mixed-precision architecture; maps the landscape of BNN repair methods; quantifies repaired BNNs against int2–int8 models; and proposes a methodology for identifying Pareto-optimal quantization strategies. Together, these contributions facilitate the broader deployment of deep learning in resource-constrained environments.
| Original language | English |
|---|---|
| Qualification | Doctor of Philosophy |
| Awarding Institution | |
| Supervisors/Advisors | |
| Award date | 18 Mar 2026 |
| Place of Publication | Eindhoven |
| Publisher | |
| Print ISBNs | 978-90-386-6631-0 |
| Publication status | Accepted/In press - 18 Mar 2026 |
Bibliographical note
Proefschrift (doctoral dissertation).

UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs):
- SDG 7 Affordable and Clean Energy