Abstract
Neural Architecture Search (NAS) has become the de facto approach for designing accurate and efficient networks for edge devices. Since models are typically quantized for edge deployment, recent work has investigated quantization-aware NAS (QA-NAS) to search for highly accurate and efficient quantized models. However, existing QA-NAS approaches, particularly few-bit mixed-precision (FB-MP) methods, do not scale to larger tasks. Consequently, QA-NAS has mostly been limited to small-scale tasks and tiny networks. In this work, we present an approach to enable QA-NAS (INT8 and FB-MP) on large-scale tasks by leveraging the block-wise formulation introduced by block-wise NAS. We demonstrate strong results on the Cityscapes semantic segmentation benchmark, finding FB-MP models that are 33% smaller and INT8 models that are 17.6% faster than DeepLabV3 (INT8) without compromising task performance.
Original language | English
---|---
Title of host publication | CODAI '23
Subtitle of host publication | Proceedings of the 2023 Workshop on Compilers, Deployment, and Tooling for Edge AI
Place of Publication | New York
Publisher | Association for Computing Machinery, Inc.
Number of pages | 5
ISBN (Electronic) | 979-8-4007-0337-9
Publication status | Published - 10 Jun 2024
Event | 2023 IEEE/ACM International Workshop on Compilers, Deployment, and Tooling for Edge AI, CODAI 2023 - Hamburg, Germany. Duration: 21 Sept 2023 → 21 Sept 2023
Workshop
Workshop | 2023 IEEE/ACM International Workshop on Compilers, Deployment, and Tooling for Edge AI, CODAI 2023
---|---
Abbreviated title | CODAI 2023 |
Country/Territory | Germany |
City | Hamburg |
Period | 21/09/23 → 21/09/23 |
Funding
This work was supported by the Key Digital Technologies Joint Undertaking (KDT JU) through the EdgeAI "Edge AI Technologies for Optimised Performance Embedded Processing" project, grant agreement No. 101097300.