Scaling Up Quantization-Aware Neural Architecture Search for Efficient Deep Learning on the Edge

Yao Lu, Hiram Rayo Torres Rodriguez, Sebastian Vogel, Nick van de Waterlaat, Pavol Jancura

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


    Abstract

    Neural Architecture Search (NAS) has become the de facto approach for designing accurate and efficient networks for edge devices. Since models are typically quantized for edge deployment, recent work has investigated quantization-aware NAS (QA-NAS) to search for highly accurate and efficient quantized models. However, existing QA-NAS approaches, particularly few-bit mixed-precision (FB-MP) methods, do not scale to larger tasks. Consequently, QA-NAS has mostly been limited to small-scale tasks and tiny networks. In this work, we present an approach to enable QA-NAS (INT8 and FB-MP) on large-scale tasks by leveraging the block-wise formulation introduced by block-wise NAS. We demonstrate strong results for the semantic segmentation task on the Cityscapes dataset, finding FB-MP models 33% smaller and INT8 models 17.6% faster than DeepLabV3 (INT8) without compromising task performance.
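    To illustrate the block-wise idea the abstract refers to, here is a minimal, hypothetical sketch (not the authors' code): the network is split into blocks, each block's bit-width is chosen independently against a per-block quality proxy, and the cheapest acceptable precision wins. The function names, the proxy, and the threshold are all illustrative assumptions.

    ```python
    # Hypothetical sketch of a block-wise mixed-precision search.
    # Assumption: each block can be scored independently by some accuracy
    # proxy, so the search decomposes per block instead of over the
    # exponential space of whole-network bit-width assignments.

    def blockwise_search(blocks, bitwidths, proxy, min_proxy):
        """Pick, per block, the lowest bit-width whose proxy score is
        still acceptable; fall back to the widest option otherwise.

        blocks:    list of block names
        bitwidths: candidate bit-widths (e.g. [2, 4, 8])
        proxy:     proxy(block, bits) -> float quality estimate
        min_proxy: minimum acceptable proxy score per block
        """
        config = {}
        for b in blocks:
            # Candidates whose estimated quality stays acceptable.
            ok = [w for w in bitwidths if proxy(b, w) >= min_proxy]
            config[b] = min(ok) if ok else max(bitwidths)
        return config

    # Toy example with a made-up proxy: earlier blocks tolerate lower
    # precision, later blocks need more bits to stay above threshold.
    blocks = ["stem", "mid", "head"]
    proxy = lambda b, w: {"stem": 0.2, "mid": 0.1, "head": 0.05}[b] * w
    cfg = blockwise_search(blocks, [2, 4, 8], proxy, 0.4)
    # cfg -> {"stem": 2, "mid": 4, "head": 8}
    ```

    The point of the decomposition is that the search cost grows linearly in the number of blocks rather than exponentially in the joint bit-width space, which is what makes QA-NAS tractable on larger tasks.
    
    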
    Original language: English
    Title of host publication: CODAI '23
    Subtitle of host publication: Proceedings of the 2023 Workshop on Compilers, Deployment, and Tooling for Edge AI
    Place of publication: New York
    Publisher: Association for Computing Machinery, Inc.
    Number of pages: 5
    ISBN (electronic): 979-8-4007-0337-9
    DOIs
    Publication status: Published - 10 Jun 2024
    Event: 2023 IEEE/ACM International Workshop on Compilers, Deployment, and Tooling for Edge AI, CODAI 2023 - Hamburg, Germany
    Duration: 21 Sept 2023 - 21 Sept 2023


    Funding

    This work was supported by the Key Digital Technologies Joint Undertaking (KDT JU) through the EdgeAI project, "Edge AI Technologies for Optimised Performance Embedded Processing", grant agreement No. 101097300.
