Content-aware Token Sharing for Efficient Semantic Segmentation with Vision Transformers

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

7 Citations (Scopus)
5 Downloads (Pure)

Abstract

This paper introduces Content-aware Token Sharing (CTS), a token reduction approach that improves the computational efficiency of semantic segmentation networks that use Vision Transformers (ViTs). Existing works have proposed token reduction approaches to improve the efficiency of ViT-based image classification networks, but these methods are not directly applicable to semantic segmentation, which we address in this work. We observe that, for semantic segmentation, multiple image patches can share a token if they contain the same semantic class, as they contain redundant information. Our approach leverages this by employing an efficient, class-agnostic policy network that predicts if image patches contain the same semantic class, and lets them share a token if they do. With experiments, we explore the critical design choices of CTS and show its effectiveness on the ADE20K, Pascal Context and Cityscapes datasets, various ViT backbones, and different segmentation decoders. With Content-aware Token Sharing, we are able to reduce the number of processed tokens by up to 44%, without diminishing the segmentation quality.
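The token-sharing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that are not stated in the text above: patches are grouped into 2×2 blocks ("superpatches"), a shared token is formed by averaging the four patch embeddings, and a given boolean mask stands in for the output of the policy network. The function names `share_tokens` and `unshare_tokens` are hypothetical.

```python
import numpy as np


def share_tokens(patch_tokens, share_mask):
    """Reduce a grid of patch tokens by letting some 2x2 blocks share one token.

    patch_tokens: array of shape (H, W, D), one embedding per image patch.
    share_mask:   bool array of shape (H // 2, W // 2); True means the four
                  patches in that block are assumed to show the same class.
                  (Assumption: in practice a policy network would predict this.)
    Returns an array of shape (N, D) with N <= H * W.
    """
    H, W, D = patch_tokens.shape
    assert H % 2 == 0 and W % 2 == 0
    assert share_mask.shape == (H // 2, W // 2)
    tokens = []
    for i in range(H // 2):
        for j in range(W // 2):
            block = patch_tokens[2 * i:2 * i + 2, 2 * j:2 * j + 2].reshape(4, D)
            if share_mask[i, j]:
                # Assumption: a shared token is the mean of the four embeddings.
                tokens.append(block.mean(axis=0, keepdims=True))
            else:
                # Keep all four patch tokens unchanged.
                tokens.append(block)
    return np.concatenate(tokens, axis=0)


def unshare_tokens(tokens, share_mask):
    """Restore the full (H, W, D) grid so a decoder can predict per patch."""
    h2, w2 = share_mask.shape
    D = tokens.shape[1]
    grid = np.empty((2 * h2, 2 * w2, D), dtype=tokens.dtype)
    k = 0
    for i in range(h2):
        for j in range(w2):
            if share_mask[i, j]:
                # Copy the single shared token to all four patch positions.
                grid[2 * i:2 * i + 2, 2 * j:2 * j + 2] = tokens[k]
                k += 1
            else:
                grid[2 * i:2 * i + 2, 2 * j:2 * j + 2] = tokens[k:k + 4].reshape(2, 2, D)
                k += 4
    return grid


# Toy example: a 4x4 patch grid with 3-dimensional embeddings.
patch_tokens = np.arange(4 * 4 * 3, dtype=float).reshape(4, 4, 3)
share_mask = np.array([[True, False], [False, True]])
reduced = share_tokens(patch_tokens, share_mask)
restored = unshare_tokens(reduced, share_mask)
```

In this toy example, two of the four blocks share a token, so 16 patches become 10 tokens (a 37.5% reduction). The reduction achieved in practice depends on how many blocks the policy marks as shareable.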
Original language: English
Title: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Publisher: Institute of Electrical and Electronics Engineers
Pages: 23631-23640
Number of pages: 10
ISBN (electronic): 979-8-3503-0129-8
Status: Published - 22 Aug 2023
Event: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) - Vancouver, Canada
Duration: 17 Jun 2023 to 24 Jun 2023

Conference

Conference: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Country/Territory: Canada
City: Vancouver
Period: 17/06/23 to 24/06/23

Funding

Acknowledgements: This work is supported by Eindhoven Engine, NXP Semiconductors and Brainport Eindhoven. This work made use of the Dutch national e-infrastructure with the support of the SURF Cooperative using grant no. EINF-3836, which is financed by the Dutch Research Council (NWO).

Funders (funder number)
Surf, Stichting (EINF-3836)
Nederlandse Organisatie voor Wetenschappelijk Onderzoek
