Content-aware Token Sharing for Efficient Semantic Segmentation with Vision Transformers

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › Peer-review

Abstract

This paper introduces Content-aware Token Sharing (CTS), a token reduction approach that improves the computational efficiency of semantic segmentation networks that use Vision Transformers (ViTs). Existing works have proposed token reduction approaches to improve the efficiency of ViT-based image classification networks, but these methods are not directly applicable to semantic segmentation, which we address in this work. We observe that, for semantic segmentation, multiple image patches can share a token if they contain the same semantic class, as they contain redundant information. Our approach leverages this by employing an efficient, class-agnostic policy network that predicts whether image patches contain the same semantic class, and lets them share a token if they do. Through experiments, we explore the critical design choices of CTS and show its effectiveness on the ADE20K, Pascal Context and Cityscapes datasets, various ViT backbones, and different segmentation decoders. With Content-aware Token Sharing, we are able to reduce the number of processed tokens by up to 44%, without diminishing the segmentation quality.
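The abstract describes token sharing only at a high level. The following is a minimal sketch of the general idea, not the authors' implementation, and it rests on several assumptions: patches are grouped into 2x2 superpatches, a boolean mask stands in for the policy network's prediction, and a shared superpatch is represented by the mean of its four patch embeddings. The function names `share_tokens` and `unshare_tokens` are hypothetical.

```python
import numpy as np


def share_tokens(patches: np.ndarray, share_mask: np.ndarray):
    """Merge each 2x2 superpatch flagged in share_mask into a single token.

    patches:    (H, W, C) grid of patch embeddings; H and W must be even.
    share_mask: (H // 2, W // 2) booleans standing in for the output of a
                hypothetical policy network (True = one semantic class).
    Returns the reduced (N, C) token array and a layout list for unsharing.
    """
    H, W, C = patches.shape
    assert H % 2 == 0 and W % 2 == 0, "grid must split into 2x2 superpatches"
    assert share_mask.shape == (H // 2, W // 2)
    tokens, layout = [], []
    for i in range(H // 2):
        for j in range(W // 2):
            block = patches[2 * i:2 * i + 2, 2 * j:2 * j + 2].reshape(4, C)
            if share_mask[i, j]:
                # Assumption: the shared token is the mean of the four embeddings.
                tokens.append(block.mean(axis=0))
            else:
                # Keep all four patch tokens unchanged.
                tokens.extend(block)
            layout.append((i, j, bool(share_mask[i, j])))
    return np.stack(tokens), layout


def unshare_tokens(tokens: np.ndarray, layout, H: int, W: int) -> np.ndarray:
    """Scatter (processed) tokens back onto the full (H, W, C) patch grid."""
    C = tokens.shape[1]
    out = np.empty((H, W, C), dtype=tokens.dtype)
    k = 0  # index of the next unread token
    for i, j, shared in layout:
        if shared:
            # Copy the single shared token to all four patch positions.
            out[2 * i:2 * i + 2, 2 * j:2 * j + 2] = tokens[k]
            k += 1
        else:
            out[2 * i:2 * i + 2, 2 * j:2 * j + 2] = tokens[k:k + 4].reshape(2, 2, C)
            k += 4
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patches = rng.normal(size=(4, 4, 8))
    mask = np.array([[True, False], [False, False]])
    tokens, layout = share_tokens(patches, mask)
    print(tokens.shape)  # (13, 8)
    restored = unshare_tokens(tokens, layout, 4, 4)
    print(restored.shape)  # (4, 4, 8)
```

With a 4x4 patch grid and one of the four superpatches shared, the token count falls from 16 to 13; in general, sharing S superpatches removes 3S tokens before the transformer layers, while unsharing restores a full patch grid that a dense segmentation decoder can consume.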
Original language: English
Title of host publication: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Publisher: Institute of Electrical and Electronics Engineers
Pages: 23631-23640
Number of pages: 10
ISBN (Electronic): 979-8-3503-0129-8
Publication status: Published, 22 Aug 2023
Event: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada
Duration: 17 Jun 2023 to 24 Jun 2023

Conference

Conference: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Country/Territory: Canada
City: Vancouver
Period: 17/06/23 to 24/06/23

Funding

Acknowledgements: This work is supported by Eindhoven Engine, NXP Semiconductors and Brainport Eindhoven. This work made use of the Dutch national e-infrastructure with the support of the SURF Cooperative using grant no. EINF-3836, which is financed by the Dutch Research Council (NWO).

Funders (funder number):
SURF: EINF-3836
Nederlandse Organisatie voor Wetenschappelijk Onderzoek

Keywords

• Computer vision
• Semantic segmentation
• Scene analysis and understanding
