A study of the potential of locality-aware thread scheduling for GPUs

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


Abstract

Programming models such as CUDA and OpenCL allow the programmer to specify the independence of threads, effectively removing ordering constraints. Still, parallel architectures such as the graphics processing unit (GPU) do not exploit the potential for data locality enabled by this independence. Programmers are therefore required to perform data-locality optimisations, such as memory coalescing or loop tiling, by hand. This work makes a case for locality-aware thread scheduling: re-ordering threads automatically for better locality to improve the programmability of multi-threaded processors. In particular, we analyse the potential of locality-aware thread scheduling for GPUs, considering, among other factors, cache performance, memory coalescing and bank locality. This work does not present an implementation of a locality-aware thread scheduler, but rather introduces the concept and identifies the potential. We conclude that non-optimised programs have the potential to achieve good cache and memory utilisation when using a smarter thread scheduler. A case study of a naive matrix multiplication shows, for example, an 87% performance increase, leading to an IPC of 457 on a 512-core GPU.
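The effect of thread order on cache behaviour can be illustrated with a small simulation. The sketch below is not taken from the paper, which presents no scheduler implementation. It is a toy model under loudly stated assumptions: threads run one after another rather than in parallel, the cache is a single fully associative LRU cache, and the matrix size, line size, cache capacity and tile size are arbitrary illustrative values. The names `count_misses`, `row_major_order` and `tiled_order` are hypothetical. Each thread (i, j) of a naive matrix multiplication reads row i of A and column j of B, and the two schedules differ only in the order in which the same threads are run.

```python
from collections import OrderedDict

N = 32         # matrix dimension (assumed for illustration)
LINE = 8       # elements per cache line (assumed)
CAPACITY = 64  # cache size in lines, fully associative LRU (assumed)
TILE = 4       # tile edge used by the reordered schedule (assumed)


def count_misses(thread_order):
    """Run the threads one after another and count LRU cache misses."""
    cache = OrderedDict()  # keys are cache-line ids, oldest first
    misses = 0
    for i, j in thread_order:
        for k in range(N):
            # Thread (i, j) of a naive C = A * B reads A[i][k] and B[k][j].
            a_addr = i * N + k
            b_addr = N * N + k * N + j  # B is stored after A, row-major
            for addr in (a_addr, b_addr):
                line = addr // LINE
                if line in cache:
                    cache.move_to_end(line)  # refresh recency on a hit
                else:
                    misses += 1
                    cache[line] = True
                    if len(cache) > CAPACITY:
                        cache.popitem(last=False)  # evict least recently used
    return misses


def row_major_order():
    """Default order: thread index j varies fastest."""
    return [(i, j) for i in range(N) for j in range(N)]


def tiled_order():
    """Same threads, grouped into TILE x TILE blocks of the output matrix."""
    return [(ti + di, tj + dj)
            for ti in range(0, N, TILE)
            for tj in range(0, N, TILE)
            for di in range(TILE)
            for dj in range(TILE)]


row_major_misses = count_misses(row_major_order())
tiled_misses = count_misses(tiled_order())
print("row-major misses:", row_major_misses)
print("tiled misses:    ", tiled_misses)
```

In this model the tiled order incurs fewer misses: each tile touches 48 distinct cache lines, which fit in the 64-line cache, whereas a full row sweep touches more lines of B than the cache can hold before any of them is reused. The kernel itself is unchanged; only the thread order differs, which is the kind of potential the abstract refers to. The numbers say nothing about real GPU hardware.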
Original language: English
Title of host publication: Euro-Par 2014: Parallel Processing Workshops: Euro-Par 2014 International Workshops, Porto, Portugal, August 25-26, 2014, Revised Selected Papers, Part II
Editors: L. Lopes, J. Zilinskas
Place of Publication: Berlin
Publisher: Springer
Pages: 146-157
ISBN (Print): 978-3-319-14312-5
Publication status: Published - 2014
Event: 7th International Workshop on Multi-/Many-Core Computing Systems
Duration: 26 Aug 2014

Publication series

Name: Lecture Notes in Computer Science
Volume: 8806
ISSN (Print): 0302-9743

Conference

Conference: 7th International Workshop on Multi-/Many-Core Computing Systems
Period: 26/08/14
