Fine-grained synchronizations and dataflow programming on GPUs

A. Li, G.-J. van den Braak, H. Corporaal, A. Kumar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

31 Citations (Scopus)


The last decade has witnessed the rapid emergence of many-core platforms, especially graphics processing units (GPUs). With the exponential growth in the number of GPU cores, utilizing them efficiently has become a challenge. The data-parallel programming model assumes a single instruction stream for multiple concurrent threads (SIMT); therefore, little support is offered for enforcing thread ordering and fine-grained synchronization. This becomes an obstacle when migrating algorithms that exploit fine-grained parallelism, such as dataflow algorithms, to GPUs. In this paper, we propose a novel approach for fine-grained inter-thread synchronization on the shared memory of modern GPUs. We demonstrate its performance and compare it with other fine-grained and medium-grained synchronization approaches. On average, our method achieves a 1.5x speedup over the warp-barrier-based approach and a 4.0x speedup over the atomic spin-lock-based approach. To further explore the possibility of realizing fine-grained dataflow algorithms on GPUs, we apply the proposed synchronization scheme to Needleman-Wunsch, a 2D wavefront application involving massive cross-loop data dependencies. For a basic sub-grid, our implementation achieves a 3.56x speedup over the atomic spin-lock implementation and a 1.15x speedup over the conventional data-parallel implementation, which implies that the fine-grained, lock-based programming pattern could be an alternative choice for designing general-purpose GPU (GPGPU) applications.

Original language: English
Title of host publication: ICS 2015 - Proceedings of the 29th ACM International Conference on Supercomputing
Place of publication: New York
Publisher: Association for Computing Machinery, Inc
Number of pages: 10
ISBN (Electronic): 978-1-4503-3559-1
Publication status: Published - 8 Jun 2015
Event: 29th ACM International Conference on Supercomputing (ICS 2015), June 8-11, 2015, Newport Beach, United States
Duration: 8 Jun 2015 to 11 Jun 2015


Conference: 29th ACM International Conference on Supercomputing (ICS 2015), June 8-11, 2015, Newport Beach
Abbreviated title: ICS 2015
Country/Territory: United States
City: Newport Beach


  • Dataflow
  • Fine-grained synchronization
  • GPU
  • Spin-lock

