Bones : an automatic skeleton-based C-to-CUDA compiler for GPUs

C. Nugteren, H. Corporaal

Research output: Contribution to journal › Article › Academic › peer-review



The shift toward parallel processor architectures has made programming and code generation increasingly challenging. To address this programmability challenge, this article presents a technique to fully automatically generate efficient and readable code for parallel processors (with a focus on GPUs). This is made possible by combining algorithmic skeletons, traditional compilation, and "algorithmic species," a classification of program code. Compilation starts by automatically annotating C code with class information (the algorithmic species). This code is then fed into the skeleton-based source-to-source compiler bones to generate CUDA code. To generate efficient code, bones also performs optimizations including host-accelerator transfer optimization and kernel fusion. This results in a unique approach, integrating a skeleton-based compiler for the first time into an automated flow. The benefits are demonstrated experimentally for PolyBench GPU kernels, showing geometric mean speed-ups of 1.4× and 2.4× compared to ppcg and Par4All, and for five Rodinia GPU benchmarks, showing a gap of only 1.2× compared to hand-optimized code.
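To make the flow concrete, here is a minimal sketch of the kind of input the abstract describes: plain C loop nests, each classified with an algorithmic species. The species are written as comments in an approximation of the notation (`a[0:n-1]|element -> tmp[0:n-1]|element`). They do not reproduce the exact pragma syntax that bones consumes. The second function illustrates kernel fusion in this setting. Both kernels are element-wise over the same index space 0..n-1, so they can be merged into one loop, and the intermediate array no longer makes a round trip through memory. The function names `scale_then_add` and `scale_then_add_fused` are hypothetical. The code is ordinary C and does not show the CUDA that bones would emit.

```c
#include <stddef.h>

/* Unfused (hypothetical example): two separate element-wise kernels.
 * tmp is written by the first kernel and read back by the second. */
void scale_then_add(size_t n, const float *a, const float *b,
                    float *tmp, float *out)
{
    /* species (approximate notation): a[0:n-1]|element -> tmp[0:n-1]|element */
    for (size_t i = 0; i < n; i++)
        tmp[i] = 2.0f * a[i];

    /* species (approximate notation):
     * tmp[0:n-1]|element ^ b[0:n-1]|element -> out[0:n-1]|element */
    for (size_t i = 0; i < n; i++)
        out[i] = tmp[i] + b[i];
}

/* Fused (hypothetical example): both kernels share the index space 0..n-1
 * and each out[i] depends only on a[i] and b[i], so the loops merge into one.
 * tmp is dropped, which is only valid if no later code reads it. */
void scale_then_add_fused(size_t n, const float *a, const float *b, float *out)
{
    /* species (approximate notation):
     * a[0:n-1]|element ^ b[0:n-1]|element -> out[0:n-1]|element */
    for (size_t i = 0; i < n; i++)
        out[i] = 2.0f * a[i] + b[i];
}
```

The element-to-element pattern is what makes this fusion a simple loop merge. If the second kernel instead read a neighbourhood of `tmp` (for example `tmp[i-1]` to `tmp[i+1]`), merging the loops would need extra handling of the neighbouring elements and is not shown here.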
Original language: English
Article number: 35
Pages (from-to): 35:1-35:25
Journal: ACM Transactions on Architecture and Code Optimization
Issue number: 4
Publication status: Published - 2014

