Recent advances in multi-core and many-core processors require programmers to exploit an increasing amount of parallelism in their applications. Data-parallel languages such as CUDA and OpenCL make it possible to take advantage of such processors, but still demand a large amount of effort from programmers. A number of parallelizing source-to-source compilers have recently been developed to ease the programming of multi-core and many-core processors. This work presents and evaluates a number of such tools, focusing in particular on C-to-CUDA transformations targeting GPUs. We compare these tools with each other both qualitatively and quantitatively and identify their strengths and weaknesses. We address the weaknesses by presenting a new classification of algorithms. This classification is used in a new source-to-source compiler, which is based on the algorithmic skeletons technique. The compiler generates target code based on skeletons of parallel structures, which can be seen as parameterisable library implementations for a set of algorithm classes. We furthermore demonstrate that the presented compiler requires few modifications to the original sequential source code, generates readable code for further fine-tuning, and delivers superior performance compared to other tools for a set of 8 image processing kernels.
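The idea of a skeleton as a parameterisable implementation of an algorithm class can be illustrated with a minimal sketch. It is written in plain sequential C and is not the compiler or the generated code described in the abstract. All names (`skeleton_elementwise`, `pixel_op`, `invert`, `threshold128`) are hypothetical and chosen only for illustration. The sketch assumes an element-wise class in which each output pixel depends on exactly one input pixel, so the loop iterations are independent and could in principle be mapped to one GPU thread each.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-pixel operation: the parameter that instantiates the skeleton. */
typedef unsigned char (*pixel_op)(unsigned char);

/* Hypothetical element-wise skeleton: it fixes the iteration structure for the
 * whole algorithm class and leaves only the per-pixel operation open.
 * Every iteration is independent of the others, which is the property a
 * parallel target version of this skeleton would rely on. */
void skeleton_elementwise(const unsigned char *in, unsigned char *out,
                          size_t n, pixel_op op)
{
    assert(in != NULL && out != NULL && op != NULL);
    for (size_t i = 0; i < n; ++i) {
        out[i] = op(in[i]);
    }
}

/* Two example instantiations for 8-bit greyscale pixels (illustrative only). */
unsigned char invert(unsigned char p)
{
    return (unsigned char)(255 - p);
}

unsigned char threshold128(unsigned char p)
{
    return (unsigned char)(p >= 128 ? 255 : 0);
}
```

Calling `skeleton_elementwise(in, out, n, invert)` and `skeleton_elementwise(in, out, n, threshold128)` reuses the same loop structure for two different kernels. Only the operation passed as a parameter changes, which is the sense in which a skeleton acts like a library implementation for a class of algorithms.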
|Title of host publication||Proceedings of the 5th Workshop on General Purpose Processing on Graphics Processing Units at ASPLOS'12, March 3-7, 2012, London, United Kingdom|
|Place of Publication||New York, USA|
|Publisher||Association for Computing Machinery, Inc|
|Publication status||Published - 2012|
|Event||conference; ASPLOS'12, Duration: 1 Jan 2012 → …|
|Period||1/01/12 → …|