On the use of small 2D convolutions on GPUs

S.A.H. Al Umairy, I.D. Setija, M.C. van Beurden, H.J. Sips, A.S. van Amesfoort

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


Computing many small 2D convolutions using FFTs is a basis for a large number of applications in many domains in science and engineering, among them electromagnetic diffraction modeling in physics. The GPU architecture seems to be a suitable architecture to accelerate these convolutions, but reaching high application performance requires substantial development time and non-portable optimizations. In this work, we present the techniques, performance results and considerations to accelerate small 2D convolutions using CUDA, and compare performance to a multi-threaded CPU implementation. To improve programmability and performance of applications that make heavy use of small convolutions, we argue that two improvements to software and hardware are needed: FFT libraries must be extended with a single convolution function and communication bandwidth between CPU and GPU needs to be drastically improved.
Original language: English
Title of host publication: Proceedings of the First Workshop on Applications for Multi and Many Core Processors, A4MMC 2010, held in conjunction with ISCA 2010, 19 June 2010, St. Malo, France
Place of Publication: Z.pl.
Publication status: Published - 2010

