Abstract
GPUs are increasingly used as compute accelerators. With a large number of cores executing an even larger number of threads, significant speed-ups can be attained for parallel workloads. Applications that rely on atomic operations, such as histogram and Hough transform, suffer from serialization when threads update the same memory location. Previous work shows that reducing this serialization with software techniques can increase performance by an order of magnitude. We observe, however, that some serialization remains and still slows down these applications. Therefore, this paper proposes to use a hash function in the addressing of both the banks and the locks of the scratchpad memory. To measure the effects of these changes, we first implement a detailed model of atomic operations on scratchpad memory in GPGPU-Sim and verify its correctness. Second, we test our proposed hardware changes. They result in speed-ups of up to 4.9× and 1.8× for histogram and Hough transform applications respectively, measured on implementations that already use the aforementioned software techniques, at minimal hardware cost.
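The abstract does not specify the hash function, so the following is only a minimal sketch of the general idea, assuming 32 banks and a simple XOR fold of the upper address bits into the bank index. The names `linear_bank`, `xor_hashed_bank`, and `worst_case_conflict` are hypothetical and are not taken from the paper or from GPGPU-Sim. It compares how many distinct word addresses land in the same bank under plain modulo addressing and under the assumed hash.

```python
from collections import Counter

NUM_BANKS = 32   # assumed bank count, not taken from the paper
BANK_BITS = 5    # log2(NUM_BANKS)

def linear_bank(word_addr):
    """Conventional mapping: the low address bits select the bank."""
    return word_addr % NUM_BANKS

def xor_hashed_bank(word_addr):
    """Assumed hash: XOR the next-higher address bits into the bank index."""
    return (word_addr ^ (word_addr >> BANK_BITS)) % NUM_BANKS

def worst_case_conflict(addresses, bank_fn):
    """Largest number of distinct addresses that map to one bank.

    Accesses to the very same address are not counted here; a hash on
    the bank index cannot separate them.
    """
    counts = Counter(bank_fn(a) for a in set(addresses))
    return max(counts.values())

# 32 threads, each touching a word address that is a multiple of 32.
strided = [32 * t for t in range(32)]

print(worst_case_conflict(strided, linear_bank))      # 32: every access hits bank 0
print(worst_case_conflict(strided, xor_hashed_bank))  # 1: accesses spread over all banks
```

The same index computation could in principle be applied to lock selection as well as bank selection; whether the paper uses an XOR fold or another hash is not stated in this record.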
Original language | English |
---|---|
Title | Proceedings of the IEEE 31st International Conference on Computer Design (ICCD) 2013, 6-9 October 2013, |
Place of publication | Piscataway |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 357-362 |
ISBN (print) | 978-1-4799-2987-0 |
Status | Published - 2013 |
Event | 31st IEEE International Conference on Computer Design (ICCD 2013) - Asheville, NC, United States. Duration: 6 Oct 2013 → 9 Oct 2013. Conference number: 31. http://www.iccd-conf.com/2013/Home.html |
Conference
Conference | 31st IEEE International Conference on Computer Design (ICCD 2013) |
---|---|
Short title | ICCD 2013 |
Country/Territory | United States |
City | Asheville, NC |
Period | 6 Oct 2013 → 9 Oct 2013 |
Other | International Conference on Computer Design (ICCD) |