Optimal iteration scheduling for intra- and inter-tile reuse in nested loop accelerators

M.C.J. Peemen, B. Mesman, H. Corporaal

Research output: Book/Report › Report › Academic



High-level synthesis tools have reduced accelerator design time. However, the data transfer bottleneck remains a complex scaling problem. Accelerators require huge amounts of data and are often limited by interconnect resources. Local buffers can reduce communication by exploiting data reuse, but the data access order has a substantial impact on the amount of reuse that can be exploited. Loop transformations such as interchange and tiling can modify the data access order. However, for real applications the design space is huge, and finding the best set of transformations is often intractable. Therefore, we present a new methodology that minimizes data transfer through loop interchange and tiling. In contrast to other methods, we take inter-tile reuse and loop bounds into account. For real-world applications we show buffer size trade-offs that can give speedups of up to 14x; alternatively, these trade-offs can substantially reduce the required FPGA resources.
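To make the distinction between intra-tile and inter-tile reuse concrete, here is a minimal sketch in Python. It is not the methodology of the report. It is a toy counting model for a 1D convolution `out[i] = sum_j in[i+j] * w[j]`, and the function name `input_transfers` and its parameters are hypothetical, chosen only for illustration. The model assumes a local buffer large enough to hold one tile's input footprint and, optionally, the footprint of the previous tile.

```python
def input_transfers(n, k, tile, inter_tile_reuse):
    """Count input elements loaded from external memory (toy model).

    Loop nest modelled: for i in range(n): for j in range(k): use in[i + j].
    The i loop is tiled with size `tile`. Within a tile each input element
    is loaded once (intra-tile reuse). With `inter_tile_reuse`, elements
    already buffered by the previous tile are not loaded again.
    """
    loaded = 0
    previous = set()  # input indices held in the buffer after the last tile
    for start in range(0, n, tile):
        end = min(start + tile, n)  # respect the loop bound on the last tile
        footprint = set(range(start, end + k - 1))
        if inter_tile_reuse:
            loaded += len(footprint - previous)
        else:
            loaded += len(footprint)
        previous = footprint
    return loaded


if __name__ == "__main__":
    n, k = 16, 3
    for tile in (1, 4, 16):
        print(tile,
              input_transfers(n, k, tile, inter_tile_reuse=False),
              input_transfers(n, k, tile, inter_tile_reuse=True))
```

In this toy model, larger tiles reduce transfers when only intra-tile reuse is counted, while counting inter-tile reuse brings the total down to `n + k - 1` loads for any tile size. This is the kind of buffer size versus transfer trade-off the abstract refers to, shown here under the simplifying assumptions stated above.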
Original language: English
Place of Publication: Eindhoven
Publisher: Eindhoven University of Technology
Number of pages: 22
Publication status: Published - 2013



