Abstract

The Halide DSL and compiler have enabled high-performance code generation for image processing pipelines targeting heterogeneous architectures by separating the algorithmic description from the optimization schedule. However, automatic schedule generation is currently possible only for multi-core CPU architectures. As a result, expert knowledge is still required when optimizing for platforms with GPU capabilities. In this work, we extend the current Halide Autoscheduler with novel optimization passes to efficiently generate schedules for CUDA-based GPU architectures. We evaluate our proposed method across a variety of applications and show that it achieves performance competitive with, and in many cases better than, manually tuned Halide schedules. Experimental results show that our schedules are on average 10% faster than manual schedules and over 2× faster than previous autoscheduling attempts.
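The separation of algorithm and schedule mentioned in the abstract can be illustrated with a minimal sketch in plain C++. This is not Halide code and not the paper's method: the names blur_at, run_row_major, and run_tiled are hypothetical, and the 3-tap blur is an assumed example. The point is only that one pixel definition (the algorithm) can be evaluated under different loop nests (the schedules) and must produce the same result. The tiled variant loosely mirrors how a GPU schedule maps outer tile loops to thread blocks and inner loops to threads within a block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// "Algorithm": defines WHAT each output pixel is, independent of loop order.
// Hypothetical example: a 3-tap horizontal box blur with clamped borders.
static int blur_at(const std::vector<int>& in, int width, int x, int y) {
    auto clamp_x = [width](int v) { return v < 0 ? 0 : (v >= width ? width - 1 : v); };
    const int row = y * width;
    return (in[row + clamp_x(x - 1)] + in[row + x] + in[row + clamp_x(x + 1)]) / 3;
}

// "Schedule" A: plain row-major traversal of the output.
static std::vector<int> run_row_major(const std::vector<int>& in, int width, int height) {
    std::vector<int> out(in.size());
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            out[y * width + x] = blur_at(in, width, x, y);
    return out;
}

// "Schedule" B: tiled traversal. The outer (yo, xo) loops play the role of
// GPU blocks and the inner (yi, xi) loops the role of threads in a block.
// The bounds checks handle tiles that overhang the image edge.
static std::vector<int> run_tiled(const std::vector<int>& in, int width, int height, int tile) {
    std::vector<int> out(in.size());
    for (int yo = 0; yo < height; yo += tile)
        for (int xo = 0; xo < width; xo += tile)
            for (int yi = 0; yi < tile && yo + yi < height; ++yi)
                for (int xi = 0; xi < tile && xo + xi < width; ++xi) {
                    const int x = xo + xi;
                    const int y = yo + yi;
                    out[y * width + x] = blur_at(in, width, x, y);
                }
    return out;
}
```

Calling run_row_major(img, w, h) and run_tiled(img, w, h, 4) on the same input yields identical output vectors. Only the traversal order differs, and that order is the degree of freedom an autoscheduler searches over.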

Original language: English
Article number: 3406117
Journal: ACM Transactions on Architecture and Code Optimization
Volume: 17
Issue number: 3
Publication status: Published - Aug 2020

Keywords

  • GPU
  • Halide
  • image processing
  • loop optimizations
  • scheduling

Title: Schedule Synthesis for Halide Pipelines on GPUs
