In this dissertation, we address the problem of performance efficient multithreading execution on heterogeneous multicore embedded systems. By heterogeneous multicore embedded systems we refer to those, which have real-time requirements and consist of processor tiles with General Purpose Processor (GPP), local memory, and one or more coprocessors running on reconfigurable logic ((e)FPGA). We improve system performance by combining two common methods. The first method is to exploit the available application parallelism by means of multithreading program execution. The second method is to provide hardware acceleration for the most computationally intensive kernels. More specifically our scientific approach is as follows: we categorize the existing program execution models from the processor-coprocessor synchronization prospective and we introduce new parallel execution models. Then, we provide a high-level architectural abstraction of those execution models and programming paradigm that describes and utilizes them. Furthermore, we propose a microarchitectural support for the identified execution models. The functionality of the microarchitectural extensions is encapsulated in a new reconfigurable coprocessor, called Thread Interrupt State Controller (TISC). To improve the overall system performance, we employ the newly proposed program execution models to transfer highly time-variable and time-consuming Real-Time Operating System (RTOS) and application kernels from software, i.e., executed on the GPPs, to hardware, i.e., executed on the reconfigurable coprocessors. We refer to this reconfigurable coprocessor as Hardware Task Status Manager (HWTSM). Due to the properties of the newly introduced execution models such as parallel execution and constant response time, we preserve the predictability and composability at application level. Last but not least, we introduce a framework for distribution of slack information (idle processor time) among processor tiles. In the proposed framework we employ one of the newly introduced parallel processor-coprocessor execution models. We refer to the new reconfigurable coprocessor as RS. We use the extra slack information obtained through our framework for Dynamic Voltage Frequency Scaling that reduces the overall energy consumption. Based on the available experimental results with synthetic and real applications, we improve the system speedup up to 19.6 times with the help of the Thread Interrupt State Controller. Furthermore, we reduce RTOS cost with the help of the Hardware Task Status Manager, which results in additional application acceleration up to 13.3%. Last but not least, we improve the system energy consumption up to 56.7% over current state of the art with the help of inter-tile remote slack information distribution framework. Overall, with the help of our contributions, the system performance is improved, the predictability and composability are preserved, all with reduced energy consumption.
|Qualification||Doctor of Philosophy|
|Award date||1 Jan 2014|
|Place of Publication||Delft|
|Publication status||Published - 2014|