Many networked embedded systems combine camera-based sensing with processing to achieve communication, measurement or control goals. Video camcorders, web cameras and video phones are examples of products in which the combination of image sensing, digital storage and transmission is penetrating the mass electronics market. Other applications are found in inspection, surveillance and robotics. Many of these applications require tens of billions of arithmetic operations per second of sustained performance, often under tight power constraints. These requirements make the design very challenging. Digital signal processors or general-purpose microprocessors are often used for these applications, but the field of image processing allows for many architectural optimizations, such as single instruction multiple data (SIMD) processors for pixel-level operations and instruction level parallelism (ILP) processors for feature-extraction and object-based operations. In this dissertation, we foresee a further integration that combines one or more sensors, SIMD processors and ILP processors. The result is a low-cost smart camera (so-called SmartCam) solution. Constraints such as processing speed, power consumption and cost vary widely between applications, so no single solution fits all needs. We are interested in quantifying the design flow of application-specific smart cameras through simulation and analysis in a design space exploration (DSE) environment, and in developing an intuitive programming model. It is not clear in advance what the right architectural parameters are for a given application domain. There are many parameters, such as the number of processing elements (PEs) in SIMD processors, the number of SIMD processors, the number of ILP processors, the inter-PE communication organization, and the number of arithmetic logic units (ALUs) in each PE.
To find appropriate values for these parameters, we propose a DSE framework that searches for an efficient SmartCam architecture with respect to constraints such as area, performance and energy. As a programming model for SmartCam solutions, we propose a framework based on algorithmic skeletons. An algorithmic skeleton implements an image processing operation for a specific SmartCam architecture and hides the parallelism from the programmer. Algorithmic skeletons provide ease of programming and code portability at the cost of only a small performance loss. As mentioned, SIMD architectures can be very efficient for image processing applications. However, one of the problems in current SIMD processors is efficient inter-PE communication. Often the PEs of an SIMD processor are only locally connected (LC-SIMD). This may result in a communication bottleneck, because many communication operations are needed. One way to solve this is to use a fully connected communication network between PEs (FC-SIMD). However, this solution leads to an excessive communication area cost, low communication network utilization, and scalability problems. In this thesis, we introduce a new type of SIMD architecture, called RC-SIMD, with a run-time reconfigurable communication network. It uses a delay line in the instruction bus, which distributes the accesses to the communication network over time. This architecture requires only a very cheap communication network (the area overhead is about 10-12% compared with LC-SIMD), while performing much better than LC-SIMD and often as well as the expensive FC-SIMD architectures. An additional problem for communication between PEs is that the SIMD concept does not fit variable distance communication. If a particular PE needs to communicate with another PE at a certain distance, all PEs must communicate over that same distance, because every PE executes the same instruction.
Therefore, traditional SIMD processors cannot efficiently implement certain applications, such as lens distortion compensation. In this thesis, we consider two variants of the communication infrastructure of SIMD processors that enable dynamic distance communication of pixel data (called DC-SIMD). The results show that variable distance communication can be achieved at a reasonable area cost of about 30%, with a substantial performance improvement (67.8% for lens distortion compensation). Thus, for certain algorithms, DC-SIMD processors provide a good alternative to ILP or general-purpose processors.
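
The uniform-distance restriction can be illustrated with a small software model. The sketch below is not the thesis's implementation; the function names, the list-based model of a PE line, and the masking scheme are assumptions made purely for illustration. It contrasts a conventional SIMD shift, where every PE reads from the same relative distance, with a per-PE distance gather of the kind lens distortion compensation needs, and shows that emulating the latter with uniform shifts costs one shift per distinct distance.

```python
def uniform_shift(line, d, fill=0):
    """Conventional SIMD communication: every PE i reads from PE i + d.

    The distance d is the same for all PEs. Reads that fall outside
    the line return `fill`.
    """
    n = len(line)
    return [line[i + d] if 0 <= i + d < n else fill for i in range(n)]


def gather_with_uniform_shifts(line, distances, fill=0):
    """Emulate per-PE distances using only uniform shifts plus masking.

    One uniform shift is issued per distinct requested distance; a PE
    keeps the shifted value only if that shift matches its own request.
    Returns the gathered values and the number of shifts issued.
    """
    result = [fill] * len(line)
    distinct = sorted(set(distances))
    for d in distinct:
        shifted = uniform_shift(line, d, fill)
        for pe, wanted in enumerate(distances):
            if wanted == d:  # mask: only matching PEs latch the value
                result[pe] = shifted[pe]
    return result, len(distinct)


def gather_direct(line, distances, fill=0):
    """Idealized dynamic-distance communication: PE i reads from PE i + distances[i]."""
    n = len(line)
    return [
        line[i + d] if 0 <= i + d < n else fill
        for i, d in enumerate(distances)
    ]


if __name__ == "__main__":
    pixels = [10, 20, 30, 40, 50]
    distances = [0, 1, -1, 2, -2]  # hypothetical per-PE offsets
    emulated, shifts = gather_with_uniform_shifts(pixels, distances)
    print(emulated, "using", shifts, "uniform shifts")
    print(gather_direct(pixels, distances), "using 1 dynamic-distance access")
```

In this toy model both routes produce the same data, but the emulation needs as many communication steps as there are distinct distances, which is the kind of overhead that dynamic distance communication is meant to avoid.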
|Qualification||Doctor of Philosophy|
|Award date||21 Mar 2007|
|Place of Publication||Eindhoven|
|Publication status||Published - 2007|