Abstract
Many networked embedded systems combine sensing using cameras with processing
to achieve certain communication, measurement or control goals. Video camcorders,
web cameras and video phones are examples of products where the combination
of image sensing, digital storage and transmission is penetrating the mass
electronics market. Other applications can be found in inspection, surveillance
and robotics. Many of these applications require a sustained performance of tens
of billions of arithmetic operations per second, yet must often operate under tight
power constraints. These requirements make the design very
challenging.
Often, digital signal processors or general-purpose microprocessors are used for
these applications, but the field of image processing allows for many architectural
optimizations, such as the use of single instruction multiple data (SIMD) processors
for pixel-level operations, and instruction level parallelism (ILP) processors
for feature-extraction and object-based operations. In this dissertation, we foresee
a further integration, resulting in a combination of one or more sensors,
SIMD processors and ILP processors. The result is a low-cost smart camera (so-called
SmartCam) solution.
Constraints such as processing speed, power consumption and cost vary widely
between applications, and thus there is no single solution that fits all needs. We
are interested in quantifying the design flow of application-specific smart cameras
via the use of simulation and analysis in a design space exploration (DSE) environment,
and in the development of an intuitive programming model. It is
unclear what the right architectural parameters are for a given application domain.
There are many parameters, such as the number of processing elements (PEs) in SIMD
processors, the number of SIMD processors, the number of ILP processors, the inter-PE
communication organization, the number of arithmetic logic units (ALUs) in each PE, etc.
To find appropriate values for these parameters, we propose a DSE framework
that identifies an efficient architecture for a SmartCam with respect to constraints such
as area, performance and energy.
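As a rough illustration of what such an exploration involves, the sketch below enumerates a few parameter combinations, discards those that violate an area budget, and keeps the Pareto-optimal designs. It is a minimal sketch only: the `evaluate` cost model, its constants, the parameter ranges and the area budget are all placeholder assumptions made up for this example, not figures or methods from the thesis, where the costs would come from simulation and analysis.

```python
from itertools import product
from typing import List, NamedTuple


class Design(NamedTuple):
    num_pes: int       # number of PEs in the SIMD processor
    alus_per_pe: int   # number of ALUs in each PE
    area: float        # arbitrary units
    time: float        # arbitrary units
    energy: float      # arbitrary units


def evaluate(num_pes: int, alus_per_pe: int) -> Design:
    """Placeholder cost model (an assumption, not taken from the thesis)."""
    area = num_pes * (1.0 + 0.5 * alus_per_pe)
    time = 1000.0 / (num_pes * alus_per_pe) + 2.0
    energy = 0.01 * area * time
    return Design(num_pes, alus_per_pe, area, time, energy)


def dominates(a: Design, b: Design) -> bool:
    """True if `a` is no worse than `b` on every cost and better on at least one."""
    no_worse = a.area <= b.area and a.time <= b.time and a.energy <= b.energy
    better = a.area < b.area or a.time < b.time or a.energy < b.energy
    return no_worse and better


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep only the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


# Enumerate a tiny, hypothetical design space and apply an assumed area budget.
candidates = [evaluate(p, a) for p, a in product([64, 128, 256], [1, 2])]
feasible = [d for d in candidates if d.area <= 400.0]
front = pareto_front(feasible)

for d in front:
    print(d)
```

The surviving designs are the trade-off points a designer would then choose among for a given application.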
As a programming model for SmartCam solutions, we propose a framework based
on algorithmic skeletons. An algorithmic skeleton implements an image processing
operation for a specific SmartCam architecture, hiding the parallelism from
the programmer. Algorithmic skeletons provide ease of programming and code
portability at the cost of only a small performance loss.
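The idea of a skeleton can be sketched as a higher-order function: the programmer supplies only the per-pixel or per-window operation, and the skeleton decides how the image is traversed. The sketch below is illustrative only. The names `pixel_to_pixel` and `neighbourhood_to_pixel` are hypothetical, and plain Python loops stand in for the architecture-specific parallel mapping that a real skeleton would contain.

```python
from typing import Callable, List

Image = List[List[int]]  # rows of pixel values


def pixel_to_pixel(op: Callable[[int], int], img: Image) -> Image:
    """Hypothetical skeleton: apply `op` independently to every pixel."""
    # On a SIMD target, pixels of a row would be spread over the PEs;
    # here a sequential loop stands in for that mapping.
    return [[op(p) for p in row] for row in img]


def neighbourhood_to_pixel(op: Callable[[List[int]], int], img: Image) -> Image:
    """Hypothetical skeleton: apply `op` to each 3x3 window (borders clamped)."""
    h, w = len(img), len(img[0])

    def at(y: int, x: int) -> int:
        # Clamp coordinates so border pixels reuse their nearest neighbour.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [
        [op([at(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)])
         for x in range(w)]
        for y in range(h)
    ]


# User code contains no loops over pixels and no notion of PEs.
def threshold(p: int) -> int:
    return 255 if p >= 128 else 0


def box_mean(window: List[int]) -> int:
    return sum(window) // len(window)


img = [[0, 100, 200], [50, 150, 250], [0, 128, 255]]
binary = pixel_to_pixel(threshold, img)
smooth = neighbourhood_to_pixel(box_mean, img)
```

Because the user code never refers to the traversal, the same `threshold` and `box_mean` could in principle be reused with a skeleton implementation written for a different target.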
As mentioned, SIMD architectures can be very efficient for image processing
applications. However, one of the problems in current SIMD processors is efficient
inter-PE communication. Often the PEs of an SIMD processor are only locally
connected (LC-SIMD). This may result in a communication bottleneck, because many
communication operations are needed. One way to solve this is to use a fully
connected communication network between PEs (FC-SIMD). However, this solution
leads to an excessive communication area cost, low communication network
utilization, and scalability problems.
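The difference in communication operations can be made concrete with a small model of a linear PE array. This is a simplified sketch under stated assumptions: each PE holds one value, a locally connected array can only move data one hop per step, a fully connected array can fetch from any PE in one step, and border PEs receive 0. The function names are hypothetical, and the model ignores area and network utilization.

```python
from typing import List, Tuple


def shift_once(values: List[int]) -> List[int]:
    """One nearest-neighbour step: every PE reads from its left neighbour."""
    # PE 0 has no left neighbour in this sketch and receives 0 (assumed border).
    return [0] + values[:-1]


def lc_shift(values: List[int], distance: int) -> Tuple[List[int], int]:
    """Locally connected: a distance-d transfer takes d single-hop steps."""
    steps = 0
    for _ in range(distance):
        values = shift_once(values)
        steps += 1
    return values, steps


def fc_shift(values: List[int], distance: int) -> Tuple[List[int], int]:
    """Fully connected: every PE reads its source PE directly in one step."""
    out = [values[i - distance] if i - distance >= 0 else 0
           for i in range(len(values))]
    return out, 1


pes = [10, 20, 30, 40, 50, 60]
lc_out, lc_steps = lc_shift(pes, 3)
fc_out, fc_steps = fc_shift(pes, 3)
print(lc_out, lc_steps)
print(fc_out, fc_steps)
```

Both variants deliver the same data; in this model the locally connected array simply needs as many communication steps as the distance to be covered.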
In this thesis, we introduce a new type of SIMD architecture, called RC-SIMD,
with a run-time reconfigurable communication network. It uses a delay-line in
the instruction bus, causing the accesses to the communication network to be
distributed over time. This architecture requires only a very cheap communication
network (the area overhead is about 10-12% in comparison with LC-SIMD) while
performing much better than LC-SIMD and often as well as expensive FC-SIMD
architectures.
An additional problem for the communication between PEs is that the
SIMD concept does not match variable-distance communication between PEs.
If a particular PE needs to communicate with another PE at a certain distance,
all PEs must communicate over that same distance (due to the SIMD concept).
Therefore, traditional SIMD processors cannot efficiently implement certain applications,
such as lens distortion compensation. In this thesis, we consider two variants
of the communication infrastructure of SIMD processors that enable dynamic
distance communication of pixel data (called DC-SIMD). The results show that
variable-distance communication can be achieved at a reasonable area cost of about
30% and yields a substantial performance improvement (67.8% for lens distortion
compensation). Thus, for certain algorithms, DC-SIMD processors provide a good
alternative to ILP or general-purpose processors.
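The mismatch between the SIMD concept and variable-distance communication can be illustrated with a toy model. In the sketch below, a uniform-distance array has to emulate per-PE distances with one masked pass per distinct distance, whereas an array with dynamic distances serves every PE in a single pass. This is an illustration of the problem only: it does not model either of the two DC-SIMD variants studied in the thesis, and the function names, the clamped border handling and the example offsets are assumptions.

```python
from typing import List, Tuple


def uniform_fetch(values: List[int], distance: int) -> List[int]:
    """All PEs fetch from the PE `distance` positions to the left (clamped)."""
    n = len(values)
    return [values[min(max(i - distance, 0), n - 1)] for i in range(n)]


def emulate_variable(values: List[int], distances: List[int]) -> Tuple[List[int], int]:
    """Uniform-distance SIMD: one masked pass per distinct distance."""
    out = [0] * len(values)
    passes = 0
    for d in sorted(set(distances)):
        fetched = uniform_fetch(values, d)
        for i, di in enumerate(distances):
            if di == d:  # mask: only PEs that need this distance keep the result
                out[i] = fetched[i]
        passes += 1
    return out, passes


def dynamic_fetch(values: List[int], distances: List[int]) -> Tuple[List[int], int]:
    """Dynamic distances: each PE uses its own distance within a single pass."""
    n = len(values)
    out = [values[min(max(i - d, 0), n - 1)] for i, d in enumerate(distances)]
    return out, 1


# Offsets that grow towards the image border, loosely in the spirit of a
# lens distortion correction (values chosen arbitrarily for the example).
row = [10, 20, 30, 40, 50, 60, 70, 80]
offsets = [-2, -1, 0, 0, 0, 0, 1, 2]

emulated, emulated_passes = emulate_variable(row, offsets)
direct, direct_passes = dynamic_fetch(row, offsets)
print(emulated, emulated_passes)
print(direct, direct_passes)
```

In this model the number of passes on the uniform-distance array grows with the number of distinct distances in use, which is why such access patterns fit traditional SIMD processors poorly.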
Original language | English |
---|---|
Qualification | Doctor of Philosophy |
Awarding Institution | |
Supervisors/Advisors | |
Award date | 21 Mar 2007 |
Place of Publication | Eindhoven |
Publisher | |
Print ISBNs | 978-90-386-19-83-5 |
DOIs | |
Publication status | Published - 2007 |