Processor architecture design for smart cameras

H. Fatemi

Research output: Thesis, Phd Thesis 1 (Research TU/e / Graduation TU/e), Academic

Abstract

Many networked embedded systems combine camera-based sensing with processing to achieve communication, measurement or control goals. Video camcorders, web cameras and video phones are examples of products in which the combination of image sensing, digital storage and transmission is penetrating the mass electronics market. Other applications can be found in inspection, surveillance and robotics. Many of these applications require tens of billions of arithmetic operations per second of sustained performance, often under tight power constraints, which makes the design very challenging. Digital signal processors or general-purpose microprocessors are often used for these applications, but image processing allows for many architectural optimizations, such as single instruction multiple data (SIMD) processors for pixel-level operations and instruction level parallelism (ILP) processors for feature-extraction and object-based operations.

In this dissertation, we foresee a further integration, resulting in a combination of one or more sensors, SIMD processors and ILP processors. The result is a low-cost smart camera (so-called SmartCam) solution. Constraints such as processing speed, power consumption and cost vary widely between applications, so no single solution fits all needs. We are interested in quantifying the design flow of application-specific smart cameras through simulation and analysis in a design space exploration (DSE) environment, and in the development of an intuitive programming model.

It is unclear what the right architectural parameters are for a given application domain. There are many parameters, such as the number of processing elements (PEs) in the SIMD processors, the number of SIMD processors, the number of ILP processors, the inter-PE communication organization, and the number of arithmetic logic units (ALUs) in each PE. To find appropriate values for these parameters, we propose a DSE framework that finds an efficient SmartCam architecture with respect to constraints such as area, performance and energy. As a programming model for SmartCam solutions, we propose a framework based on algorithmic skeletons. An algorithmic skeleton implements an image processing operation for a specific SmartCam architecture, hiding the parallelism from the programmer. Algorithmic skeletons provide ease of programming and code portability at the cost of only a small performance loss.

As mentioned, SIMD architectures can be very efficient for image processing applications. However, one of the problems in current SIMD processors is efficient inter-PE communication. Often the PEs of an SIMD processor are only locally connected (LC-SIMD). This may result in a communication bottleneck, because many communication operations are needed. One way to solve this is to use a fully connected communication network between PEs (FC-SIMD). However, this solution leads to excessive communication area cost, low communication network utilization, and scalability problems. In this thesis, we introduce a new type of SIMD architecture, called RC-SIMD, with a run-time reconfigurable communication network. It uses a delay line in the instruction bus, which distributes the accesses to the communication network over time. This architecture requires only a very cheap communication network (the area overhead is about 10-12% in comparison with LC-SIMD) while performing much better than LC-SIMD and often as well as expensive FC-SIMD architectures.

An additional problem for inter-PE communication is that the SIMD concept does not match variable-distance communication between PEs. If a particular PE needs to communicate with another PE at a certain distance, all PEs must communicate over that same distance (due to the SIMD concept). Therefore, traditional SIMD processors cannot efficiently implement certain applications, such as lens distortion compensation. In this thesis, we consider two variants of the communication infrastructure of SIMD processors that enable dynamic distance communication of pixel data (called DC-SIMD). The results show that variable-distance communication can be achieved at a reasonable area cost of about 30%, with a substantial performance improvement (67.8% for lens distortion compensation). Thus, for certain algorithms, DC-SIMD processors provide a good alternative to ILP or general-purpose processors.
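
The uniform-distance restriction described in the abstract can be illustrated with a small Python sketch. It is a toy model and is not taken from the thesis: the function names, the one-pixel-per-PE layout, the edge clamping and the "one masked pass per distinct distance" emulation are all assumptions made for illustration only.

```python
def uniform_shift(pixels, distance):
    """SIMD-style communication: every PE reads from the PE `distance`
    positions away. Clamping at the edges is an assumption of this sketch."""
    n = len(pixels)
    return [pixels[min(max(i + distance, 0), n - 1)] for i in range(n)]


def variable_gather(pixels, distances):
    """Reference result for variable-distance communication:
    PE i reads from PE i + distances[i] (same edge clamping)."""
    n = len(pixels)
    return [pixels[min(max(i + d, 0), n - 1)] for i, d in enumerate(distances)]


def emulate_with_uniform_shifts(pixels, distances):
    """Emulate the gather using only uniform shifts: one pass per distinct
    distance, where only the PEs that need that distance keep the value."""
    result = list(pixels)
    passes = 0
    for d in sorted(set(distances)):
        shifted = uniform_shift(pixels, d)
        result = [
            shifted[i] if distances[i] == d else result[i]
            for i in range(len(pixels))
        ]
        passes += 1
    return result, passes


# Hypothetical data: one pixel per PE and a per-PE source offset, loosely
# in the spirit of a lens distortion correction along one image row.
pixels = [10, 20, 30, 40, 50, 60, 70, 80]
distances = [0, 1, 1, 2, -1, 0, -2, 1]

expected = variable_gather(pixels, distances)
emulated, passes = emulate_with_uniform_shifts(pixels, distances)
print(expected)
print(emulated == expected, passes)
```

In this toy model the number of passes grows with the number of distinct distances, which is the kind of overhead that per-PE (dynamic) distance communication aims to avoid. The actual DC-SIMD variants and their measured costs are those reported in the thesis, not this sketch.
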
Language: English
Qualification: Doctor of Philosophy
Awarding Institution
  • Department of Electrical Engineering
Supervisors/Advisors
  • Corporaal, Henk, Promotor
  • Basten, Twan, Copromotor
  • Mesman, Bart, Copromotor
Award date: 21 Mar 2007
Place of Publication: Eindhoven
Publisher: Technische Universiteit Eindhoven
Print ISBNs: 978-90-386-19-83-5
DOIs: 10.6100/IR624513
State: Published - 2007

Fingerprint

Cameras
Processing
Communication
Telecommunication networks
Costs
Lenses
Image processing
Pixels
Digital storage
Electric delay lines
Digital signal processors
Video cameras
Embedded systems
Microprocessor chips
Scalability
Feature extraction
Robotics
Electric power utilization
Inspection
Sensors

Cite this

Fatemi, H. (2007). Processor architecture design for smart cameras. Eindhoven: Technische Universiteit Eindhoven. DOI: 10.6100/IR624513
Fatemi, H. / Processor architecture design for smart cameras. Eindhoven : Technische Universiteit Eindhoven, 2007. 143 p.
@phdthesis{7af1b7597b7b4db389c51cbac2f29f85,
title = "Processor architecture design for smart cameras",
author = "H. Fatemi",
year = "2007",
doi = "10.6100/IR624513",
language = "English",
isbn = "978-90-386-19-83-5",
publisher = "Technische Universiteit Eindhoven",
school = "Department of Electrical Engineering",

}

Fatemi, H 2007, 'Processor architecture design for smart cameras', Doctor of Philosophy, Department of Electrical Engineering, Eindhoven. DOI: 10.6100/IR624513

TY - THES

T1 - Processor architecture design for smart cameras

AU - Fatemi, H.

PY - 2007

Y1 - 2007

U2 - 10.6100/IR624513

DO - 10.6100/IR624513

M3 - Phd Thesis 1 (Research TU/e / Graduation TU/e)

SN - 978-90-386-19-83-5

PB - Technische Universiteit Eindhoven

CY - Eindhoven

ER -

Fatemi H. Processor architecture design for smart cameras. Eindhoven: Technische Universiteit Eindhoven, 2007. 143 p. DOI: 10.6100/IR624513