Voltage scaling is one of the most effective and straightforward means of reducing the energy consumption of CMOS digital circuits. Aggressive voltage scaling into the near- or sub-threshold region helps achieve ultra-low energy consumption. However, it makes it much harder to reach the required throughput and to tolerate process variations. This thesis presents our research on designing robust near/sub-threshold CMOS digital circuits. Our work has two distinguishing features. First, unlike other research that uses sub-threshold operation only for low-frequency, low-throughput applications, we use architectural-level parallelism to compensate for the throughput degradation, so that a medium throughput of up to 100 MB/s, suitable for digital consumer electronics applications, can be achieved. Second, several new techniques are proposed to mitigate the yield degradation caused by process variations. These techniques include:
(a) A configurable V_T balancer to control the V_T spread. When the circuit faces process corners in the sub-threshold region, the balancer balances the V_T of the pMOS and nMOS transistors through bulk biasing.
(b) Transistor sizing to combat V_T mismatch between transistors. This is needed if the circuit must operate at a very deep sub-threshold supply voltage, i.e., below 250 mV in a 65 nm standard-V_T CMOS process.
(c) Improving sub-threshold drivability by exploiting the V_T mismatch between parallel transistors. Although V_T mismatch between parallel transistors is generally regarded as harmful, we propose exploiting it to boost the drive current in the sub-threshold region. This approach also favors a multi-finger layout style, which considerably reduces silicon area.
(d) A procedure for selecting standard cells and modifying them for higher reliability in the sub-threshold regime. Standard library cells that are sensitive to process variations must be excluded from the synthesis flow. We provide basic guidelines for selecting "safe" cells.
(e) A method that turns risky ratioed logic, such as latches and registers, into non-ratioed logic.

SubJPEG, an ultra-low-energy multi-standard JPEG encoder co-processor with a sub/near-threshold power supply, was designed and implemented to demonstrate all these ideas. This 8-bit-resolution, DMA-based co-processor has multiple power domains and multiple clock domains. It uses four parallel DCT-Quantization engines in the data path. Instruction-level parallelism is also used. All the parallelism is implemented efficiently to minimize the associated area overhead. Details of the co-processor architecture and its implementation issues are covered in this thesis.

The prototype chip was fabricated in a TSMC 65 nm 7-layer low-power standard-V_T CMOS process. The core area is 1.4 × 1.4 mm². Each engine has its own V_T balancer, and each V_T balancer occupies 25 × 30 µm². The measurement results show that the V_T balancer has a very good balancing effect. In sub-threshold mode, the engines operate at a 2.5 MHz clock frequency from a 0.4 V supply and consume 0.75 pJ per cycle per engine for DCT and quantization processing, i.e., 0.75 pJ/(engine·cycle). This is an 8.3× reduction in energy/(engine·cycle) compared with a 1.2 V nominal supply. In the near-threshold regime, the energy dissipation is about 1.1 pJ/(engine·cycle) at a 0.45 V supply and 4.5 MHz. The system throughput meets the requirement for 15 fps VGA (640 × 480 pixel) compression. By further increasing the supply voltage, the test chip can satisfy multi-standard image encoding. Our methodology is largely applicable to the design of other sound, graphics, and streaming processors.
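The measured figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the numbers quoted in the abstract, the function names are hypothetical, and two simplifying assumptions are made that the abstract does not state: that dynamic energy scales ideally as C·V² (ignoring leakage), and that VGA throughput can be counted as one sample per pixel.

```python
# Figures quoted in the abstract (measured values from the source).
E_SUB_PJ = 0.75     # energy per engine-cycle at 0.4 V, in pJ
F_SUB_HZ = 2.5e6    # clock frequency at 0.4 V
E_NEAR_PJ = 1.1     # energy per engine-cycle at 0.45 V, in pJ
F_NEAR_HZ = 4.5e6   # clock frequency at 0.45 V
REDUCTION = 8.3     # measured energy reduction vs. the 1.2 V nominal supply


def engine_power_uw(energy_pj, freq_hz):
    """Average power of one engine in microwatts: P = E_cycle * f."""
    return energy_pj * 1e-12 * freq_hz * 1e6


def ideal_cv2_ratio(v_nominal, v_scaled):
    """Energy ratio predicted by ideal C*V^2 scaling (assumption: no leakage)."""
    return (v_nominal / v_scaled) ** 2


def vga_pixel_rate(width=640, height=480, fps=15):
    """Pixels per second for the quoted VGA target (assumption: 1 sample/pixel)."""
    return width * height * fps


if __name__ == "__main__":
    print(f"sub-threshold power per engine:  {engine_power_uw(E_SUB_PJ, F_SUB_HZ):.3f} uW")
    print(f"near-threshold power per engine: {engine_power_uw(E_NEAR_PJ, F_NEAR_HZ):.3f} uW")
    # Energy per engine-cycle at 1.2 V implied by the quoted 8.3x reduction.
    print(f"implied energy at 1.2 V: {E_SUB_PJ * REDUCTION:.3f} pJ/(engine*cycle)")
    print(f"ideal C*V^2 ratio, 1.2 V -> 0.4 V: {ideal_cv2_ratio(1.2, 0.4):.2f}x")
    print(f"VGA pixel rate at 15 fps: {vga_pixel_rate()} pixels/s")
```

Under these assumptions, one engine dissipates just under 2 µW at 0.4 V and about 5 µW at 0.45 V, and the measured 8.3× reduction sits slightly below the 9× that ideal quadratic scaling would predict for a 1.2 V to 0.4 V supply change.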
| Qualification | Doctor of Philosophy |
| Award date | 23 Sep 2009 |
| Place of Publication | Eindhoven |
| Publication status | Published - 2009 |