We present a design technique for (near) subthreshold operation that achieves ultra low energy dissipation at throughputs of up to 100 MB/s suitable for digital consumer electronic applications. Our approach employs i) architecture-level parallelism to compensate throughput degradation, ii) a configurable V T balancer to mitigate the V T mismatch of nMOS and pMOS transistors operating in sub/near threshold, and iii) a fingered-structured parallel transistor that exploits V T mismatch to improve current drivability. Additionally, we describe the selection procedure of the standard cells and how they were modified for higher reliability in the subthreshold regime. All these concepts are demonstrated using SubJPEG, a 1.4 Ã¿1.4 mm2 65 nm CMOS standard-V T multi-standard JPEG co-processor. Measurement results of the discrete cosine transform (DCT) and quantization processing engines, operating in the subthreshold regime, show an energy dissipation of only 0.75 pJ per cycle with a supply voltage of 0.4 V at 2.5 MHz. This leads to 8.3Ã¿ energy reduction when compared to using a 1.2 V nominal supply. In the near-threshold regime the energy dissipation is 1.0 pJ per cycle with a 0.45 V supply voltage at 4.5 MHz. The system throughput can meet 15 fps 640 Ã¿ 480 pixel VGA standard. Our methodology is largely applicable to designing other sound/graphic and streaming processors.