## 16.5 A 230mV-to-500mV 375KHz-to-16MHz 32b RISC Core in 0.18μm CMOS

Jinn-Shyan Wang, Jian-Shiun Chen, Yi-Ming Wang, Chingwei Yeh

National Chung-Cheng University, Chia-Yi, Taiwan

Aggressive voltage scaling to 0.5V and below is gaining attention because of the interest in using solar cells to power portable systems. Static swapped-body biasing [1] and ultra-dynamic voltage scaling (UDVS) [2] are two representative designs in this respect. However, neither offers a comprehensive treatment of the related design problems. This paper presents a 230-to-500mV 32b RISC core designed with the ultra-low-voltage CMOS (ULV-CMOS) technique. ULV-CMOS builds on the SLVCMOS scheme [3] with several add-on techniques. Compared to UDVS, which also has active and sleep operations, ULV-CMOS achieves 125× and 6.7× performance improvement when V<sub>DD</sub> is 250mV and 500mV, respectively.

Figure 16.5.1 shows the architecture of ULV-CMOS. For combinational circuits that can be fully turned off in sleep mode, low-V<sub>T</sub> devices are used under the super cut-off scheme [4], which is implemented with a PMOS power switch overdriven by 0.4V via a charge pump. For flip-flops whose contents have to be preserved during sleep, a sleepless design [3] with carefully chosen (mixed) V<sub>T</sub>'s is adopted so that the backup hardware and the negative power supply (V<sub>SS</sub>) required by [4] are both eliminated. On top of the V<sub>T</sub> setting and power gating, a dynamic NP-swappable body bias (D-NPSBB) scheme is employed on the cell-based core circuits and power switches, as well as on a handcrafted clock driver. In sleep mode, the D-NPSBB assumes a zero bias. At the same time, a charge pump intermittently generates a voltage higher than V<sub>DD</sub> to overdrive the power switch and the clocked PMOS transistors, reducing leakage. On entering active mode, the N-well (P-well) body of the active circuitry is switched immediately to GND (V<sub>DD</sub>), applying a forward bias that increases performance.
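The mode-dependent biasing described above can be summarized in a small behavioral sketch. This is an idealized illustration, not the authors' circuit. The names `BiasState`, `bias_for_mode`, and `forward_bias` are hypothetical. The sketch also assumes that the power-switch gate sits at exactly V<sub>DD</sub> + 0.4V in sleep mode (the charge pump is described as intermittent) and at GND in active mode, which the text does not state.

```python
from dataclasses import dataclass
from typing import Tuple

OVERDRIVE_V = 0.4  # power-switch gate overdrive in sleep mode, from the text


@dataclass(frozen=True)
class BiasState:
    v_nw: float       # N-well (PMOS body) voltage
    v_pw: float       # P-well (NMOS body) voltage
    v_sw_gate: float  # gate voltage of the PMOS power switch


def bias_for_mode(mode: str, vdd: float) -> BiasState:
    """Return idealized well and power-switch gate voltages for a mode."""
    if mode == "sleep":
        # Zero body bias: each body sits at its own source rail, and the
        # switch gate is pumped above VDD to cut the PMOS switch off hard.
        return BiasState(v_nw=vdd, v_pw=0.0, v_sw_gate=vdd + OVERDRIVE_V)
    if mode == "active":
        # Swapped wells: N-well to GND and P-well to VDD forward-bias both
        # device types. Gate at GND is an assumption of this sketch.
        return BiasState(v_nw=0.0, v_pw=vdd, v_sw_gate=0.0)
    raise ValueError(f"unknown mode: {mode!r}")


def forward_bias(state: BiasState, vdd: float) -> Tuple[float, float]:
    """Forward body bias seen by (PMOS, NMOS) devices, in volts."""
    return (vdd - state.v_nw, state.v_pw - 0.0)
```

In this model the forward bias equals the full supply voltage in active mode. At 500mV and below, that stays under a typical junction turn-on voltage.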

The forward bias of the low-V<sub>T</sub> power switch and the core circuits in active mode is especially useful for power gating under ultralow voltage. As is empirically verified, such a forward bias pushes voltage scaling further and delivers better performance than UDVS, where neither the high-V<sub>T</sub> power switch nor the low-V<sub>T</sub> core circuits have body bias. Furthermore, the D-NPSBB scheme enables ULV-CMOS to reduce the leakage current dramatically in sleep mode.
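A first-order textbook model, not taken from the paper, indicates why forward body bias is so effective in this regime. The symbols $I_0$, $n$, $\phi_t$, $C_L$, and $\Delta V_T$ are assumptions introduced here: a process-dependent current prefactor, the sub-threshold slope factor, the thermal voltage (about 26mV at room temperature), the load capacitance, and the threshold reduction caused by the forward bias.

$$
I_D \approx I_0 \exp\!\left(\frac{V_{GS}-V_T}{n\,\phi_t}\right)\left(1-e^{-V_{DS}/\phi_t}\right),
\qquad
t_d \propto \frac{C_L V_{DD}}{I_D\big|_{V_{GS}=V_{DD}}}
$$

$$
\frac{t_d(V_T-\Delta V_T)}{t_d(V_T)} \approx \exp\!\left(-\frac{\Delta V_T}{n\,\phi_t}\right)
$$

Under this model, a modest reduction in threshold voltage shrinks the gate delay exponentially. Returning to zero bias in sleep mode restores the original $V_T$ and, with it, the lower leakage.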

Figure 16.5.2 illustrates the schematic design, cell layout, and silicon cross-sectional view of the D-NPSBB scheme. The swappable body biases, $V_{NW}$ and $V_{PW}$, are generated from the sleep control signal $\overline{slpd}$. A multi-rail layout style is used: one Metal3 (M3) rail for $V_{DD}$, one M3 rail for virtual $V_{DD}$ ($V_{DDV}$), and one M1 rail for $V_{NW}$ in the top area, plus one M3 rail for GND and one M1 rail for $V_{PW}$ in the bottom area. With a typical deep-N-well CMOS process, this layout style allows direct abutment of gates and flip-flops, so that a cell-based design flow can be used to ease the design process. Furthermore, to keep the integrity of high-$V_{DD}$ I/O circuitry and to avoid a large DC current induced by the forward bias, the D-NPSBB scheme is confined within the core circuits, the power switch, and the handcrafted clock driver that do not reside in the deep-N-well regions. Therefore, the N-well and P-well of the I/O circuitry are tied directly to $V_{DD}$ and GND, respectively.

Since the delay of a gate operating in the sub-threshold region is greatly influenced by the supply voltage, the D-NPSBB scheme has a dramatic effect on speed. Figure 16.5.3 shows the post-layout evaluation of this scheme for a 2-input NAND gate (NAND2) and a flip-flop designed in a 0.18µm mixed-signal CMOS process. To take the size of the power switch into account, we adopt the same test circuit as in [3] to evaluate the different schemes in the same CMOS process. As stated earlier, the D-NPSBB scheme is used with power gating for combinational circuits. The power switch incurs a delay penalty since it introduces extra resistance and capacitance along the otherwise straight connection. This is shown in the non-zero delay penalty of UDVS and SLVCMOS. In contrast, with the performance boost produced by D-NPSBB, ULV-CMOS more than recovers the delay lost to the power switch. As a comparison, when a 28µm power switch is used under a supply voltage of 500mV, UDVS and SLVCMOS have delay penalties of 25.95% and 3.96%, respectively, whereas ULV-CMOS produces a delay that is 20.25% faster than the plain design with no power switch. The flip-flop case has a similar performance gain. Moreover, the UDVS NAND2 cell and flip-flop fail to work at 230mV and 260mV, respectively, when implemented in the same CMOS process.
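The NAND2 figures quoted above can be put on a common scale with a few lines of arithmetic. The sketch below normalizes each scheme's delay to the plain gate with no power switch. It assumes that "20.25% faster" means a 20.25% reduction in delay, which is one possible reading of the text. The helper names are hypothetical.

```python
# Delay penalty (%) relative to a plain NAND2 with no power switch, for a
# 28um power switch at 500mV. A negative value means faster than the plain gate.
PENALTY_PCT = {"UDVS": 25.95, "SLVCMOS": 3.96, "ULV-CMOS": -20.25}


def relative_delay(penalty_pct: float) -> float:
    """Delay normalized to the plain design (plain design = 1.0)."""
    return 1.0 + penalty_pct / 100.0


def speedup(slow: str, fast: str) -> float:
    """How many times faster `fast` is than `slow` at the gate level."""
    return relative_delay(PENALTY_PCT[slow]) / relative_delay(PENALTY_PCT[fast])


if __name__ == "__main__":
    for name, pct in PENALTY_PCT.items():
        print(f"{name:9s} relative delay = {relative_delay(pct):.4f}")
    print(f"ULV-CMOS vs UDVS:    {speedup('UDVS', 'ULV-CMOS'):.2f}x")
    print(f"ULV-CMOS vs SLVCMOS: {speedup('SLVCMOS', 'ULV-CMOS'):.2f}x")
```

These ratios describe a single gate with one switch size. They are not expected to match the core-level frequency comparison, which also reflects differences in the implementations being compared.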

Figure 16.5.4 depicts the clock generator of the SLVCMOS. The problem with this design is that there is a DC path when $V_{\rm p}$ is pumped to more than $V_{\rm DDV}$ + 0.2V. To eliminate the DC current, we propose a split-style clock generator, as shown in the lower left part of Fig. 16.5.4. The clock generator consists of three clock drivers serving the sleep mode, the wakeup period, and the active mode, respectively. To keep the integrity of $V_{\rm p}$ in sleep mode with minimum power consumption, the two PMOS transistors corresponding to the wakeup and active modes are strongly turned off with $V_{\rm p}$-driven buffers. With these arrangements, $V_{\rm p}$ is pumped up to a level higher than that of SLVCMOS, improving the leakage reduction capability of the power switch and all clocked transistors.

The measured waveform of the chip operating at 230mV, shown in Fig. 16.5.5, validates the correctness of the ULV-CMOS technique. At such a low voltage, the power supply fluctuation is quite noticeable: 62mV on $V_{\rm DDV}$ and 33mV on $V_{\rm DD}$. Due to the resistance of the power switch, $V_{\rm DDV}$ is always lower than $V_{\rm DD}$. Moreover, the power supply fluctuation on $V_{\rm DDV}$ is almost twice that on $V_{\rm DD}$. The combined effect of low supply voltage and large voltage fluctuation on $V_{\rm DDV}$ suggests that the flip-flops should be connected to $V_{\rm DD}$. The leakage current in sleep mode under various supply voltages is also shown in Fig. 16.5.5. With the power switch, as much as 37.34× (14.61×) leakage reduction is achieved at $V_{\rm DD}$ = 500mV (230mV). The data confirm the necessity of power gating for leakage reduction at ultralow voltage.
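To put these measurements in proportion, the following sketch expresses the supply fluctuation as a fraction of the 230mV supply and converts the leakage-reduction factors into the fraction of leakage that remains. It uses only the numbers quoted above. The function names are hypothetical.

```python
VDD_MV = 230.0                             # supply voltage of the measurement
RIPPLE_MV = {"VDDV": 62.0, "VDD": 33.0}    # measured fluctuation per rail
LEAK_REDUCTION = {500: 37.34, 230: 14.61}  # reduction factor vs. VDD in mV


def ripple_fraction(rail: str) -> float:
    """Fluctuation on a rail as a fraction of the nominal 230mV supply."""
    return RIPPLE_MV[rail] / VDD_MV


def ripple_ratio() -> float:
    """Fluctuation on the virtual rail relative to the real rail."""
    return RIPPLE_MV["VDDV"] / RIPPLE_MV["VDD"]


def remaining_leakage(vdd_mv: int) -> float:
    """Fraction of the ungated leakage left once the switch is off."""
    return 1.0 / LEAK_REDUCTION[vdd_mv]


if __name__ == "__main__":
    for rail in RIPPLE_MV:
        print(f"{rail}: {100 * ripple_fraction(rail):.1f}% of supply")
    print(f"VDDV/VDD ripple ratio: {ripple_ratio():.2f}")
    for vdd in LEAK_REDUCTION:
        print(f"{vdd}mV: {100 * remaining_leakage(vdd):.2f}% leakage remains")
```

The virtual rail fluctuates by roughly a quarter of the nominal supply, which is consistent with the recommendation to connect state-holding flip-flops to the real $V_{\rm DD}$ rail.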

The performance of ULV-CMOS is compared with SLVCMOS because both are applied to a 32b RISC core. For completeness, UDVS is also included in the comparison since the reported implementation, a 32-bit accumulator, has a critical path comparable to or shorter than that of a 32b RISC core. Figure 16.5.6 compares the operating frequencies for different $V_{DD}$ settings. ULV-CMOS displays a clear and uniform advantage over the other two schemes. When operating at the highest voltage of 500mV, ULV-CMOS delivers 16MHz performance, which is at least $3\times$ faster than the other schemes. At 250mV, the lowest operating voltage of UDVS, the performance gain of ULV-CMOS is even more dramatic: a 125× improvement in speed. Finally, ULV-CMOS reaches the lowest voltage of all schemes, 230mV, while still maintaining 375KHz performance.
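The two endpoint measurements for ULV-CMOS also give a rough sense of how steeply frequency falls with supply voltage. The sketch below computes the frequency ratio between 500mV and 230mV and an average slope in mV per decade of frequency. The slope assumes a purely exponential dependence between the two endpoints, which is a simplification introduced here and not a result reported in the paper.

```python
import math

# Endpoint measurements for ULV-CMOS quoted in the text.
F_HIGH_HZ, V_HIGH_MV = 16e6, 500.0
F_LOW_HZ, V_LOW_MV = 375e3, 230.0


def frequency_ratio() -> float:
    """Ratio of the highest to the lowest reported operating frequency."""
    return F_HIGH_HZ / F_LOW_HZ


def mv_per_decade() -> float:
    """Average supply change per 10x change in frequency (two-point fit)."""
    decades = math.log10(frequency_ratio())
    return (V_HIGH_MV - V_LOW_MV) / decades


if __name__ == "__main__":
    print(f"frequency ratio: {frequency_ratio():.1f}x")
    print(f"average slope:   {mv_per_decade():.0f} mV/decade")
```

A two-point fit hides any curvature between the endpoints, so the slope is only an order-of-magnitude indication.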

## Acknowledgements:

We thank the National Science Council, the Ministry of Economic Affairs, and the National SoC Program of Taiwan for funding the research. We also thank Chip Implementation Center (CIC) for supporting chip fabrication.

## References:

[1] S. Narendra, J. Tschanz, J. Hofsheier, et al., "Ultra-Low Voltage Circuits and Processor in 180nm to 90nm Technologies with a Swapped-Body Biasing Technique," *ISSCC Dig. Tech. Papers*, pp. 156-157, Feb., 2004.

[2] B. Calhoun and A. Chandrakasan, "Ultra-Dynamic Voltage Scaling Using Sub-Threshold Operation and Local Voltage Dithering in 90nm CMOS," *ISSCC Dig. Tech. Papers*, pp. 300-301, Feb., 2005.

[3] J.-S. Wang, H.-Y. Li, C. Yeh, and T.-F. Chen, "Design Techniques for Single-Low-V<sub>DD</sub> CMOS Systems," *IEEE J. Solid-State Circuits*, vol. 40, no. 5, pp. 1157-1165, May, 2005.

[4] H. Kawaguchi, K.-I. Nose, and T. Sakurai, "A CMOS Scheme for 0.5V Supply Voltage with Pico-Ampere Standby Current," *ISSCC Dig. Tech. Papers*, pp. 192-193, Feb., 1998.



