## **Energy Reduction by Built-in Body Biasing with Single Supply Voltage Operation**

Norihiro Kamae, A.K.M. Mahfuzul Islam, Akira Tsuchiya, Tohru Ishihara, and Hidetoshi Onodera Department of Communications and Computer Engineering, Kyoto University, Kyoto 606-8501, Japan email: {kamae,mahfuz,tsuchiya,ishihara,onodera}@vlsi.kuee.kyoto-u.ac.jp

#### ABSTRACT

Energy efficiency has become the driving force of today's LSI industry. To achieve minimum energy operation of LSI circuits, we propose a built-in body biasing technique that generates independent body biases for the nMOSFET and the pMOSFET. We design and fabricate an application circuit integrated with our proposed built-in body bias generation (BBG) circuits in a 65-nm process. The application circuit consists of AES cipher and decipher modules. The BBG does not require an external supply and is compatible with a dynamic voltage scaling scheme for the application circuit. Cell-based design of the BBG circuit has been applied to facilitate automatic place and route. The AES and BBG circuits have been placed and routed together to reduce design and area overhead. In post-silicon tuning, the supply voltage and body bias voltages are selected to achieve the minimum energy consumption for a target frequency. Measurement results show an energy reduction of more than 20% compared with adjusting the supply voltage alone.

## KEYWORDS

Body biasing, energy-per-cycle reduction, adaptive body biasing.

## I. Introduction

A system on chip (SoC) includes several processing units and functional blocks for performing various tasks. These blocks differ from each other in terms of operation speed, activity, temperature, etc. It is known that the optimum set of supply voltage and threshold voltage of a circuit depends on these parameters. However, a designer often has no choice but to use transistors with the default threshold voltages set by the foundry. It is therefore common to use a fixed threshold voltage with the nominal supply, although a much lower supply voltage with other threshold values could achieve the target operation frequency. These design practices lead to large unnecessary energy consumption, so there is a strong need for new design strategies that minimize the energy losses as much as possible.

Several techniques have been proposed to meet this need. One technique is to reduce the supply voltage to the point where the target frequency is just achieved, as found by a series of tests. This technique, often called adaptive supply voltage (ASV), provides an adequate supply voltage and thus reduces unnecessary power consumption. ASV can also compensate for process variation because the supply voltage is determined after the fabrication of the chip. Although ASV is a promising technique, it suffers from some drawbacks. One major drawback is that ASV may not necessarily result in energy reduction when there is a large process skew. For example, when the nMOSFET is faster and the pMOSFET is slower, ASV fails to compensate for these variations. Furthermore, ASV adjusts only the supply voltage, so the threshold voltages of the devices remain fixed, which does not yield the optimum energy-efficient operation.

Adaptive body biasing (ABB) is another promising technique, which adjusts the transistor threshold voltage by tuning the body voltage [1]. ABB is often applied as a means to compensate for process variations. Moreover, ASV and ABB can be combined to reduce energy consumption further than either technique applied individually [2]. Two concerns remain for the use of these techniques, especially in a large SoC where different functional blocks require different sets of supply voltages and body bias voltages. One is the cost of providing separate power sources for generating the body voltages. The other is the power loss in the regulators required to generate several supply voltages for body biasing.

To overcome these problems, we propose a built-in body biasing technique in which the body voltages are generated inside the circuit block from the single core supply voltage. The proposed technique integrates the body bias generator (BBG) circuit with the target circuit. A cell-based design procedure is used for the BBG so that automated place and route can be performed. Cell-based design reduces area and design overhead and thus results in a lower implementation cost. To adapt to the varying supply voltage, a BBG capable of operating over a wide supply voltage range is required. Furthermore, the BBG needs to be tuned depending on the target circuit area. A BBG such as the one proposed in [3] is suitable for this implementation. However, [3] does not show how the integrated ASV and ABB technique using the BBG reduces the energy consumption of a practical application circuit.

In this paper, we focus on the energy reduction capability of applying built-in adaptive body biasing with the ASV technique. To evaluate real silicon behavior, we design and fabricate an application circuit consisting of AES cipher and decipher modules in a 65-nm process. We evaluate the energy consumption and maximum operation frequency to show the effect of body biasing. We integrate our proposed built-in body biasing circuits with the target circuit and demonstrate that energy consumption can be reduced by more than 20% by combining ASV and ABB compared with ASV alone.

Fig. 1. Circuit model used for delay and energy evaluation. A 50-stage fan-out-4 inverter chain is used as the circuit model.

The remainder of the paper is organized as follows. In Section II, energy reduction of the body biasing is discussed using a simple inverter-chain based circuit model. In Section III, evaluation setup and measured results on real chip are described, and we conclude the paper in Section IV.

## II. BODY BIASING FOR ENERGY REDUCTION

### A. Simulation Setup

Analysis based on an inverter chain gives a general understanding of circuit behavior such as delay and energy. In [4], an inverter chain with identical stages is used for exploring circuit performance. In this paper, an inverter chain of 50 identical stages with fan-out 4 is used to investigate the effects of ASV and ABB on circuit delay and energy. Fig. 1 shows the schematic of the inverter chain used in the simulation. SPICE simulation is performed using a commercial 65-nm process with a standard set of threshold voltages for the nMOSFET and pMOSFET. A single rising transition is applied to the input of the inverter chain, and the delay for the signal to propagate to the output is evaluated. Energy is calculated by integrating the current consumed within the propagation time of the input signal. For a circuit activity of $\alpha$, the energy per cycle is calculated with the following equations.

$$E_{\rm dyn} = V_{\rm dd} \cdot \int_{t_0}^{t_0 + t_p} I_{\rm dd} \, dt, \tag{1}$$

$$E_{\rm leak} = V_{\rm dd} \cdot I_{\rm leak} \cdot t_p, \tag{2}$$

$$E_{\rm total} = \alpha \cdot E_{\rm dyn} + (1 - \alpha) \cdot E_{\rm leak}. \tag{3}$$

Here,  $t_0$  is the start time of the measurement and  $t_p$  is the propagation time of the input signal which is the minimum clock period for the correct operation of the circuit.  $E_{\rm dyn}$ ,  $E_{\rm leak}$ , and  $E_{\rm total}$  are dynamic, leakage, and total energy per cycle, respectively.
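As an illustration of how Eqs. (1) to (3) might be evaluated from a sampled transient waveform, the following minimal sketch integrates the supply current with the trapezoidal rule. The function name `energy_per_cycle`, the sample-list interface, and the choice of trapezoidal integration are assumptions made for this example; they are not taken from the simulation flow used in the paper.

```python
def energy_per_cycle(times, idd, vdd, i_leak, alpha):
    """Evaluate Eqs. (1)-(3) from sampled data (illustrative sketch).

    times  : sample times in seconds, from t0 to t0 + t_p
    idd    : supply current samples in amperes at those times
    vdd    : supply voltage in volts
    i_leak : static leakage current in amperes
    alpha  : circuit activity (0..1)
    Returns (E_dyn, E_leak, E_total) in joules.
    """
    if len(times) != len(idd) or len(times) < 2:
        raise ValueError("need at least two matching time/current samples")

    # Propagation time t_p is the span of the measurement window.
    t_p = times[-1] - times[0]

    # Eq. (1): integrate I_dd over the window (trapezoidal rule, an assumption).
    charge = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        charge += 0.5 * (idd[k] + idd[k - 1]) * dt
    e_dyn = vdd * charge

    # Eq. (2): leakage energy over one minimum clock period.
    e_leak = vdd * i_leak * t_p

    # Eq. (3): activity-weighted total energy per cycle.
    e_total = alpha * e_dyn + (1.0 - alpha) * e_leak
    return e_dyn, e_leak, e_total


if __name__ == "__main__":
    # Hypothetical triangular current pulse over a 2 ns window.
    result = energy_per_cycle(
        times=[0.0, 1e-9, 2e-9],
        idd=[0.0, 2e-6, 0.0],
        vdd=0.5,
        i_leak=1e-7,
        alpha=0.1,
    )
    print(result)
```

The waveform values in the usage example are made up solely to exercise the function.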

### B. Effect of ASV and ABB on Energy Reduction

Reducing the supply voltage is the most effective way to reduce energy consumption because both dynamic power and leakage power are reduced. Combining ABB with ASV can reduce the energy consumption further. Figure 2 shows the effect of body biasing on supply voltage and energy consumption. The target frequency is set to 300 MHz and the activity is assumed to be 10%. Figure 2 shows that there exists an optimal set of supply voltage and body voltages that minimizes the energy consumption of the circuit. The optimum set of supply voltage and body biases depends on the target frequency and activity. Table I shows the optimum set of supply voltage and body bias voltages for different target operation frequencies. An activity of 10% is assumed here. When ASV and ABB are combined, an energy reduction of 27.7% can be achieved for a target frequency of 10 MHz compared with the ASV technique alone. With ASV only, two tuning knobs (the body biases) are fixed, and a potential energy loss exists for some target frequencies. When the target operation speed is low, reverse body biasing (RBB) is required to achieve minimum energy per cycle because the static leakage energy in a cycle becomes relatively large. When the target operation speed is high, forward body biasing (FBB) is required for minimum energy consumption because dynamic energy becomes dominant.

Fig. 2. (a) Minimum supply voltage, and (b) energy per cycle for different body biases to achieve target frequency of 300 MHz.

Simulation based on a simple circuit model shows that a large amount of leakage reduction is possible by applying ASV and ABB together. However, several points need to be addressed to make the technique feasible. Firstly, in random logic, gates with various activity rates exist. For example, clock buffers and clocked elements have an activity rate of 100%, whereas some logic gates may have a very low activity rate and others a high one. Thus, real silicon data from real application circuits are required to validate the effects. A simple circuit model may not represent the actual behavior and the effect of body biasing in a real application. Secondly, the energy consumed in generating the body bias voltages needs to be considered.

This paper demonstrates the effect of combined ASV and ABB on energy reduction for a real application circuit consisting of AES cipher and decipher circuits. We propose a built-in approach for generating the body bias voltages so that design and area overhead are minimized.

TABLE I
OPTIMUM SUPPLY VOLTAGE AND BODY BIAS FOR MINIMUM ENERGY OPERATION. 10% ACTIVITY IS ASSUMED.

| Target freq. [MHz] | nMOS bias [V] (ASV+ABB) | pMOS bias [V] (ASV+ABB) | Vdd [V] (ASV+ABB) | Energy/cycle [fJ] (ASV+ABB) | Vdd [V] (ASV only) | Energy/cycle [fJ] (ASV only) | Energy reduction [%] |
|------|--------|--------|-------|------|-------|------|------|
| 10   | -0.265 | -0.495 | 0.415 | 1.71 | 0.420 | 2.36 | 27.7 |
| 25   | -0.050 | -0.290 | 0.450 | 2.01 | 0.440 | 2.34 | 14.1 |
| 50   | 0.050  | -0.175 | 0.480 | 2.27 | 0.465 | 2.38 | 4.38 |
| 100  | 0.160  | -0.020 | 0.510 | 2.64 | 0.525 | 2.69 | 2.15 |
| 200  | 0.295  | 0.125  | 0.550 | 3.21 | 0.605 | 3.45 | 6.82 |
| 500  | 0.395  | 0.235  | 0.670 | 4.74 | 0.755 | 5.29 | 10.3 |
| 1000 | 0.485  | 0.380  | 0.840 | 7.64 | 0.975 | 8.75 | 12.7 |
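The energy-reduction column of Table I can be cross-checked from the two energy-per-cycle columns. The sketch below recomputes it as (E_ASV − E_combined) / E_ASV; the helper name `energy_reduction_percent` is introduced here only for illustration. Because the tabulated energies are rounded to 0.01 fJ, the recomputed percentages differ from the published column by up to roughly 0.3 percentage points, which is presumably a rounding effect rather than an inconsistency.

```python
# Rows of Table I: (target freq [MHz], E/cycle with ASV+ABB [fJ],
#                   E/cycle with ASV only [fJ], published reduction [%])
TABLE_I = [
    (10, 1.71, 2.36, 27.7),
    (25, 2.01, 2.34, 14.1),
    (50, 2.27, 2.38, 4.38),
    (100, 2.64, 2.69, 2.15),
    (200, 3.21, 3.45, 6.82),
    (500, 4.74, 5.29, 10.3),
    (1000, 7.64, 8.75, 12.7),
]


def energy_reduction_percent(e_combined, e_asv_only):
    """Relative energy saving of ASV+ABB over ASV alone, in percent."""
    return 100.0 * (e_asv_only - e_combined) / e_asv_only


if __name__ == "__main__":
    for freq, e_comb, e_asv, published in TABLE_I:
        recomputed = energy_reduction_percent(e_comb, e_asv)
        print(f"{freq:5d} MHz: recomputed {recomputed:5.2f} %, "
              f"published {published:5.2f} %")
```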



Fig. 3. (a) fine-grained body biasing, (b) conventional body biasing with additional supply voltage, and (c) built-in body biasing under single supply voltage



Fig. 4. Single supply voltage BBG which is capable of generating both RBB and FBB

### C. Single Supply Voltage Built-in Body Biasing

Figure 3 shows an example implementation of our proposed built-in body biasing technique. An SoC containing several functional blocks is assumed. Each block has a different area and requires a different set of supply voltage and body bias voltages for minimum energy operation. Because the area of each functional block differs, the BBG in each block needs to be customized for the target area. Thus, the topology and design procedure of the BBG are chosen such that it is parameterized and can be implemented with a cell-based design flow. A cell-based design flow enables seamless integration with the digital circuits, resulting in lower design and area overhead. The BBG is capable of generating both RBB and FBB. Furthermore, the BBG can operate over a wide supply voltage range so that the optimum set of supply voltage and body bias voltages can be chosen. The BBG uses only the core supply voltage of the block, so no additional supply voltage is required. The proposed approach can also be applied to the whole chip, in which case the BBG uses the power supply of the chip.

Details of the BBG are given in [3]. The block diagram of the BBG is shown in Fig. 4. Thanks to the voltage isolation scheme, the BBG meets both of the following requirements: cell-based design and a wide output voltage range including both RBB and FBB. Every circuit in the core voltage region operates under the single supply voltage, which enables cell-based design of the BBG circuit integrated with the target digital circuit. To cover the RBB voltage, which exceeds the voltage range of the single supply rail, two charge pumps (CP1 and CP2) bridge between the core voltage region and the high voltage region so that each block operates with the single supply voltage.

## III. REAL CHIP EVALUATION OF BODY BIASING EFFECT

### A. Test Chip Design

A test chip is fabricated with a 128-bit AES cipher/decipher core in a 65-nm low-power triple-well bulk process. The block diagram of the AES cipher/decipher core is shown in Fig. 5. The micrograph and layout of the chip are shown in Fig. 6. A BBG module is integrated with the AES cipher/decipher core. The area of the AES cipher/decipher core including the BBG and the comparison circuits is $0.22\,\mathrm{mm}^2$. The area required for the BBG is only $0.0052\,\mathrm{mm}^2$; thus the area overhead of the BBG is $2.3\,\%$. The BBG is capable of generating both RBB and FBB. The AES core also includes several delay monitors [5] to evaluate circuit performance and process variation. The physical layout of the BBG and AES core is generated by EDA tools as shown in Fig. 6. Table II summarizes the chip specification.

### B. Operation

A 128-bit linear feedback shift register (LFSR) is used to generate random texts to cipher and decipher. To facilitate testing of the maximum operating frequency, the following functional test is used. First, the input text is encoded by the cipher module, and the encoded text is then decoded by the decipher module. The input text and the decoded text are then compared. If they match, correct operation is confirmed and a pass signal is generated, which is captured outside the chip. The LFSR and the comparison block use a separate, higher supply voltage to ensure correct operation. The supply voltage of the cipher/decipher is reduced until the first failure is observed. Then the body bias voltages and the supply voltage are tuned accordingly to find the minimum energy consumption.

Fig. 5. AES, BBG, and function test block in the chip.

Fig. 6. Chip micrograph and placement.

TABLE II
SYNTHESIS CONSTRAINTS AND CHARACTERISTICS OF THE AES CIPHER

| Item            | Value                                  |
|-----------------|----------------------------------------|
| Process         | 65 nm bulk CMOS                        |
| Supply voltage  | $1.2\,\mathrm{V}$                      |
| Frequency       | $400\,\mathrm{MHz}$                    |
| Number of gates | 28k                                    |
| Area            | $0.6\,\mathrm{mm} \times 0.4\,\mathrm{mm}$ |

Fig. 7. Operating ranges of the AES, (a) pass without body biasing, (b) failed without body biasing but pass with body biasing, (c) failed with and without body biasing.
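The tuning procedure of this subsection (lower the supply until the first failure, then tune body bias and supply together for minimum energy) could be automated along the following lines. This is a hedged sketch, not the measurement script used for the test chip: the callables `passes` and `energy` stand in for the on-chip pass signal and an energy measurement, and `toy_passes` / `toy_energy` are a purely hypothetical first-order chip model with made-up coefficients, included only so that the example runs. The sketch also assumes that, for a fixed bias pair, energy per cycle decreases monotonically with supply voltage, so the lowest passing supply is taken for each bias pair. Positive bias values denote forward bias for both devices in this toy convention.

```python
import math
from itertools import product


def min_passing_vdd(passes, f_target, vbn, vbp,
                    vdd_start=1.2, vdd_step=0.01, vdd_floor=0.4):
    """Lower Vdd in fixed steps until the first failure is observed.

    Returns the lowest passing Vdd, or None if even vdd_start fails.
    """
    lowest = None
    n_steps = int(round((vdd_start - vdd_floor) / vdd_step))
    for k in range(n_steps + 1):
        vdd = round(vdd_start - k * vdd_step, 6)
        if not passes(vdd, vbn, vbp, f_target):
            break
        lowest = vdd
    return lowest


def tune_for_min_energy(passes, energy, f_target, bias_grid):
    """Exhaustively search bias pairs; return (energy, vdd, vbn, vbp) or None."""
    best = None
    for vbn, vbp in product(bias_grid, repeat=2):
        vdd = min_passing_vdd(passes, f_target, vbn, vbp)
        if vdd is None:
            continue
        e = energy(vdd, vbn, vbp, f_target)
        if best is None or e < best[0]:
            best = (e, vdd, vbn, vbp)
    return best


# --- Hypothetical chip model (all coefficients are assumptions) -------------
def _toy_vth(vbn, vbp):
    # Forward (positive) bias lowers the effective threshold voltage.
    return 0.45 - 0.1 * (vbn + vbp) / 2.0


def toy_passes(vdd, vbn, vbp, f_target):
    """Pass if an alpha-power-law style f_max estimate meets the target."""
    vth = _toy_vth(vbn, vbp)
    if vdd <= vth:
        return False
    f_max = 2.0e9 * (vdd - vth) ** 1.5 / vdd
    return f_max >= f_target


def toy_energy(vdd, vbn, vbp, f_target):
    """Energy per cycle in joules: switching term plus leakage over one period."""
    vth = _toy_vth(vbn, vbp)
    e_switch = 1e-12 * vdd ** 2
    i_leak = 1e-3 * math.exp(-vth / 0.04)
    return e_switch + vdd * i_leak / f_target


if __name__ == "__main__":
    best = tune_for_min_energy(toy_passes, toy_energy, 400e6,
                               bias_grid=[-0.3, 0.0, 0.3, 0.6])
    print("energy [J], Vdd [V], nMOS bias [V], pMOS bias [V]:", best)
```

An exhaustive grid is used for clarity; on real hardware the number of bias settings would be bounded by the BBG's output resolution, which this sketch does not model.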

### C. Measurement Results

The AES core was synthesized using the foundry-provided standard cells. The nominal supply voltage is 1.2 V. The maximum clock frequency achieved during the design is 400 MHz. Figure 7 shows the Shmoo plot of the AES core. Region (a) in Fig. 7 shows the combinations of operating frequency and supply voltage that pass without body biasing. Regions (a) and (b) together show the area where correct operation is achieved when built-in body biasing is also applied. Both an increase in operating frequency and a lowering of the supply voltage have been confirmed by applying body biasing. For example, when no bias is applied, the target operating frequency of 400 MHz is achieved with the supply voltage lowered to 1.14 V. However, when the supply voltage and body bias are tuned simultaneously to find the minimum energy operation, the minimum supply voltage is reduced to 0.96 V. The optimum body bias in this case is 0.6 V of forward bias. Because a bulk CMOS process is used, 0.6 V is the maximum forward bias that can be applied.

Fig. 8. Energy/cycle reduction by forward body bias.

For a fixed supply voltage, a higher frequency can be achieved by applying body bias. For example, at 1.0 V operation, a frequency increase of 60% is achieved by forward body biasing.
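As a rough first-order cross-check of the 400 MHz operating point (1.14 V without bias versus 0.96 V with 0.6 V forward bias), one can assume that dynamic energy scales with the square of the supply voltage. This scaling is a textbook approximation introduced here for illustration, not a calculation from the measurements:

$$\frac{E_{\rm dyn}(0.96\,\mathrm{V})}{E_{\rm dyn}(1.14\,\mathrm{V})} \approx \left(\frac{0.96}{1.14}\right)^{2} \approx 0.71$$

Under this assumption, the supply reduction alone would lower dynamic energy by about 29%. This figure ignores the additional leakage under forward bias and the consumption of the BBG, so it is best read as an upper-bound estimate of the saving rather than a prediction of the measured total.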

In Fig. 7, we have confirmed the effect of body biasing on increasing the frequency and lowering the supply voltage. However, our main concern is the energy consumption, which we want to minimize. Figure 8 shows the average energy per cycle measured at different operating frequencies. At 400 MHz, the energy reduction with ASV and ABB combined is 20% compared with the ASV technique only. Because only a single supply is used, the energy consumption shown here includes the energy consumption of the BBG. At 200 MHz, an energy reduction of 24% is confirmed. Thus, it is confirmed that combining ASV and ABB is highly effective in reducing overall energy consumption. The concept of our built-in body biasing is also validated and thus can be used in any part of an SoC.

## IV. CONCLUSION

A built-in body biasing technique was proposed to enable single supply voltage operation. The body bias generation circuit was integrated with the AES cipher/decipher core. The AES cipher/decipher core with the BBG was then designed and fabricated in a 65-nm bulk process. Test chip measurements of the maximum operating frequency, minimum supply voltage, and energy consumption of the AES cipher/decipher core circuit were presented in the paper. Measurement results confirm that an energy reduction of more than 20% is possible by combining ASV and ABB. The proposed built-in ABB technique enables low-cost implementation and eliminates the need for external power sources. The proposed technique can be used in different parts of an SoC as well as for the whole chip.

#### ACKNOWLEDGMENT

The VLSI test chips in this study have been fabricated in the chip fabrication program of VDEC, the University of Tokyo, in collaboration with STARC, e-Shuttle, Inc., and Fujitsu Ltd.

#### REFERENCES

- [1] M. Miyazaki, G. Ono, and K. Ishibashi, "A 1.2-GIPS/W microprocessor using speed-adaptive threshold-voltage CMOS with forward bias," *IEEE Journal of Solid-State Circuits*, vol. 37, no. 2, pp. 210–217, Feb 2002.
- [2] J. Tschanz, S. Narendra, and A. Keshavarzi, "Adaptive Circuit Techniques to Minimize Variation Impacts on Microprocessor Performance and Power," in *IEEE International Symposium on Circuits and Systems*, 2005, pp. 9–12.
- [3] N. Kamae, A. Tsuchiya, and H. Onodera, "A Body Bias Generator Compatible with Cell-based Design Flow for Within-die Variability Compensation," in *IEEE Asian Solid State Circuits Conference*, 2012, pp. 389–392.
- [4] B. Zhai, D. Blaauw, D. Sylvester, and K. Flautner, "The Limit of Dynamic Voltage Scaling and Insomniac Dynamic Voltage Scaling," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 13, no. 11, pp. 1239–1252, 2005.
- [5] A. M. Islam and H. Onodera, "On-Chip Detection of Process Shift and Process Spread for Post-Silicon Diagnosis and Model-Hardware Correlation," *IEICE Transactions on Information and Systems*, vol. E96-D, no. 9, pp. 1971–1979, 2013.