## DIGITAL ARITHMETIC

Miloš D. Ercegovac and Tomás Lang
with contributions by Elisardo Antelo and Fabrizio Lamberti
Morgan Kaufmann Publishers, an imprint of Elsevier, ©(c2004

## Chapter 2: Solutions to Exercises

## Exercise 2.1

Assuming that $c_{i}$ is connected to the XOR input with load factor 1.1 (Fig. $2.5(\mathrm{c})$ ), the average delay of the carry-out is
$T_{1}=t_{N A N D}(1)+t_{N A N D}(2.1)=0.07+0.033+0.07+0.033 \times 2.1=0.242 \mathrm{~ns}$
Adding an inverter and changing the XOR into XNOR, we obtain for the carry delay:

$$
T_{2}=t_{N A N D}(1)+t_{N A N D}(2)=0.239 n s
$$

This represents a $1.4 \%$ reduction in the carry delay. Note that the difference is very small because of the XOR input with load factor 1.1. A larger reduction would result if the XOR input load factors were symmetrical at 2.

## Exercise 2.2

$T_{32}=\max \left(t_{c-c}, t_{x y-c}\right)+30 t_{c-c}+\max \left(t_{c-s}, t_{c-c}\right)$ where, according to Table 2.2,
$t_{c-c}=0.38+0.03(1.3)=0.419 n s$
where 1.3 is the load of the carry-out signal.
$t_{x y-c}=0.72+0.03(1.3)=0.759 n s$
$t_{c-c_{o} u t}=0.38$ and $t_{c-s_{31}}=0.46(L=0$ for both outputs)
$T_{32}=0.76+30 \times 0.419+0.46 \approx 13.8 \mathrm{~ns}$

## Exercise 2.3

A radix- 4 full adder satisfies $x_{i}+y_{i}+c_{i}=4 c_{i+1}+z_{i}$ where $x_{i}, y_{i}, z_{i} \in$ $\{0,1,2,3\}$ and the carries are in $\{0,1\}$. The radix- 4 digits are encoded on two binary variables as $x_{i}=\left(x_{i, 1}, x_{i, 0}\right)$, etc. For simplicity the $x, y, z$ binary variables are denoted as $\left(x_{1}, x_{0}\right)$, etc.

The radix- 4 full adder is defined by the following switching expressions.

$$
\begin{aligned}
p_{j} & =x_{j} \oplus y_{j} ; j=0,1 \\
g_{j} & =x_{j} y_{j} ; j=0,1 \\
z_{0} & =p_{0} \oplus c_{i} \\
z_{1} & =p_{1} \oplus\left(g_{0}+p_{0} c_{i}\right)=p_{0} \oplus g_{0} \oplus p_{0} c_{i} \\
c_{i+1} & =g_{1}+p_{1} g_{0}+p_{1} p_{0} c_{i}
\end{aligned}
$$

The radix- 4 full adder and the radix- 2 2-bit adder are compared with respect to delays in the carry-in - carry-out path, and with respect to circuit size. The following gates are used:

| Type | Avg. delay | Eq. size |
| :--- | :--- | :---: |
| NOT | NA | 1 |
| NAND2 | $0.07+0.033 L$ | 1 |
| NAND3 | $0.08+0.039 L$ | 2 |
| XOR(XNOR) | NA | 3 |

The carry path delays are:

$$
\begin{aligned}
& T_{4}=t_{N A N D 3(L=1)}+t_{N A N D 3(L=3.1)}=0.32 n s \\
& T_{2}=2 \times\left(t_{N A N D 2(L=1)}+t_{N A N D 2(L=2.1)}\right)=0.48 n s
\end{aligned}
$$

where the loads are calculated as follows:
For T4: the second $N A N D$ goes to $c$ input of the next digit so $L=1+1+$ $1.1=3.1$

For $T 2$ : the output NAND goes to a NAND and to an XOR in the next bit-position so $L=1+1.1=2.1$

The sizes in equivalent gates are:

$$
\begin{aligned}
S_{4} & =5 \times X O R(X N O R)+N O T+A N D 2+3 \times N A N D 2+2 \times N A N D 3 \\
& =5 \times 3+1+2+3 \times 1+2 \times 2=25 \\
S_{2} & =2 \times(2 \times(N A N D 2+X O R(X N O R))+N A N D 2) \\
& =2 \times(2 \times(1+3)+1)=18
\end{aligned}
$$

## Exercise 2.4

$T_{S R A}=t_{s w}+(n-1) t_{p}+(n / m) t_{b u f}+t_{s}($ Expression $(2.27))$
$t_{s w}=\max \left(t_{g i}, t_{k i}, t_{p i}\right)+t_{N A N D-2(L=2)}=t_{p i}+t_{N A N D-2(L=2)}=0.329+$ $0.136=0.465 \mathrm{~ns}$
where, assuming a switch has one standard load,

$$
\begin{aligned}
& t_{g i}=t_{A N D-2}=0.16+0.027 \times 1=0.187 n s \\
& \quad t_{k i}=t_{N O R-2}=0.07+0.046 \times 1=0.116 n s \\
& \quad t_{p i}=t_{X O R-2}=0.30+0.029 \times 1=0.329 n s \\
& t_{p}=t_{N A N D-2}=0.07+0.033 \times 2=0.136 n s(L=2) \\
& t_{b u f}=1.5 \times 0.136=0.204 \\
& \left.t_{s}=0.46+0.03 \times L=0.46 n s \text { (Table 2.2, delay } c_{i} \text { to } s_{i} \text { with } L=0\right)
\end{aligned}
$$

Therefore,
$T_{S R A}=0.465+31 \times 0.136+8 \times 0.204+0.46 \approx 6.8 n s$
From Exercise 2.2, $T_{C R A}=13.8 n s$ so the SRA aproximately halves the delay. Note that to reduce the load the network for computing the sum bits uses separately obtained $p_{i}$ signals

## Exercise 2.5

Figure E2.5 shows the carry chains for the given operands.


Figure E2.5: Carry chains in carry-skip adder (Exercise 2.5).

## Exercise 2.6

We use the same assumption as in Exercise 2.2: $t_{p g}=t_{c}=t_{A N D-O R}=1$.
We assume $c_{i n}=0$

| Operation 1 | $x$ | 0000 | 0111 | 0000 | 1111 |
| :--- | :--- | ---: | ---: | ---: | ---: |
|  | $y$ | 1111 | 0000 | 1111 | 0001 |
|  | $P$ | $P_{3}=1$ | $P_{2}=0$ | $P_{1}=1$ | $P_{0}=0$ |
|  | $c$ | 00000 | 1111 | 1111 | $111-$ |
| Operation 2 | $x$ | 0000 | 0111 | 0000 | 1111 |
|  | $y$ | 1111 | 0000 | 1111 | 0000 |
|  | $P$ | $P_{3}=1$ | $P_{2}=0$ | $P_{1}=1$ | $P_{0}=1$ |
|  | $c$ | 00000 | 0000 | 0000 | $000-$ |

1. Although $P_{0}=1$ in second operation, it is necessary to wait for the propagation of the new carries in the first group $\left(t=5\right.$ to set $\left.c_{4}=0\right)$. Propagation through the OR adds 1 unit delay and, therefore, $t=6$.
2. The same occurs in the second group. Although $P_{1}=1$, it is necessary to wait 4 units until $c_{8}=0$, so $t=10$. Add 1 unit delay due to the OR gate and $t=11$.
3. A correct carry is propagated to the most significant group. Since there are no changes in this group, there is no additional delay.
4. The carry chain propagates through the third group up to $c_{11}(t=14)$. There is one additional delay to produce $s_{11}(t=15)$.

## Exercise 2.7

Expression (2.31) gives the worst-case delay of a carry-skip adder with fixedsize groups as

$$
T_{C S K}=(2 m-1) t_{c}+(n / m-1) t_{m u x}+t_{s}
$$

Differentiating this expression with respect to $m$ and making the derivative equal to 0 we get

$$
\frac{d T_{C S K}}{d m}=2 t_{c}-\frac{n}{m^{2}} t_{m u x}=0
$$

resulting in

$$
m_{o p t}=\left(\frac{n t_{m u x}}{2 t_{c}}\right)^{1 / 2}
$$

and

$$
T_{o p t} \approx 2 \times\left(2 n t_{m u x} t_{c}\right)^{1 / 2}=\left(8 n t_{m u x} t_{c}\right)^{1 / 2}
$$

## Exercise 2.8

a) $m=8$
$t_{C S K A}=(2 m-1) t_{c}+(n / m-1) t_{M U X}+t_{s}$
From Table 2.2:
$t_{c}=0.38+0.03 * L=0.38+0.03(1.3)=0.419 \mathrm{~ns}$
$t_{s}=0.46+0.03 * L=0.46$ assuming $L=0$
From Table 2.4:
$t_{M U X}=0.21+0.050 * L=0.21+0.050(1.8)=0.3 n s$ where $L=0.5(M U X)+$ $1.3\left(c_{i n}\right)=1.8$

Therefore,
$t_{C S K A}=(15) * 0.419+7 * 0.3+0.46=6.285+2.1+0.46=8.85 \mathrm{~ns}$
b) $m_{\text {opt }}=\left(\frac{t_{M U X}}{2 t_{c}} \cdot n\right)^{1 / 2}=\left(\frac{0.3}{2 \cdot 0.419} \cdot 64\right)^{1 / 2}=4.78$

To have a uniform block size which is a divisor of 64 and closest to $m_{\text {opt }}$ we choose $m=4$ (16 groups)
In this case, $t_{C S K A}=7 t_{c}+15 * t_{M U X}+t_{s}=7.9 n s$.
If we allow a non-uniform group size and choose $m=5$ which is the integer closest to $m_{\text {opt }}$, we have 12 groups of 5 bits and one of 4 bits:
$t_{C S K A}=5 t_{c}+t_{M U X}+10 * t_{M U X}+4 t_{c}+t_{s}=7.53 n s$
c) Use, for instance, the following sequence: 445668866544 . In this case the delay is:
$t_{C S K A}=4 t_{c}+t_{M U X}+10 * t_{M U X}+3 t_{c}+t_{s}=6.70 n s$

## Exercise 2.9

a) For a 9-bit adder we have two possible paths:
$b_{1} \rightarrow b_{2} \rightarrow S_{1} \rightarrow b_{3} \rightarrow S_{1} \rightarrow b_{6} \rightarrow b_{7} \rightarrow b_{8}=8 \delta$
$b_{0} \rightarrow S_{1} \rightarrow b_{3} \rightarrow S_{1} \rightarrow b_{6} \rightarrow b_{7} \rightarrow b_{8}=7 \delta$
So, the worst-case delay is $8 \delta$.
b) A 36-bit adder using blocks of 9-bit adders. The worst-case consists of the worst-case to obtain the carry-out of the least-significant group plus the maximum skip of intermediate groups plus the worst-case delay to obtain the sums in the most-significant group. That is,

```
[9-bit adder] <-- [9-bit adder] <-- [9-bit adder] <-- [9-bit adder]
worst-case sums max-skip max-skip worst-case carry
```

Worst-case delay of carry-out of l.s. group: $4 \delta$ (several paths, i.e. $b_{4}, b_{5}, M U X, M U X$ )
Delay of max-skip: $b_{0}+M U X+M U X=3 \delta$
Worst-case sums (from $c_{i n}$ of group) $=7 \delta$.
Total delay: $7 \delta+3 \delta+3 \delta+4 \delta=17 \delta$

## Exercise 2.10

(a) $T=m t_{c}+(s-1) t_{m u x}+(p-2) t_{m u x}+(s-1) t_{m u x}+(m-1) t_{c}+t_{s}$.
(b) Let $t_{c}=t_{m u x}$ and $m=s . T=\left(4 m-3+n / m^{2}\right) t_{c}+t_{s}$ and $m_{\text {opt }}=$ $(n / 2)^{1 / 3}$.

## Exercise 2.11

a) Cost and delay of $C L G-4(m=4)$
a) Since there are two passes through the module we design so as to reduce the critical path of both passes.
i) First pass:

The critical path goes from signals a,g to A,G. From Figure 2.14 we see that the critical output is G. We implement as follows:

$$
G=g 3+a 3 g 2+a 3 a 2 g 1+a 3 a 2 a 1 g 0=\left[g 3^{\prime} .(a 3 g 2)^{\prime} .(a 3 a 2 g 1)^{\prime} \cdot(a 3 a 2 a 1 a 0)^{\prime}\right]^{\prime}
$$

Since G is the input to the next level module, we buffer the output of the NAND-4 with two NOT gates.
Moreover,

$$
A=a 3 a 2 a 1 a 0=\left((a 3 a 2 a 1 a 0)^{\prime}\right)^{\prime}
$$

The other parts of the "upper" network are implemented in a similar way.
ii) For the second pass the critical path is c 0 to c 4 . We implement is as

$$
c 4=G+A c 0=\left(G^{\prime} .(A c 0)^{\prime}\right)^{\prime}
$$

The other carries are implemented in a similar way
So,
maximum fanin $=4$
maximum fanout $=7$ (this corresponds to output A , which goes to a1 of another module).
Equivalent gates:
NOT $1(\mathrm{~g} 3)+1(\mathrm{G})+1(\mathrm{~A})+4(\mathrm{Gi}$, for ci$)+8(\mathrm{ci})=15 \mathrm{EQ}=15$
NAND-2 13 EQ=13
NAND-3 $4 \mathrm{EQ}=8$
NAND-4 3 EQ=6
TOTAL EQ $=42$
critical path (pass 1) $t_{a 1, G}=$ NAND4(1) + NAND4(1) + NOT(1) + $\operatorname{NOT}(5)=0.16+0.16+0.07+0.18=0.57$
critical path (pass 2) $t_{c 0, c 4}=$ NAND2(1) + NAND2(1) + NOT(1) + $\operatorname{NOT}(4)=0.10+0.10+0.07+0.15=0.42$
b) Cost and delay of $C L G-8(m=8)$. Following the same implementation approach of part a) we get
maximum fanin $=8$
maximum fanout $=2 \times 7+1=15$

## Equivalent gates

```
NOT \(=1+1+1+8+16=27 \mathrm{EQ}=27\)
NAND2 \(=25 \mathrm{EQ}=25\)
NAND3 \(=8 \mathrm{EQ}=16\)
NAND4 \(=7 \mathrm{EQ}=14\)
NAND5 \(=6 \mathrm{EQ}=24\)
NAND6 \(=5 \mathrm{EQ}=25\)
NAND7 \(=4 \mathrm{EQ}=24\)
NAND8 \(=3 \mathrm{EQ}=18\)
TOTAL \(=173\)
critical path (pass 1) \(t_{a 1, G}=\) NAND8(1) + NAND8(1) + NOT(1) +
\(\mathrm{NOT}(9)=0.36+0.36+0.07+0.29=1.08\)
critical path (pass 2) \(t_{c 0, c 4}=\) NAND2(1) + NAND2(1) + NOT(1) +
\(\mathrm{NOT}(8)=0.10+0.10+0.07+0.26=0.53\)
```

As an illustration consider a 64 -bit adder using 4 -bit modules and 8 -bit modules. In the 4 -bit module case, a 3 -level CLA is used. The delay corresponds to
$\mathrm{t}(\mathrm{a}, \mathrm{g})+2 \mathrm{t}(\mathrm{G})+3 \mathrm{t}(\mathrm{c})+\mathrm{ts}=0.32+1.14+1.26+0.15=2.87$
where $\mathrm{t}(\mathrm{a}, \mathrm{g})$ is the delay of $\operatorname{NOR} 2(1)+\operatorname{NOT}(6)=0.12+0.21=0.32$ and ts is delay of $\operatorname{XOR}(2)$ with $\mathrm{L}=0$.

In the 8 -bit module case, a 2 -level CLA is used. The delay is
$\mathrm{t}(\mathrm{a}, \mathrm{g})+\mathrm{t}(\mathrm{G})+2 \mathrm{t}(\mathrm{c})+\mathrm{ts}=0.39+1.08+1.06+0.15=2.68$
where $\mathrm{t}(\mathrm{a}, \mathrm{g})$ is $\operatorname{NOR} 3(1)+\operatorname{NOT}(5)=0.21+0.18=0.39$ (use 3 NOT gates in parallel for the load of 14)

## Exercise 2.12

The 32 -bit adder consists of 32 PG modules to produce p's and g's, 8 BCLA modules to produce carries $c_{4 i+4}, i=0, \ldots, 7$, and 84 -bit CRA- 4 modules to produce the sum bits. Assuming the gates of Table 2.4, that a full-adder is implemented using the design of Figure 2.3(c), the delays of these modules are:

Module PG (XOR,AND): $t_{P G}=0.30+0.029 \times 4=0.42 n s$
Module CRA-4 (4 FAs): $t_{C P A-4}=2 t_{X O R}+6 t_{N A N D}=0.672+6 \times 0.107=$ 1.31 ns

Module BCLA (NAND-NAND): $t_{B C L A}=0.359+0.34+0.019 \times 2=0.74$
The worst case delay is

$$
\begin{aligned}
& T_{\text {sum }}=0.42+7 \times 0.74+1.31=6.91 \mathrm{~ns} \\
& T_{c_{\text {out }}}=0.42+8 \times 0.74=6.34 \mathrm{~ns}
\end{aligned}
$$

A 32-bit single-level carry-skip adder, with group size 4 , has the following worst-case delay: $T_{C S K}=7 \times t_{c}+7 \times t_{m u x}+t_{s}=7 \times 2 \times 0.107+7 \times$ $0.272+0.336=3.74 n s$

This indicates that the BCLA 32-bit adder is slower 1.85 times than the carry-skip alternative. Moreover, the BCLA module uses more gates than the skip network in the 32 -bit carry-skip adder with $m=4$. In terms of equivalent gates, this cost ratio is about 3 .

## Exercise 2.13

The $g_{i}$ and $a_{i}$ signals are

| $x$ | 0 | 1 | 0 | 1 |
| :---: | :---: | :---: | :---: | :---: |
| $y$ | 1 | 0 | 0 | 1 |
| $g_{i}$ | 0 | 0 | 0 | 1 |
| $a_{i}$ | 1 | 1 | 0 | 1 |

The expressions and values for the CLG-4 carries are

$$
\begin{aligned}
c_{0} & =1 \\
c_{1} & =g_{0}+a_{0} c_{0}=1+1 \cdot 1=1 \\
c_{2} & =g_{1}+a_{1} g_{0}+a_{1} a_{0} c_{0}=0+0 \cdot 1+0 \cdot 1 \cdot 1=0 \\
c_{3} & =g_{2}+a_{2} g_{1}+a_{2} a_{1} g_{0}+a_{2} a_{1} a_{0} c_{0}=0+1 \cdot 0+1 \cdot 0 \cdot 1+1 \cdot 0 \cdot 1 \cdot 1 \cdot 1=0 \\
c_{4} & =g_{3}+a_{3} g_{2}+a_{3} a_{2} g_{1}+a_{3} a_{2} a_{1} g_{0}+a_{3} a_{2} a_{1} a_{0} c_{0} \\
& =0+1 \cdot 0+1 \cdot 1 \cdot 0+1 \cdot 1 \cdot 0 \cdot 1+1 \cdot 1 \cdot 0 \cdot 1 \cdot 1=0
\end{aligned}
$$

## Exercise 2.14



## Exercise 2.15

A 64-bit, three-level carry-lookahead adder is shown in Figure E2.15.


Figure E2.15: 64-bit three-level carry-lookahead adder.

The critical path is
$\left(x_{0}, y_{0}\right) \rightarrow\left(A_{0}, G_{0}\right) \rightarrow\left(A_{3-0}, G_{3-0}\right) \rightarrow c_{48} \rightarrow c_{60} \rightarrow\left(A_{15}, G_{15}\right) \rightarrow\left(A_{15-12}, G_{15-12}\right) \rightarrow c_{64}$

## Exercise 2.16

The number of $C L G \mathrm{~s}$ is divided by $m$ from one level to the next. Therefore, at level $i$ there are $n / m^{i} C L G \mathrm{~s}$. The total number is obtained by adding from $i=1$ to $L$.

## Exercise 2.17

$$
\begin{aligned}
& n=128, m=4, t_{c l g}=t_{A G}=6 t_{a g}=3 t_{s} \\
& T_{1-C L A}=t_{a g}+(n / m) t_{c l g}+t_{s}=1+32 \times 6+2=195 t_{a g} \\
& T_{2-C L A}=t_{a g}+t_{A G}+\left(n / m^{2}\right) t_{c l g}+t_{c l g}+t_{s}=1+6+8 \times 6+6+2=63 t_{a g} \\
& T_{3-C L A}=t_{a g}+2 t_{A G}+\left(n / m^{3}\right) t_{c l g}+2 t_{c l g}+t_{s}=1+12+12+12+2=39 t_{a g}
\end{aligned}
$$

For the 4 -level CLA we use another level with a group size of 2 . Because of the smaller size of this group the delay of this level is smaller, we assume it to be $t_{c l g 2}=2 t_{a, g}$.

$$
T_{4-C L A}=t_{a g}+2 t_{A G 4}+t_{c l g 2}+3 t_{c l g}+t_{s}=1+12+2+18+2=35 t_{a g}
$$

## Exercise 2.18

a) From the definition $h_{i}=c_{i}+c_{i+1}=g_{i}+p_{i} c_{i}+c_{i}=g_{i}+c_{i}$. Moreover, $s_{i}=t_{i} \oplus h_{i}+g_{i} t_{i-1} h_{i-1}$.
1st term: $t_{i} \oplus h_{i}=t_{i}^{\prime} h_{i}+t_{i} h_{i}^{\prime}=\left(x_{i}+y_{i}\right)^{\prime}\left(x_{i} y_{i}+c_{i}\right)+\left(x_{i}+y_{i}\right)\left(x_{i} y_{i}+c_{i}\right)^{\prime}=$ $x_{i}^{\prime} y_{i}^{\prime} c_{i}+\left(x_{i} \oplus y_{i}\right) c_{i}^{\prime}$
2nd term: $g_{i} t_{i-1} h_{i-1}=g_{i}\left(x_{i-1}+y_{i-1}\right)\left(x_{i-1} y_{i-1}+c_{i-1}\right)=g_{i}\left(g_{i-1}+p_{i-1} c_{i-1}\right)=$ $g_{i} c_{i}=x_{i} y_{i} c_{i}$
Therefore,

$$
\begin{aligned}
s_{i} & =x_{i}^{\prime} y_{i}^{\prime} c_{i}+\left(x_{i}^{\prime} y_{i}+x_{i} y_{i}^{\prime}\right) c_{i}^{\prime}+x_{i} y_{i} c_{i} \\
& =x_{i}^{\prime} y_{i}^{\prime} c_{i}+x_{i}^{\prime} y_{i} c_{i}^{\prime}+x_{i} y_{i}^{\prime} c_{i}^{\prime}+x_{i} y_{i} c_{i} \\
& =c_{i}^{\prime}\left(x_{i} \oplus y_{i}\right)+c_{i}\left(x_{i} \oplus y_{i}\right)^{\prime}=c_{i} \oplus p_{i}
\end{aligned}
$$

b) Note that $t_{i} g_{i}=\left(x_{i}+y_{i}\right)\left(x_{i} y_{i}\right)=x_{i} y_{i}=g_{i}$. The switching expressions for the carries in a 4 -bit Ling adder we get

$$
\begin{aligned}
h_{0} & =g_{0}+c_{i n} \\
h_{1} & =g_{1}+g_{0}+t_{0} c_{i n} \\
h_{2} & =g_{2}+g_{1}+t_{1} g_{0}+t_{1} t_{0} c_{i n} \\
h_{3} & =\left(g_{3}+g_{2}\right)+t_{2} g_{1}+t_{2} t_{1} g_{0}+t_{2} t_{1} t_{0} c_{i n}
\end{aligned}
$$

Assuming a two-level implementation these expressions require the following gates: 2 AND-2, 2 AND-3, 1 AND-4, 2 OR-2, 1 OR-3, 2 OR-4. The largest fan-in is 4 .
The corresponding expressions for a conventional CLA described in the text require more gates and a larger fan-in: 4 AND-2, 3 AND-3, 2 AND-4, 1 AND-5, 1 OR-2, 1 OR-3, 1 OR-4, and 1 OR-5. The largest fan-in is 5.

## Exercise 2.19

The 8-bit prefix adder without a carry-in is shown in Figure E2.19.


Figure E2.19: Prefix adder for Exercise 2.19.

## Exercise 2.20

| $i$ | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $x_{i}$ |  | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 |
| $y_{i}$ |  | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 |
| $g_{i}$ |  | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 |
| $a_{i}$ |  | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| $p_{i}$ |  | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |

Level 1 outputs:

$$
\begin{array}{lll}
g_{(0,-1)} & =1=c_{1} & \\
g_{(1,0)} & =g_{1}+a_{1} g_{0}=1, & a_{(1,0)}=a_{1} a_{0}=1 \\
g_{(2,1)} & =g_{2}+a_{2} g_{1}=1, & a_{(2,1)}=a_{2} a_{1}=1 \\
g_{(3,2)}=g_{3}+a_{3} g_{2}=0, & a_{(3,2)}=a_{3} a_{2}=0 \\
g_{(4,3)}=g_{4}+a_{4} g_{3}=0, & a_{(4,3)}=a_{4} a_{3}=0 \\
g_{(5,4)}=g_{5}+a_{5} g_{4}=0, & a_{(5,4)}=a_{5} a_{4}=1 \\
g_{(6,5)}=g_{6}+a_{6} g_{5}=1, & a_{(6,5)}=a_{6} a_{5}=1 \\
g_{(7,6)}=g_{7}+a_{7} g_{6}=1, & a_{(7,6)}=a_{7} a_{6}=1
\end{array}
$$

Level 2 outputs:

$$
\begin{aligned}
& g_{(1,-1)}=g_{(1,0)}+a_{(1,0)} c_{0}=1=c_{2} \\
& g_{(2,-1)}=g_{(2,1)}+a_{(2,1)} g_{(0,-1)}=1=c_{3} \\
& g_{(3,0)}=g_{(3,2)}+a_{(3,2)} g_{(1,0)}=0, \quad a_{(3,0)}=a_{(3,2)} a_{(1,0)}=0 \\
& g_{(4,1)}=g_{(4,3)}+a_{(4,3)} g_{(2,1)}=0, \quad a_{(4,1)}=a_{(4,3)} a_{(2,1)}=0 \\
& g_{(5,2)}=g_{(5,4)}+a_{(5,4)} g_{(3,2)}=0, \quad a_{(5,2)}=a_{(5,4)} a_{(3,1)}=0 \\
& g_{(6,3)}=g_{(6,5)}+a_{(6,5)} g_{(4,3)}=1, \quad a_{(6,3)}=a_{(6,5)} a_{(4,3)}=0 \\
& g_{(7,4)}=g_{(7,6)}+a_{(7,6)} g_{(5,4)}=1, \quad a_{(7,4)}=a_{(7,6)} a_{(5,4)}=1
\end{aligned}
$$

Level 3 outputs:

$$
\begin{aligned}
& c_{4}=g_{(3,0)}+a_{(3,0)} c_{0}=0 \\
& c_{5}=g_{(4,1)}+a_{(4,1)} g_{(0,-1)}=0 \\
& c_{6}=g_{(5,2)}+a_{(5,2)} g_{(1,-1)}=0 \\
& c_{7}=g_{(6,3)}+a_{(6,3)} g_{(2,0)}=1
\end{aligned}
$$

Level 4 outputs:

$$
\begin{aligned}
& s_{0}=p_{0} \oplus c_{0}=1 \\
& s_{1}=p_{1} \oplus c_{1}=1 \\
& s_{2}=p_{2} \oplus c_{2}=1 \\
& s_{3}=p_{3} \oplus c_{3}=1 \\
& s_{4}=p_{4} \oplus c_{4}=1 \\
& s_{5}=p_{5} \oplus c_{5}=1 \\
& s_{6}=p_{6} \oplus c_{6}=0 \\
& s_{7}=p_{7} \oplus c_{7}=0 \\
& c_{8}=g_{(7,0)}+a_{(7,0)} c_{0}=1
\end{aligned}
$$

## Exercise 2.21

The prefix adder shown in Figure E2.21 produces $x+y$ and $x+y+1$.
As before, for $x+y$ the carry is $c_{i}^{0}=g_{(i-1,0)}$. On the other hand, for the $x+y+1$ case, the carry $c_{i}$ is 1 if a carry is generated in bits 0 to $\mathrm{i}-1\left(g_{(i-1,0)}=1\right)$ and if a carry is propagated from bit 0 to i-1 $\left(p_{(i-1,0)}=1\right)$. Consequently,

$$
c_{i}^{1}=g_{(i-1,0)}+p_{(i-1,0)}=g_{(i-1,0)}+a_{(i-1,0)}
$$



Figure E2.21: Prefix adder for Exercise 2.21.

Exercise 2.22
A diagram of a 16-bit prefix adder is shown in Figure E2.22.


## Exercise 2.23

A diagram of a 4-bit conditional-adder module is shown in Figure E2.23.


Figure E2.23: 4-bit conditional adder for Exercise 2.23.

## Exercise 2.24

| $X$ | 0111 | 1000 | 1010 | 1010 |
| :---: | ---: | ---: | ---: | ---: |
| $Y$ | 1010 | 1011 | 1011 | 0010 |
| $\left(c^{1}, S^{1}\right)$ | 10010 | 10100 | 10110 | 01101 |
| $\left(c^{0}, S^{0}\right)$ | 10001 | 10011 | 10101 | 01100 |
| $(c, S)$ | 10010 | 0100 | 0101 | 1100 |

## Exercise 2.25

a) Assume delay of a $k$-bit CRA to be $k \delta$. The optimum size for each adder is obtained by requiring that the carry out is complete just at the time the decision signal arrives. That is, this condition is

$$
\begin{equation*}
t k(i)=k(i) \delta=t k(0)+(i-1) t_{M U X} \tag{1}
\end{equation*}
$$

where $t k(i)$ is the delay of adder $i$.
Using $m$ adders, the delay of the $n$-bit adder is
$T_{a}=t k(m-1)+t_{M U X}=t k(0)+(m-1) t_{M U X}=k(0) \delta+(m-1) t_{M U X}$
The total length of adders must satisfy

$$
\begin{equation*}
\sum_{i=0}^{m-1} k(i)=n \tag{2}
\end{equation*}
$$

Substituting (1) in (2) we get

$$
\begin{aligned}
n & =\sum_{i=0}^{m-1} k(i)=\frac{1}{\delta} \sum_{i=0}^{m-1}\left(k(0) \delta+(i-1) t_{M U X}\right) \\
& =\sum_{i=0}^{m-1}\left(k(0)+(i-1) \frac{t_{M U X}}{\delta}\right) \\
& =m k(0)+\frac{(m-1)(m-2)}{2}\left(\frac{t_{M U X}}{\delta}\right)
\end{aligned}
$$

From this expression we obtain $k(0)$ in terms of $m$ :

$$
k(0)=\frac{n-\frac{(m-1)(m-2)}{2}\left(\frac{t_{M U X}}{\delta}\right)}{m}
$$

The delay of the $n$-bit adder in terms of $m$ is

$$
T_{a}(m)=k(0) \delta+(m-1) t_{M U X}=\frac{n \delta-\frac{(m-1)(m-2) t_{M U X}}{2}}{m}+(m-1) t_{M U X}
$$

From $d T_{a}(m) / d m=0$ we obtain the following equation

$$
(m-1)^{2}-(m-1)-\frac{2}{3}\left(n \frac{\delta}{t_{M U X}}+2\right)=0
$$

Solving for $m$ we get

$$
m=\frac{2}{3}+\left(\frac{1}{4}+\frac{2}{3}\left(n \frac{\delta}{t_{M U X}}+2\right)^{1 / 2}\right.
$$

For large $n$ and $t_{M U X}=\delta$ we have

$$
m \approx\left\lceil\left(\frac{2}{3}(n+2)\right)^{1 / 2}+\frac{3}{2}\right\rceil
$$

After obtaining $m$ we calculate $k(0)$ and then $k(i)$ 's. For example, for $n=64$ we get $m=9$ and

$$
k(0)=4, \quad k(i)=k(0)+i-1, \quad i=1, \ldots, 8
$$

The delay is $T_{a}(64)=4 \delta+(9-1) t_{M U X}=12 \delta$.
b) Since in this case all carries are computed in log time, the size of the group is not very significant in the delay of the carry. Consequently, to simplify the derivation we assume that the output carries of each block are produced in constant time $c \delta$ and that the block size is constant $(k)$ so that for a $n$-bit adder there are $m=n / k$ blocks. The delay for the sum bits of each block is $k \delta$.
The minimum delay is achieved when the delay of the sum of the last block is equal to the delay of the select signal obtained from previous blocks:

$$
c \delta+(m-2) t_{M U X}=k \delta
$$

For $t+M U X=\delta$ we get $c+m-2=k$. Since $n=m k$

$$
m=\frac{2-c+\sqrt{4 n+(c-2)^{2}}}{2}
$$

The total delay is $T=c \delta+(m-1) t_{M U X}=(c+m-1) \delta$. For example, for $n=64$ and $c=2$, we get $m=8$ and $k=8$. The delay in this case is $T=(2+7) \delta=9 \delta$.

## Exercise 2.26

| $X$ | 01 | 01 | 01 | 11 |
| :--- | :--- | :--- | :--- | :--- |
| $Y$ | 10 | 10 | 11 | 11 |
| $S^{0}$ | 11 | 11 | 00 | 10 |
| $c^{0}$ | 0 | 0 | 1 | 1 |
| $S^{1}$ | 00 | 00 | 01 | 11 |
| $c^{1}$ | 1 | 1 | 1 | 1 |
| $S^{0}$ | 11 | 11 | 01 | 10 |
| $c^{0}$ | 0 |  | 1 |  |
| $S^{1}$ | 00 | 00 | 01 | 11 |
| $c^{1}$ | 1 |  | 1 |  |
|  |  |  |  |  |
| $S^{0}$ | 00 | 00 | 01 | 10 |
| $c^{0}$ | 1 |  |  |  |
| $S^{1}$ | 00 | 00 | 01 | 11 |
| $c^{1}$ | 1 |  |  |  |

The result is $\left(c^{0}, S^{0}\right)$ because $c_{i n}=0$.

## Exercise 2.27

Scheme A: $T_{A}=t_{C R A}(n / 4)+2 t_{M U X}=(n / 4) \times 2 \delta+2 \times \delta=\left(2^{p-1}+2\right) \delta$
Scheme B: $T_{B}=t_{C A}+4 t_{M U X}=(2 \delta+(p-3) \delta)+4 \delta=(p+3) \delta$

- $T_{A}<T_{B}$ if $2^{p-1}+2<p+3$ which holds for $p \leq 2$.


## Exercise 2.28

(a) The initial conditional carries are obtained by CC-1 modules: $c_{i+1}^{0}=$ $x_{i} y_{i}, \quad c_{i+1}^{1}=x_{i}+y_{i}$ (Figure E2.28a). The stages are composed of modules organized as shown in Figure E2.28b and c. An $M k$ module consists of $2 k$ 2-input MUXes.

A 16-bit conditional-carry adder, shown in Figure E2.28c, consists of the following stages:

Stage 0: obtain $\left(c_{i}^{1}, c_{i}^{0}\right)$ using 16 (AND,OR) modules with delay $t_{g}$ and $p_{i}^{\prime} s$ with 16 XORs ( $t_{X O R}$ with delay $2 t_{g}$.)
Stage 1: obtain conditional-carry bit-vectors of size 2: $7 \times 2+1=15$ MUXes $\left(2 t_{g}\right)$


Figure E2.28: (a) Generation of conditional carries. ( $\mathrm{b}+\mathrm{c}$ ) Symbols and implementation of MUX-based modules for conditional carries. (d) 16-bit conditional-carry adder. (Exercise 2.28).

Stage 2: obtain conditional-carry bit-vectors of size $4: 3 \times 4+2=14$
Stage 3: obtain conditional-carry bit-vectors of size 8: $1 \times 8+4=12$
Stage 4: obtain conditional-carry bit-vectors of size $8: 8 \times 1=8$
Stage 5: obtain the sum: $16 \times$ XOR: total $16 ; t_{X O R}=2 t_{g}$
The delay: $T_{C C-16}=t_{g}+4 \times 2 t_{g}+2 t_{g}=11 t_{g}$
The cost: 16 AND $-2+16$ OR- $2+2 \times 16$ XOR +49 MUXes +2 AND- $3+$ OR-2.

From Table 2.4:
Total: $32 \times 2+32 \times 3+49 \times 2+3 \times 2=264$ eqv. gates
(b) Using the conditional-sum adder of Fig.2.24. The COND-ADD-4 is made of modified CLA-4 module: in the CLG-4 we duplicate the AND-OR subnetwork at the bottom to generate conditional carries; we also in the CLA- 4 we duplicate XORs to get conditional sums.

The cost of the COND-ADD-4: 4 AND- $2+4$ OR- $2+3 \times 4$ XOR + CLG- 4 +4 AND- $2+4$ OR- $2=8+8+12 \times 3+44+8+8=112$ eqv. gates.

The delay: $T_{C S U M-16}=t_{C O N D-A D D-4}+2 \times 2 t_{g}=5 t_{g}+4 t_{g}=9 t_{g}$
The cost: 3 COND-ADD- $4+$ CLA- $4+(3 \times 5+9)$ MUXes. Total: $3 \times$ $112+84+24 \times 3=492$ eqv.gates

The conditional-carry adder has a similar delay and a significantly lower cost than a conditional-sum adder with 4-bit conditional adders.

## Exercise 2.29

a) Type 1 adder:

| $x$ | 1000 | 100 | 111 |
| :---: | ---: | ---: | ---: |
| $y$ | 0111 | 000 | 110 |
| $c_{i}^{0}$ | 11111 | 110 | 011 |
| $c_{i}^{1}$ | 00000 | 001 | 100 |
| $c_{i}$ | 00000 | 001 | 100 |
| $s_{i}$ | 01111 | 101 | 101 |

The actual delay, assuming critical path in producing $F$, is

$$
T_{\text {Type } 1}=t_{X O R}+t_{O R-2}+10 \times t_{c}+t_{O R-2}
$$

where $t_{c}$ is the delay of producing a carry:

$$
t_{c}=t_{A N D-2}+t_{O R-2}
$$

Given that $t_{c}$ has the same expression for the carry-ripple adder and that the actual delay of $t_{c}$ is $15 \%$ smaller than its worst-case delay and assuming the same variation for $t_{X O R}$ and $t_{O R-2}$, we get:

$$
T_{\text {Type } 1} \approx 0.85 T_{C R A}
$$

b) Type 2 adder:

| $x$ | 1000100111 |
| :---: | ---: |
| $y$ | 0111000110 |
| chains | jihgfedcba |
| timing | 6543211111 |

In this example, the longest chain is zero-carry chain efghij of 6 positions. The actual delay is

$$
T_{\text {Type } 2}=t_{X O R}+t_{\max }+t_{O R-2}+t_{A N D-10}
$$

where $t_{\text {max }}=6 t_{c}$.
Consequently, including the delay of AND-10, for this input pattern the addition delay is rougly $70 \%$ of that of the adder of type I.

## Exercise 2.30

a) Figure E2.30 shows an implementation of a 32-bit carry-select adder using self-timed modules: 8-bit Type-1 adders and bit-vector multiplexers.

The delay of this carry-select self-timed adder is

$$
T_{C S S T-32}=t_{X O R}+t_{O R}+\sum_{i=0}^{7} t c_{i}+\sum_{i=1}^{3}\left(t_{A N D}+t_{O R}\right)+t_{O R} \approx 12 t c
$$

where $t c$ is the average carry delay and $t_{X O R}+t_{O R}+t_{O R} \approx 2 t c$.
b) The delay of the 32-bit self-timed adder based on a carry-ripple adder (Fig. 2.27) is

$$
T_{C R S T-32}=t_{X O R}+t_{O R}+\sum_{i=0}^{31} t c_{i}+t_{O R} \approx 33 t c
$$

Consequently, we estimate the variable-time adder based on the carry-select scheme to be about 2.75 times faster than the one based on the carry-ripple adder.

Comparing Figure E2.30 with Figure 2.27 shows that the number of adder modules is multiplied by 1.75. Including the multiplexers, we estimate the cost to be 2.5 times higher.

## Exercise 2.31

The critical path in Fig. 2.20 is

$$
g_{0} \rightarrow g_{(1,0)} \rightarrow g_{(3,0)} \rightarrow g_{(7,0)} \rightarrow g_{(7,-1)} \rightarrow g_{(0,-1)} \rightarrow g_{(2,-1)} \rightarrow c_{7} \rightarrow s_{7}
$$

Assuming delay $t$ per ( $\mathrm{g}, \mathrm{a}$ ) module, the delay is

$$
T_{F i g .2 .20}=8 t+t_{X O R} \approx 9 t
$$

The critical path in Fig.2.29 is


Figure E2.30: Carry-select adder with self-timed modules (Exercise 2.30).

$$
g_{0} \rightarrow g_{(1,0)} \rightarrow g_{(3,0)} \rightarrow g_{(7,0)} \rightarrow c_{7} \rightarrow s_{7}
$$

Assuming delay $t$ per ( $\mathrm{g}, \mathrm{a}$ ) module, the delay is

$$
T_{F i g .2 .29}=3 t+t_{b u f f}+t+t_{X O R} \approx 6 t
$$

## Exrecise 2.32

a)

| X |  | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $Y$ |  | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 |
| W |  | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
| S* |  | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 |
| C* | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0a |
| $Z$ |  | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| $S$ | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| C | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 |  |

b)

| $X$ |  | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $Y$ |  | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 |
| $W$ |  | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
| $Z$ |  | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| $T$ | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 a |
| $P$ |  | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| $S$ | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| $C$ | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 |  |
|  |  |  | a carry in; P output of Odd-parity module. |  |  |  |  |  |  |  |  |  |  |  |  |  |  |

## Exercise 2.33

a) Consider a [4:2] adder made of full adders, arranged in two levels as shown in Figure 2.32 with one bit-slice repeated in Figure E2.33.

Since, according to Table 2.2, in the second FA the delays $t_{c, s}$ and $t_{c, c}$ are smaller than the other delays, the overall delay is reduced if we connect to pin $c$ the output of the first FA with largest delay. However, for this FA implementation the delays $t_{x, s}$ and $t_{x, c}$ are almost the same. So, assume that we connect output $s$ to input $c$ (same result would be obtained by connecting output $c$ to input $c$ ). We get the critical path

$$
T_{[4: 2]}=t_{x, c}+t_{x, s}=(0.72+0.04)+(0.71+0.03)=1.50
$$

This delay is roughly two times the delay of one full adder. This is due to the implementation of the adder of Table 2.2 (an implementation based on two half adders). A reduction in delay would be obtained if the full adder is implemented with a separate network for the carry out signal (a two-level gate network). For


Figure E2.33: [4:2] adder implemented with FAs.
instance, if the delay of this signal would be $0.38+0.03$, we would get a total delay of $(0.72+0.04)+(0.38+0.04)=1.18$, roughly 1.5 the delay of a FA.
b) To show that the network in Fig. 2.41 implements a [4:2] carry-save adder module, consider the following arithmetic function:

$$
\begin{gathered}
v s=\left(x_{i}+y_{i}+w_{i}+z_{i}+t_{i}\right) \bmod 2 \\
\left(t_{i+1}+c_{i+1}\right)=\left(x_{i}+y_{i}+w_{i}+z_{i}+t_{i}\right) / 2
\end{gathered}
$$

Clearly, the ODD PARITY module followed by the XOR produces the correct $v s$ (odd parity of $x, y, w, z, t$ ).

Now for $t_{i+1}, c_{i+1}$, we see that the implementation is symmetric with respect to $x, y, w$ so we can consider the sum $(x+y+w)$. We have

If $x+y+w=2$ or 3 then $t=\operatorname{maj}(x, y, w)=1$
Moreover,
$c=1$ if
i) $(x+y+w=1)$ and $(z=1$ or $t=1)$

In this case $x+y+w+z+t=2$ or 3 and we get $(t=0, c=1)$.
ii) $(x+y+w=2)$ and $z=1$ and $t=1$

In this case $x+y+w+z+t=4$ and we get $(t=1, c=1)$
iii) $(x+y+w=3)$ and $(z=1$ or $t=1)$

In this case $x+y+w+z+t=4$ or 5 and we get $(t=1, c=1)$
In summary, if
$x+y+w=1$ and $z=1$ or $t=1$ then SUM $=2$ or 3 and $t=0$ and $c=1$
$x+y+w=2$ and $z+t<2$ then $\mathrm{SUM}=2$ or 3 and $t=1$ and $c=0$
$x+y+w=2$ and $z+t=2$ then $\mathrm{SUM}=4$ and $t=1$ and $c=1$
$x+y+w=3$ and $z=1$ or $t=1$ then $\mathrm{SUM}=4$ or 5 and $t=1$ and $c=1$.
We now compute the delays of the [4:2] adder shown in Figure 2.41.

- Delay of $s_{i}$ :

The module ODD PARITY consists of a two-level tree of XOR gates: XOR2(XOR1,XOR1). The output $s_{i}$ is produced by a level 3 XOR. Assuming a unit load on $s_{i}$, the delay is

$$
\begin{aligned}
t s_{i} & =t_{X O R 1}+t_{X O R 2}+t_{X O R 3} \\
t_{X O R 1} & =0.3+0.029 \times 1.1=0.33 \mathrm{~ns} \\
t_{X O R 2} & =0.3+0.029 \times(2+0.5)=0.38 \mathrm{~ns} \\
t_{X O R 3} & =0.15+0.028 \times 1=0.188 \mathrm{~ns} \\
s_{i} & =0.89 \mathrm{~ns}
\end{aligned}
$$

- Delay of $t_{i+1}: t_{t}=t_{N A N D 1}+t_{N A N D 2}$ :
$t_{i+1}$ is produced by a majority network implemented by a two-level NAND network as $N A N D 2(N A N D 1, N A N D 1, N A N D 1)$.

$$
\begin{aligned}
t_{N A N D 1} & =0.07+0.033 \times 1.1=0.10 \mathrm{~ns} \\
t_{N A N D 2} & =0.08+0.039 \times(1.1+0.5)=0.14 \mathrm{~ns} \\
t_{t} & =0.24 \mathrm{~ns}
\end{aligned}
$$

- Delay of $\left.c_{i+1}: t_{c}=\max \left(t_{t}, t_{M U X-S E L}\right)+t_{M U X}\right)$

$$
\begin{aligned}
t_{c} & =t_{M U X-S E L}+t_{M U X} \\
& =t_{X O R 1}+t_{X O R 2}+t_{M U X} \\
& =0.33+0.38+0.21+0.05 \times 1=0.97 \mathrm{~ns}
\end{aligned}
$$

We conclude that the [4:2] adder in (b) is faster than the [4:2] adder in (a).

## Exercise 2.34

A radix-8 carry-save cell and a 12-bit (4-octal digit) carry-save adder are shown in Figure E2.34.

## Exercise 2.35

|  | 101 | 110 | 110 | 011 |
| ---: | ---: | ---: | ---: | ---: |
|  | 1 | 1 | 0 | 1 |
|  | 011 | 100 | 111 | 011 |
|  | 001 | 011 | 101 | 111 |
| 1 | 1 | 1 | 0 | 0 |

## Exercise 2.36

The addition of two radix- 8 carry-save operands is performed in two stages:

1. A row of half-adders and full-adders as shown in Figure E2.36 reduces two radix-8 carry-save operands to an intermediate result consisting of one radix- 8 carry-save and one conventional bit-vector.
2. The intermediate result is reduced to one radix- 8 carry-save result using CSA8 cells defined in Exercise 2.34.

## Exercise 2.37

Radix-4 signed digit addition:


12-bit carry-save adder using CSA8 modules
$\rightarrow$ 3-bit vector
$\square$ redundant operand

| CSA8 | CSA8 | CSA8 | CSA8 |
| :---: | :---: | :---: | :---: |
| $x \times x$ | $x \times x$ | $x \times x$ | $\begin{array}{lll}x & x & x\end{array}$ |
| $x \times x$ | $x \times x$ | $x \times x$ | $x \times$ |
| $x$ | $x$ | $x$ | $x$ |
| $x \times x$ | $x \times x$ | $x \times x$ | $x \times x$ |

Input and output bit-vectors
Figure E2.34: Radix-8 carry-save adder (Exercise 2.34).


Figure E2.36: Adder implementing addition of two radix-8 carry-save operands and producing a radix- 8 carry-save result. (Exercise 2.36)

| $X$ |  | 0 | $\overline{2}$ | 1 | $\overline{3}$ | 1 | 0 | 1 | $\overline{1}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| $Y$ |  | 1 | 0 | $\overline{1}$ | $\overline{3}$ | $\overline{1}$ | 1 | 1 | 0 |
| $W$ |  | 1 | $\overline{2}$ | 0 | $\overline{2}$ | 0 | 1 | 2 | $\overline{1}$ |
| $T$ | 0 | 0 | 0 | $\overline{1}$ | 0 | 0 | 0 | 0 |  |
| $S$ | 0 | 1 | $\overline{2}$ | $\overline{1}$ | $\overline{2}$ | 0 | 1 | 2 | $\overline{1}$ |

## Exercise 2.38

The arithmetic expressions for a radix- 4 signed-digit addition are

$$
\begin{aligned}
a_{i}+b_{i} & =4 t_{i+1}+w_{i} \\
s_{i} & =w_{i}+t_{i}
\end{aligned}
$$

where the intermediate variables are $t_{i} \in\{-1,0,1\}$ and $w_{i} \in\{-2,-1,0,1,2\}$. The two-step algorithm is not possible since the condition $a \geq(r+1) / 2$ is not satisfied for $a=2$ and $r=4$. Therefore, a modified signed-digit addition algorithm has to be used. We chose Method 2. Let

$$
P_{i}= \begin{cases}1 & \text { if } a_{i-1} \geq 0 \\ 0 & \text { otherwise }\end{cases}
$$

resulting in the following table:

| $a_{i}+b_{i}$ | $P_{i}$ | $t_{i+1}$ | $w_{i}$ |
| :---: | :---: | :---: | :---: |
| 4 | - | 1 | 0 |
| 3 | - | 1 | -1 |
| 2 | 0 | 0 | 2 |
|  | 1 | 1 | -2 |
| 1 | - | 0 | 1 |
| 0 | - | 0 | 0 |
| -1 | - | 0 | -1 |
| -2 | 0 | -1 | 2 |
|  | 1 | 0 | -2 |
| -3 | - | -1 | -1 |
| -4 | - | -1 | 0 |

The block diagram is shown in Figure E2.38. The delay in the critical path is $t_{P}+t_{T W}+t_{s}$.

On the other hand, a radix- 4 signed-digit adder with $a=3$ requires only TW and S stages with the corresponding delays $t_{T W}^{*}$ and $t_{s}$. The delay $t_{T W}^{*}$ is larger than $t_{T W}$ since the digit sets are larger with $a=3$ with $a=3$. The difference between $t_{P}$ and $\left(t_{T W}^{*}-t_{T W}\right)$ determines which adder is faster.

## Exercise 2.39

Method 1:

| $X$ |  |  | 0 | 1 | $\overline{1}$ | 1 | $\overline{1}$ | 0 | 1 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| $\overline{1}$ |  |  |  |  |  |  |  |  |  |
| $Y$ |  | 1 | 0 | 1 | 0 | 1 | $\overline{1}$ | $\overline{1}$ | 1 |
| $H$ | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |  |
| $Z$ |  | $\overline{1}$ | $\overline{1}$ | 0 | $\overline{1}$ | 0 | $\overline{1}$ | 0 | 0 |
| $Q$ |  | 1 | 0 | $\overline{1}$ | 1 | $\overline{1}$ | 0 | $\overline{1}$ | 0 |
| 0 | 0 |  |  |  |  |  |  |  |  |
| $T$ | 0 | 0 | $\overline{1}$ | 0 | $\overline{1}$ | 0 | $\overline{1}$ | 0 | 0 |
| $W$ |  | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 |
|  | 0 |  |  |  |  |  |  |  |  |
| $S$ |  | 1 | $\overline{1}$ | 1 | 0 | 1 | $\overline{1}$ | 1 | 0 |
| 0 |  |  |  |  |  |  |  |  |  |



Figure E2.38: Radix-4 signed digit adder scheme with $a=2$. (Exercise 2.38)

Method 2:

| $X$ |  | 0 | 1 | $\overline{1}$ | 1 | $\overline{1}$ | 0 | 1 | $\overline{1}$ |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| $Y$ |  | 1 | 0 | 1 | 0 | 1 | $\overline{1}$ | $\overline{1}$ | 1 |
| $P$ |  | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 |
| $T$ | 1 | 0 | 0 | 0 | 0 | $\overline{1}$ | 0 | 0 |  |
| $W$ |  | $\overline{1}$ | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| $S$ | 1 | $\overline{1}$ | 1 | 0 | 1 | $\overline{1}$ | 1 | 0 | 0 |

## Exercise 2.40

- A high-level description of the double recoding approach resulting in the redundant adder shown in Figure 2.38:
Recoding 1:
$x_{i}+y_{i}=2 h_{i+1}+z_{i} \in\{-2,-1,0,1,2\}$ such that $h_{i} \in\{0,1\}$ and $z_{i} \in$ $\{-2,-1,0\}$
Recoding 2 :
$h_{i}+z_{i}=2 t_{i+1}+w_{i} \in\{-2,-1,0,1\}$ such that $t_{i} \in\{-1,0\}$ and $w_{i} \in\{0,1\}$
The output $s_{i}=t_{i}+w_{i} \in\{-1,0,1\}$
- A binary-level description:

The inputs and the output are coded as follows:

$$
x_{i}=x_{i}^{+}-x_{i}^{-}, \quad y_{i}=y_{i}^{+}-y_{i}^{-}, \quad s_{i}=s_{i}^{+}-s_{i}^{-}
$$

The non-binary intermediate variable $z_{i} \in\{-2,-1,0\}$ is coded as follows:

$$
z_{i}=-z a_{i}-z b_{i}
$$

All coding variables are binary. In terms of these binary variables we express the high-level expressions as:

$$
x_{i}^{+}-x_{i}^{-}+y_{i}^{+}-y_{i}^{-}=2 h_{i+1}-z a_{i}-z b_{i}
$$

which is decomposed as

$$
x_{i}^{+}-x_{i}^{-}+y_{i}^{+}=2 h_{i+1}-z a_{i}, \quad y_{i}^{-}=z b_{i}
$$

The expression on the left is implemented by a full-adder in which signals with a negative sign are inverted. This corresponds to FAs at the top level of Figure 2.38. In the figure $z a_{i}$ is denoted as $v_{i}$.
Recoding 2 is desribed at the binary level by

$$
-v_{i}-y_{i}^{-}+h_{i}=-2 t_{i+1}+w_{i}
$$

which is implemented by a full-adder in which signals with a negative sign are inverted. This corresponds to the bottom level of full-adders in Figure 2.38. Finaly, the output $s_{i}=s_{i}^{+}-s_{i}^{-}$is obtained as $s_{i}^{+}=w_{i}$ and $s_{i}^{-}=t_{i}$.

## Exercise 2.41

Since $z_{i}, s_{i} \in\{-2,-1,0,1,2\}$, the radix 4 should be used also for $x$ and $y$ operands so that $x_{i}, y_{i} \in\{0,1,2,3\}$ and $x_{n-1}, y_{n-1} \in\{-2,-1,0,1\}$. Consequently,

$$
-2 \leq x_{i}+y_{i}+z_{i} \leq 6
$$

for $0 \leq i \leq n-2$. Moreover,

$$
-6 \leq x_{n-1}+y_{n-1}+z_{n-1} \leq 4
$$

The operation for digits 0 to $n-2$ consists of the following three steps: STEP $1 x_{i}+y_{i}+z_{i}=4 h_{i+1}+v_{i}$ where $h_{i} \in\{-1,0,1,2\}$ and $v_{i} \in\{-1,0,1,2\}$ so that $h_{i}+v_{i} \in\{-2,-1,0,1,2,3,4\}$

$$
\begin{array}{cccccccccccc}
x_{i}+y_{i}+z_{i} & -2 & -1 & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\
\hline h_{i+1} & -1 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 2 & 2 \\
v_{i} & 2 & -1 & 0 & 1 & 2 & -1 & 0 & 1 & 2 & -1 & 0
\end{array}
$$

STEP $2 h_{i}+v_{i}=4 t_{i+1}+u_{i}$ where $t_{i} \in\{0,1\}$ and $u_{i} \in\{-2,-1,0,1\}$

| $h_{i}+v_{i}$ | -2 | -1 | 0 | 1 | 2 | 3 | 4 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $t_{i+1}$ | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
| $u_{i}$ | -2 | -1 | 0 | 1 | -2 | -1 | 0 |

STEP $3 t_{i}+u_{i}=s_{i} \in\{-2,-1,0,1,2\}$
For the most-significant digit $(i=n-1)$ the algorithm is modified as follows:
STEP 1: $x_{n-1}+y_{n-1}+z_{n-1}=4 h_{n}+v_{n-1}$
Since $h_{n-1} \in\{-1,0,1,2\}$ (from algorithm for $i \leq n-2$ ) we make

$$
v_{n-1} \in\{-2,-1,0,1\}, \quad h_{n} \in\{-1,0,1\}
$$

STEP 2: $v_{n-1}+h_{n-1} \in\{-3, \ldots, 3\}$
Since $t_{n-1} \in\{0,1\}$ (from algorithm for $0 \leq i \leq n-2$ ) we make

$$
u_{n-1} \in\{-2,-1,0,1\}, \quad t_{n} \in\{-1,0,1\}
$$

STEP 3: $s_{n-1}=u_{n-1}+t_{n-1}$
Finally, $s_{n}=t_{n}+h_{n}$ with $s_{n} \in\{-2, \ldots, 2\}$.
The critical path includes HV, TW, and S modules.

## Exercise 2.42

The input operands $x$ and $y$ are in radix 4 with the digit set $\{0,1,2,3\}$. The output $s$ is in radix- 4 with the signed-digit set $\{-2,-1,0,1,2\}$.

1. Method 1 - addition consists of two recodings and a carry-free addition.

Recoding 1: $x_{i}+y_{i}=4 h_{i+1}+z_{i}$. The sum of input digits is in the range $[0,6]$. The intermediate variables are $h_{I} \in\{0,1\}$ and $z_{i} \in\{-1,0,1,2\}$.

| $x_{i}+y_{i}$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $h_{i+1}$ | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| $z_{i}$ | 0 | 1 | 2 | -1 | 0 | 1 | 2 |

Recoding 2: $h_{i}+z_{i}=4 t_{i+1}+w_{i}$. The input to this recoding is in the range $[-1,3]$ and the intermediate variables are $t_{i} \in\{0,1\}$ and $w_{i} \in\{-2,-1,0,1\}$.

| $h_{i}+z_{i}$ | -1 | 0 | 1 | 2 | 3 |
| :---: | ---: | ---: | ---: | ---: | ---: |
| $t_{i+1}$ | 0 | 0 | 0 | 1 | 1 |
| $w_{i}$ | -1 | 0 | 1 | -2 | -1 |

The sum is obtained by a carry-free addition: $s_{i}=w_{i}+t_{i}, s_{i} \in\{-2,-1,0,1,2\}$
2. Method 2 - addition uses information from previous digit.

Consider the following recoding:

| $x_{i}+y_{i}$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| :---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| $t_{i+1}$ | 0 | 0 | 1 | 1 | 1 | 1 | 2 |
| $w_{i}$ | 0 | 1 | -2 | -1 | 0 | 1 | -2 |

This recoding produces an out-of-range output digit $s_{i}=3$ if $t_{i}=2$ and $w_{i}=1$. This can be avoided by using alternate recoding when $x_{i}+y_{i}$ is 1 or 5 , marked by $(*)$. The choice depends on the value of $x_{i-1}+y_{i-1}$.

$$
d_{i}= \begin{cases}1 & \text { if } x_{i-1}+y_{i-1} \geq 5 \\ 0 & \text { otherwise }\end{cases}
$$

If $d_{i}=1$, choose the recoding in ().

| $x_{i}+y_{i}$ | 0 | $1\left({ }^{*}\right)$ | 2 | 3 | 4 | $5(*)$ | 6 |
| :---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| $t_{i+1}$ | 0 | $0(1)$ | 1 | 1 | 1 | $1(2)$ | 2 |
| $w_{i}$ | 0 | $1(-3)$ | -2 | -1 | 0 | $1(-3)$ | -2 |

This recoding guarantees that if $t_{i}=2, w_{i}$ cannot be 1 . Consequently, $s_{i}=t_{i}+w_{i} \in\{-2,1,0,1,2\}$. To simplify implementation, it is possible to relax the condition when $d_{i}=1$ to $x_{i-1}+y_{i-1} \geq 2$.

## Exercise 2.43

Radix-2 signed digit addition of one conventional and one signed-digit operand:

| $X$ |  | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| $Y^{+}$ |  | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 |
| $Y^{-}$ |  | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| $W$ |  | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| $T$ | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |  |
| $S^{+}$ | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 |
| $S^{-}$ |  | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| $S$ | 1 | $\overline{1}$ | 1 | 1 | $\overline{1}$ | 0 | 1 | 1 | $\overline{1}$ |

## Exercise 2.44

Radix-2 signed digit addition of two signed-digit operands:

| $X^{+}$ |  | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| $X^{-}$ |  | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 |
| $Y^{+}$ |  | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 |
| $Z$ |  | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 |
| $H$ | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 |
| $Y^{-}$ |  | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| $S^{+}$ | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| $S^{-}$ | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |
| $S$ | 1 | 0 | 0 | $\overline{1}$ | $\overline{1}$ | 1 | 1 | $\overline{1}$ | 1 |

## Exercise 2.45

The adder output $s_{i}$ is a switching function of $2 i$ inputs (ignoring the carryin). It is a well known result that using modules of $m$ binary inputs, the implementation requires at least $\left\lceil\log _{m}(2 i)\right\rceil$ levels. The proof of this is based on building a tree of modules: the first level has $\lceil 2 i / m\rceil$ modules and outputs. For the second level, group $m$ outputs from the first level per module, and so on until the last level has one module.

