# CS 252 - Homework #1

Homework 1 is due Monday, September 16 at 5 PM. Turn it in in the appropriate box in 283 Soda. Note that the building is locked (for those who do not have keycard access) at 6:45 PM during the week, as well as on weekends.

Homework assignments should be done in pairs. Each pair is to do their own work, separate from the other pairs. Each pair turns in one solution.

• 4.14, all parts (a-k)
• 4.25, table of pros and cons + short essay
• B.3, all parts (a-g)
• B.15, table of pros and cons + short essay

Here are a few simplifications and clarifications regarding question 4.14:

• Assume that Figures 4.2 (page 224) and 4.63 (page 367) have been changed to the following:

| Instruction producing result | Instruction using result | Latency in clock cycles |
|---|---|---|
| FP ALU op | anything | 3 |

Figure 4.2

| Instruction producing result | Instruction using result | Latency in clock cycles |
|---|---|---|
| FP multiply | anything | 6 |
| integer op | anything | 0 |

Figure 4.63

For the purposes of the scoreboard and Tomasulo's algorithm, assume that a latency of n cycles implies an execution time of n+1 cycles.

If you already did this problem with the more complex assumptions written in the book, don't worry. Just state what your assumptions are. There's no need to redo any work.
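As a quick sanity check on the "latency n implies execution time n+1" rule, the following minimal sketch tabulates the execution times implied by the modified figures. The names `LATENCY` and `execution_cycles` are hypothetical and exist only for this illustration; the numbers come from the tables above.

```python
# Latencies (in clock cycles) from the modified Figures 4.2 and 4.63.
# LATENCY and execution_cycles are illustrative names, not part of the assignment.
LATENCY = {
    "FP ALU op": 3,
    "FP multiply": 6,
    "integer op": 0,
}


def execution_cycles(producer: str) -> int:
    """Execution time under the stated rule: a latency of n cycles means n + 1 cycles."""
    return LATENCY[producer] + 1


if __name__ == "__main__":
    for op, latency in LATENCY.items():
        print(f"{op}: latency {latency}, execution time {execution_cycles(op)}")
```

Under this rule an integer op, with latency 0, takes a single execution cycle, which agrees with the clarification about the integer functional unit in parts (c) and (d).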

• Note that the scoreboard and Tomasulo's examples, as presented in the book and in lecture, do not pipeline the floating point units. However, similar behavior can be achieved by either pipelining a unit or replicating the unit.

• As was done in class, it is only necessary to show the floating point unit with respect to the scoreboard and Tomasulo's algorithm.

• When parts (c) and (d) of 4.14 say "there is one integer functional unit that takes only a single execution cycle (the latency to use is 0 cycles, including loads and stores)", that means that the cost of an integer operation is simply 1 clock cycle ticking by. Yes, this is unrealistic, but the point of the question is to concentrate on the behavior of the floating point unit.

• Assume that, while DLX has the necessary forwarding to accomplish the latencies mentioned above, the scoreboard implementation cannot read an operand until after it has been written back to the register file (scoreboard), and Tomasulo's algorithm cannot get a result until it is broadcast on the CDB.

Therefore, given the following instructions:

```
	  LD   F2,0(R1)