Solutions for HW4
--------------------

** 5.6

a) Single cycle datapath (page 372 in book)

The first datapath change is to turn the 2-to-1 MUX that chooses the
address of the register to be written into a 3-to-1 MUX. The third input
is the constant 31 (binary: 11111).

The second datapath change is to make PC+4 available as a value to be
written into the register file. We just need to turn the 2-to-1 MUX that
selects between the memory and ALU outputs into a 3-to-1 MUX whose third
input is PC+4 (output of the adder for PC).

For control, the signals for jal are the same as those for j, with the
following exceptions:

1) RegWrite becomes 1 (it is 0 in j)
2) The control for the WriteRegister address MUX should have the value 2
   in order to select 31 (don't care in j)
3) The control for the register file WriteData MUX should be 2 in order
   to select PC+4 (don't care in j)

b) Multicycle datapath (page 383 in book)

The first datapath change is to turn the 2-to-1 MUX that chooses the
address of the register to be written into a 3-to-1 MUX. The third input
is the constant 31 (binary: 11111).

The second datapath change is to make PC+4 available as a value to be
written into the register file. We just need to turn the 2-to-1 MUX that
selects between the memory and ALU outputs into a 3-to-1 MUX whose third
input is PC+4 (output of the PC register).

The FSM for jal is the same as that of j. If state 10 is the new state
added for jal (page 396), this state is the same as state 9 with
RegWrite=1, RegDst=2 and MemtoReg=2 added.

The steps of jal are:

cycle 1: IR = Memory[PC]
         PC = PC+4
cycle 2: -
cycle 3: Reg[31] = PC
         PC = PC[31-28] || (IR[25-0]<<2)

Since cycle 2 does no useful work for jal, the instruction could be
executed in only 2 cycles, as long as instruction decode and control are
fast enough. In that case, RegWrite=1, RegDst=2 and MemtoReg=2 are
asserted in state 1 if the instruction is jal.
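The register transfers listed for jal can be checked with a small Python
sketch. It is only an illustration of the arithmetic, not part of the book's
solution: the function name jal_step and the dictionary JAL_EXTRA_CONTROL are
hypothetical, and values are treated as 32-bit unsigned integers.

```python
# Control values that jal sets on top of j, as listed in the solution above.
# (Dictionary name is hypothetical; 2 selects the third MUX input.)
JAL_EXTRA_CONTROL = {"RegWrite": 1, "RegDst": 2, "MemtoReg": 2}


def jal_step(pc: int, ir: int) -> tuple[int, int]:
    """Return (value written to Reg[31], new PC) for a jal fetched at pc."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF            # cycle 1: PC = PC+4
    link = pc_plus_4                             # cycle 3: Reg[31] = PC
    # cycle 3: PC = PC[31-28] || (IR[25-0] << 2)
    target = (pc_plus_4 & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
    return link, target


if __name__ == "__main__":
    # jal with a 26-bit target field of 0x0100000, fetched at 0x00400010
    link, target = jal_step(0x00400010, (3 << 26) | 0x00100000)
    print(hex(link), hex(target))
```

The link value is the already incremented PC, which is why the multicycle
datapath can take it directly from the PC register.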

** 5.16

addi does not need any additions to the datapath on page 383.

Its steps are:

cycle 1: IR = Memory[PC]
         PC = PC+4
cycle 2: A = Reg[ IR[25-21] ]
cycle 3: ALUout = A + sign-extend(IR[15-0])
cycle 4: Reg[ IR[20-16] ] = ALUout

The microcode for addi, using the terminology of the book, is:

LABEL  ALU   SRC1  SRC2     REG       REG   MEM      PCwrite  Seq
       ctrl                 ctrl      dest           ctrl
------------------------------------------------------------------------
Fetch  Add   PC    4                        Read PC  ALU      Seq
       Add   PC    ExtShft  Read                              Dispatch 1
ADDI   Add   A     Ext                                        Seq
                            WriteALU  rt                      Fetch

The new column added is REG dest, which specifies which register is
written in the register file. For R-type it is rd; for I-type it is rt.

** 5.24

Effective instruction mix for gcc, after combining instructions that do
not exist in the book's datapath with the corresponding ones that do
(e.g. lb with lw):

R-type (add, sub, slt, etc):   20%
I-type arithmetic (addi etc):  23%
LW/SW:                         23% / 13%
branches:                      19%
jumps:                          2%

M1:
R-type:             4 cycles
I-type arithmetic:  4 cycles
LW/SW:              5 / 4 cycles
branches:           3 cycles
jumps:              3 cycles

CPI  = 4*(.20+.23) + 5*.23 + 4*.13 + 3*(.19+.02) = 4.02
MIPS = Freq/CPI = 500/4.02 = 124.4 MIPS

M2:
R-type:             3 cycles
I-type arithmetic:  3 cycles
LW/SW:              4 cycles
branches:           3 cycles
jumps:              3 cycles

CPI  = 3*(.20+.23) + 4*.23 + 4*.13 + 3*(.19+.02) = 3.36
MIPS = Freq/CPI = 400/3.36 = 119.0 MIPS

M3:
R-type:             3 cycles
I-type arithmetic:  3 cycles
LW/SW:              3 cycles
branches:           3 cycles
jumps:              3 cycles

CPI  = 3*(.20+.23) + 3*.23 + 3*.13 + 3*(.19+.02) = 3.00
MIPS = Freq/CPI = 250/3.00 = 83.3 MIPS

Machine M1 is 1.045 times faster than M2 and 1.49 times faster than M3.

M2 does lw and arithmetic operations in fewer cycles than M1. An
instruction mix with 100% arithmetic instructions leads to 125 MIPS for
M1 and 133 MIPS for M2, so M2 is faster for such a mix.

M3 does lw, sw and arithmetic in fewer cycles than M1. The benefit in
terms of cycles is a factor of 1.67 (lw) or 1.33 (sw, arithmetic). The
loss due to the slower clock is a factor of 2, so there is no instruction
mix for which M1 is slower than M3.
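The CPI and MIPS arithmetic for 5.24 can be recomputed directly from the
stated mix and cycle counts. The sketch below is a cross-check under two
assumptions: the 2% jump share enters the sum as 0.02, and M3 runs at
250 MHz, half of M1's clock, as the factor-of-2 frequency loss mentioned
above implies. The names MIX, MACHINES, cpi and mips are hypothetical.

```python
# Fraction of executed instructions per class (gcc mix from the solution).
MIX = {"rtype": 0.20, "itype": 0.23, "lw": 0.23, "sw": 0.13,
       "branch": 0.19, "jump": 0.02}

# (clock in MHz, cycles per instruction class) for each machine.
# Assumption: M3 clock is 250 MHz.
MACHINES = {
    "M1": (500, {"rtype": 4, "itype": 4, "lw": 5, "sw": 4, "branch": 3, "jump": 3}),
    "M2": (400, {"rtype": 3, "itype": 3, "lw": 4, "sw": 4, "branch": 3, "jump": 3}),
    "M3": (250, {"rtype": 3, "itype": 3, "lw": 3, "sw": 3, "branch": 3, "jump": 3}),
}


def cpi(cycles: dict) -> float:
    """Weighted average of cycle counts over the instruction mix."""
    return sum(MIX[k] * cycles[k] for k in MIX)


def mips(freq_mhz: float, cycles: dict) -> float:
    """MIPS = clock frequency in MHz divided by CPI."""
    return freq_mhz / cpi(cycles)


if __name__ == "__main__":
    for name, (freq, cycles) in MACHINES.items():
        print(f"{name}: CPI={cpi(cycles):.2f}  MIPS={mips(freq, cycles):.1f}")
```

Changing the entries of MIX to a 100% arithmetic mix reproduces the
125 MIPS versus 133 MIPS comparison between M1 and M2.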

** 6.5

IF/ID
1) instruction field                       32b
2) PC+4 field                              32b

ID/EX
1) PC+4 field                              32b
2) Reg Read Data 1                         32b
3) Reg Read Data 2                         32b
4) rt register address                      5b
5) rd register address                      5b
6) sign extended immediate                 32b

EX/MEM
1) PC+4+(sign extended immediate <<2)      32b
2) ALU output                              32b
3) zero                                     1b
4) destination register address             5b
5) Reg Read Data 2                         32b

MEM/WB
1) Memory data out                         32b
2) ALU output                              32b
3) destination register address             5b

** 6.7

Only the active components of the corresponding stage are mentioned.

ADD cycle 1: IF    instruction memory, PC, adder
ADD cycle 2: ID    register file
ADD cycle 3: EX    ALU src2 input mux, ALU, ALU control, destination register result mux
ADD cycle 4: MEM   -
ADD cycle 5: WB    register write data mux, register file
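
The field widths listed under 6.5 can be summed to get the size of each
pipeline register. The sketch below only adds up the data fields given in
that list; it does not include control signals, and the name PIPELINE_REGS
and the shortened field labels are hypothetical.

```python
# Data fields (in bits) of each pipeline register, as listed in 6.5.
PIPELINE_REGS = {
    "IF/ID":  {"instruction": 32, "PC+4": 32},
    "ID/EX":  {"PC+4": 32, "read data 1": 32, "read data 2": 32,
               "rt": 5, "rd": 5, "sign-extended immediate": 32},
    "EX/MEM": {"branch target": 32, "ALU output": 32, "zero": 1,
               "destination register": 5, "read data 2": 32},
    "MEM/WB": {"memory data": 32, "ALU output": 32,
               "destination register": 5},
}

# Total width of each register is the sum of its field widths.
WIDTHS = {name: sum(fields.values()) for name, fields in PIPELINE_REGS.items()}

if __name__ == "__main__":
    for name, bits in WIDTHS.items():
        print(f"{name}: {bits} bits")
```

Under these assumptions the four registers come to 64, 138, 102 and 69 bits
respectively.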