























## **Three Generic Data Hazards**

• Write After Read (WAR): Instr<sub>J</sub> writes operand <u>before</u> Instr<sub>I</sub> reads it

> I: sub r4,r1,r3<br>
> J: add r1,r2,r3<br>
> K: mul r6,r1,r7

- Called an "anti-dependence" by compiler writers. This results from reuse of the name "r1".
- Can't happen in the MIPS 5-stage pipeline because:
  - All instructions take 5 stages, and
  - Reads are always in stage 2, and
  - Writes are always in stage 5
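
To make the naming of the hazard concrete, here is a minimal sketch that classifies the register dependences between an earlier and a later instruction. The `Instr` tuple and the `hazards` function are hypothetical helpers introduced for illustration only; they report potential hazards from register names and say nothing about whether a particular pipeline actually exposes them.

```python
from typing import NamedTuple, Tuple, List


class Instr(NamedTuple):
    """Hypothetical three-operand instruction: one destination, two sources."""
    op: str
    dest: str
    srcs: Tuple[str, ...]


def hazards(first: Instr, second: Instr) -> List[str]:
    """List the dependences of `second` on an earlier instruction `first`."""
    found = []
    if first.dest in second.srcs:
        found.append("RAW")  # second reads what first writes
    if second.dest in first.srcs:
        found.append("WAR")  # second writes what first reads
    if first.dest == second.dest:
        found.append("WAW")  # both write the same register
    return found


# The three instructions from the slide.
I = Instr("sub", "r4", ("r1", "r3"))
J = Instr("add", "r1", ("r2", "r3"))
K = Instr("mul", "r6", ("r1", "r7"))

print(hazards(I, J))  # ['WAR']
print(hazards(J, K))  # ['RAW']
```

The WAR entry for the pair (I, J) comes purely from the reuse of the name r1: J writes r1 after I reads it.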













## Software Scheduling to Avoid Load Hazards

Try producing fast code for

> a = b + c;<br>
> d = e - f;

assuming a, b, c, d, e, and f are in memory.

| Slow code                  | Fast code        |
|----------------------------|------------------|
| LW  Rb,b                   | LW  Rb,b         |
| LW  Rc,c                   | LW  Rc,c         |
| ADD Ra,Rb,<mark>Rc</mark>  | LW  Re,e         |
| SW  a,Ra                   | ADD Ra,Rb,Rc     |
| LW  Re,e                   | LW  Rf,f         |
| LW  Rf,f                   | SW  a,Ra         |
| SUB Rd,Re,<mark>Rf</mark>  | SUB Rd,Re,Rf     |
| SW  d,Rd                   | SW  d,Rd         |

CS252/Culler Lec 2.20
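
The benefit of the reordering can be checked with a short sketch that counts load-use stalls. It assumes a simplified model that is not stated on the slide: a load followed immediately by an instruction that reads the loaded register costs exactly one stall cycle, and nothing else stalls. The helpers `parse` and `load_use_stalls` are hypothetical names.

```python
def parse(line):
    """Split e.g. 'ADD Ra,Rb,Rc' into (op, dest, sources), using the slide's operand order."""
    op, operands = line.split()
    ops = operands.split(",")
    if op == "LW":           # LW Rd,addr : writes Rd, reads no register
        return op, ops[0], []
    if op == "SW":           # SW addr,Rs : reads Rs, writes no register
        return op, None, [ops[1]]
    return op, ops[0], ops[1:]  # ALU op: dest, src1, src2


def load_use_stalls(program):
    """Count instructions that read a register loaded by the instruction just before them."""
    stalls = 0
    prev = None
    for line in program:
        cur = parse(line)
        if prev is not None and prev[0] == "LW" and prev[1] in cur[2]:
            stalls += 1
        prev = cur
    return stalls


slow = ["LW Rb,b", "LW Rc,c", "ADD Ra,Rb,Rc", "SW a,Ra",
        "LW Re,e", "LW Rf,f", "SUB Rd,Re,Rf", "SW d,Rd"]
fast = ["LW Rb,b", "LW Rc,c", "LW Re,e", "ADD Ra,Rb,Rc",
        "LW Rf,f", "SW a,Ra", "SUB Rd,Re,Rf", "SW d,Rd"]

print(load_use_stalls(slow))  # 2
print(load_use_stalls(fast))  # 0
```

Under this model the slow sequence stalls at the ADD and the SUB, while the fast sequence places an independent instruction after each load that would otherwise stall.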



















## Alternatives

$$\mbox{Pipeline speedup} = \frac{\mbox{Pipeline depth}}{1 + \mbox{Branch frequency} \times \mbox{Branch penalty}}$$

Assume: conditional & unconditional branches = 14% of instructions, 65% of them change the PC

| Scheduling scheme | Branch penalty | CPI  | Speedup v. stall |
|-------------------|----------------|------|------------------|
| Stall pipeline    | 3              | 1.42 | 1.0              |
|                   | 1              | 1.14 | 1.26             |
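
The CPI column can be reproduced with a few lines of arithmetic. This is a sketch under the assumption that the base CPI is 1 and that every branch pays the full penalty listed for the scheme; the function names are hypothetical.

```python
def branch_cpi(branch_freq, penalty, base_cpi=1.0):
    """CPI including branch stalls: base CPI plus stall cycles per instruction."""
    return base_cpi + branch_freq * penalty


def pipeline_speedup(depth, branch_freq, penalty):
    """Pipeline depth / (1 + branch frequency x branch penalty)."""
    return depth / (1 + branch_freq * penalty)


BRANCH_FREQ = 0.14  # conditional and unconditional branches, from the slide

cpi_stall = branch_cpi(BRANCH_FREQ, 3)  # penalty of 3 cycles
cpi_one = branch_cpi(BRANCH_FREQ, 1)    # penalty of 1 cycle

print(round(cpi_stall, 2))            # 1.42
print(round(cpi_one, 2))              # 1.14
print(round(cpi_stall / cpi_one, 2))  # 1.25
```

The unrounded CPI ratio comes out near 1.25; the table's 1.26 is presumably the result of rounding intermediate values before dividing.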













## Simplest Cache: Direct Mapped













## Q3: Which block should be replaced on a miss?

- Easy for Direct Mapped
- Set Associative or Fully Associative:
  - Random
  - LRU (Least Recently Used)

| Size   | 2-way LRU | 2-way Random | 4-way LRU | 4-way Random | 8-way LRU | 8-way Random |
|--------|-----------|--------------|-----------|--------------|-----------|--------------|
| 16 KB  | 5.2%      | 5.7%         | 4.7%      | 5.3%         | 4.4%      | 5.0%         |
| 64 KB  | 1.9%      | 2.0%         | 1.5%      | 1.7%         | 1.4%      | 1.5%         |
| 256 KB | 1.15%     | 1.17%        | 1.13%     | 1.13%        | 1.12%     | 1.12%        |

1/24/02
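
The two replacement policies can be compared on a toy address trace with the sketch below. It is an illustrative model only, not the simulator behind the table: the function `miss_rate`, its parameters, and the default 16-byte block size are all assumptions made for the example.

```python
import random
from collections import OrderedDict


def miss_rate(addresses, num_sets, ways, policy="lru", block_size=16, seed=0):
    """Simulate a set-associative cache and return misses / accesses."""
    rng = random.Random(seed)                      # seeded so runs are repeatable
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // block_size
        index = block % num_sets                   # which set the block maps to
        tag = block // num_sets
        lines = sets[index]
        if tag in lines:                           # hit
            if policy == "lru":
                lines.move_to_end(tag)             # mark as most recently used
            continue
        misses += 1
        if len(lines) >= ways:                     # set is full: pick a victim
            if policy == "lru":
                lines.popitem(last=False)          # oldest entry = least recently used
            else:
                del lines[rng.choice(list(lines))] # random victim
        lines[tag] = True
    return misses / len(addresses)


trace = [0, 1, 0, 2, 0]  # block addresses when block_size=1
print(miss_rate(trace, num_sets=1, ways=2, policy="lru", block_size=1))     # 0.6
print(miss_rate(trace, num_sets=1, ways=2, policy="random", block_size=1))
print(miss_rate(trace, num_sets=1, ways=1, policy="lru", block_size=1))     # 1.0
```

With one way per set there is no choice of victim, which is why replacement is "easy" for a direct-mapped cache. On this trace, random replacement can do no better than LRU and may evict the block that is about to be reused.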































## How to Summarize Performance

- Arithmetic mean (weighted arithmetic mean) tracks execution time: $\sum_i T_i / n$ or $\sum_i W_i T_i$
- Harmonic mean (weighted harmonic mean) of rates (e.g., MFLOPS) tracks execution time: $n / \sum_i (1/R_i)$ or $1 / \sum_i (W_i/R_i)$
- Normalized execution time is handy for scaling performance (e.g., X times faster than SPARCstation 10)
- But do not take the arithmetic mean of normalized execution time; use the geometric mean: $\left(\prod_i T_i / N_i\right)^{1/n}$
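
A small numeric sketch shows why the arithmetic mean of normalized times is misleading. The execution times for machines A and B below are made-up illustrative numbers, and the function names are hypothetical.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def harmonic_mean(rates):
    return len(rates) / sum(1.0 / r for r in rates)


def geometric_mean(ratios):
    return math.prod(ratios) ** (1.0 / len(ratios))


# Hypothetical execution times (seconds) of two programs on two machines.
times_a = [1.0, 1000.0]
times_b = [10.0, 100.0]

b_over_a = [b / a for a, b in zip(times_a, times_b)]  # B normalized to A
a_over_b = [a / b for a, b in zip(times_a, times_b)]  # A normalized to B

# Arithmetic mean: each machine looks about 5x slower than the other.
print(arithmetic_mean(b_over_a), arithmetic_mean(a_over_b))

# Geometric mean: the two views agree (product of the two results is 1).
print(geometric_mean(b_over_a), geometric_mean(a_over_b))

# Harmonic mean of rates: two equal-sized jobs at 100 and 200 MFLOPS.
print(harmonic_mean([100.0, 200.0]))
```

The harmonic mean of 100 and 200 MFLOPS is about 133 MFLOPS, which matches total work divided by total time when both jobs do the same amount of work.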



#### **Performance Evaluation**

- "For better or worse, benchmarks shape a field"
- Good products are created when we have:
  - Good benchmarks
  - Good ways to summarize performance
- Given that sales are in part a function of performance relative to the competition, companies invest in improving the product as reported by the performance summary
- If the benchmarks/summary are inadequate, then a company must choose between improving the product for real programs vs. improving the product to get more sales; sales almost always wins!
- Execution time is the measure of computer performance!







# Review #4/4: TLB, Virtual Memory

- Caches, TLBs, and Virtual Memory are all understood by examining how they deal with 4 questions: 1) Where can a block be placed? 2) How is a block found? 3) What block is replaced on a miss? 4) How are writes handled?
- Page tables map virtual addresses to physical addresses
- TLBs make virtual memory practical
  - Locality in data => locality in addresses of data, temporal and spatial
- TLB misses are significant in processor performance: funny times, as most systems can't access all of the 2nd-level cache without TLB misses!
- Today VM allows many processes to share a single memory without having to swap all processes to disk; today VM protection is more important than memory hierarchy
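
The relationship between page tables and TLBs can be sketched as a small translation cache. Everything here is an assumption for illustration: the 4 KB page size, the single-level dictionary page table, the LRU-managed `TLB` class, and its field names.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes


class TLB:
    """Tiny fully associative TLB with LRU replacement in front of a page table."""

    def __init__(self, page_table, entries=2):
        self.page_table = page_table  # maps virtual page number -> physical page number
        self.entries = entries
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.cache:
            self.hits += 1
            self.cache.move_to_end(vpn)              # mark as most recently used
        else:
            self.misses += 1
            self.cache[vpn] = self.page_table[vpn]   # page-table lookup (KeyError = unmapped page)
            if len(self.cache) > self.entries:
                self.cache.popitem(last=False)       # evict least recently used entry
        return self.cache[vpn] * PAGE_SIZE + offset


tlb = TLB({0: 7, 1: 3})
print(tlb.translate(0x0010))  # 28688  (page 0 -> frame 7, miss)
print(tlb.translate(0x0020))  # 28704  (same page, hit)
print(tlb.translate(4096 + 5))  # 12293  (page 1 -> frame 3, miss)
print(tlb.hits, tlb.misses)   # 1 2
```

The second access hits because it falls in the same page as the first, which is the locality in addresses that the review points to.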
