# Parallel Systems - Outline for lecture 3 -> Caches (a quick review)

- Shared memory multiprocessors
  - Memory hierarchiesCache coherence
  - Snooping protocols
  - » Invalidation protocols (MSI, MESI)
  - » Update protocol (Dragon)
  - Protocol tradeoffs

# 

Caches

- Block placement -

Direct

mapped

01234567

Cache

Fully

associative

01234567

Set

01234567

associative

# Caches Caches memories are small, fast buffers that are used to temporarily hold Recently used information Information that might be needed in the near future Principle of locality Programs access a relatively small portion of their address space at any instant of time Temporal locality (locality in time): If an item is referenced, it will tend to be referenced again soon. Spatial locality (locality in space): If an item is referenced, items whose addresses are close by will tend to be referenced soon

- $\succ$  Q1: Where can a block be placed in a cache?
- > Q2: How is a block found?
- > Q3: What block is replaced on a miss?
- > Q4: How are writes handled?



1







3

## **Cache Coherence**

Some definitions

- Memory operations: Single read, write, or read-modify-write access to a memory location
- □ A memory operation *issues* when it leaves the processor and is presented to the memory system
- A multiprocessor memory system is coherent if
  - » Operations issued by any process occur in the order in which they were issued to the memory system by that process
  - » The value returned by each read operation is the value written by the last write to that location in the serial order
- Two properties follows by the definitions
   Write propagation: writes become visible to other processes
   Write serialization: all writes to a location are seen in the

same order by all processes

#### Snooping Protocols - Basic definitions -

- Coherence is maintained by having all cache controllers "snoop" on the bus and monitor the transactions
- Key properties of a bus that supports coherence
   All transaction that appear on the bus are visible to all cache controllers
  - $\hfill\square$  All transactions are visible in the same order
- Simplest approach
  - □ Single level write through caches
    - » This was the approach used in the first commercial bus-based SMP's
- Coherence protocols
- Invalidation-based
- Update-based

#### Snooping Protocols - Basic definitions [cont.]-

- Snoopy protocols ties together bus transactions and the state transition diagram associated with a cache block
- Bus transactions

arbitration, command (read&write), and data transfer

- State transition diagram
  - □ Each cache block has a state associated with it, along with the tag and data (e.g., valid or invalid)
  - □ State changes is the same for all blocks and all caches, but the current state of a block in different caches may be different
  - Two inputs to cache controller
    - » Memory requests issued by the processor
    - » The bus snooper informs about bus transactions







# Snooping Protocols

- A three-state write-back invalidation protocol [cont.] -

#### Example

| Processor action | State<br>in P1 | State<br>in P2 | State<br>in P3 | Bus<br>action | Data<br>supplied<br>by |
|------------------|----------------|----------------|----------------|---------------|------------------------|
| P1 reads U       | S              | -              | -              | BusRd         | Memory                 |
| P3 reads U       | S              | -              | S              | BusRd         | Memory                 |
| P3 writes U      | I              | -              | М              | BusRdX        | Memory                 |
| P1 reads U       | S              | -              | S              | BusRd         | P3 cache               |
| P2 reads U       | S              | S              | S              | BusRd         | Memory                 |

| Snooping Protocols - A four-state write-back invalidation protocol -                                                                                                                                                                                                |  |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| <ul> <li>Four states are used (MESI) - Illinois protocol</li> <li>Modified (M)</li> <li>Exclusive (E)         <ul> <li>Only this cache has a copy of the block and it is not modified</li> <li>Shared (S)</li> <li>Invalid (I)</li> </ul> </li> </ul>               |  |
| <ul> <li>A new signal must be available on the interconnect</li> <li>Shared: Determine (on BusRd) if other caches holds a copy of this block</li> </ul>                                                                                                             |  |
| <ul> <li>Problem with three-state (MSI) protocol:</li> <li>Two bus transactions are generated when the processor reads in and modifies a data item (even though there are never any sharers)</li> <li>Exercise: Write a state transition diagram for the</li> </ul> |  |
| MESI protocol.                                                                                                                                                                                                                                                      |  |

| Snooping Protocols<br>- A four-state write-back update protocol -                                                                                                                                           |  |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| ➢ Four states                                                                                                                                                                                               |  |
| <ul> <li>Exclusive-clean (E)</li> <li>» Same meaning as in MESI</li> </ul>                                                                                                                                  |  |
| Shared-clean (Sc)<br>» Several caches may have a copy of this block                                                                                                                                         |  |
| <ul> <li>Shared-modified (Sm)</li> <li>» Several caches may have a copy of this block, and it is this cache's responsibility to update the main memory when the block is replaced from the cache</li> </ul> |  |
| Modified (M) » Same meaning as in MESI                                                                                                                                                                      |  |
| ➤ Transactions                                                                                                                                                                                              |  |
| No invalid state => two more request types: PrRdMiss,<br>PrWrMiss                                                                                                                                           |  |
| New transaction - BusUpd: Broadcast the updated value on<br>the bus so that all other caches can update themselves                                                                                          |  |



## Snooping Protocols - A four-state write-back update protocol [cont.] -

### Example

| Processor action | State<br>in P1 | State<br>in P2 | State<br>in P3 | Bus<br>action | Data<br>supplied by |
|------------------|----------------|----------------|----------------|---------------|---------------------|
| P1 reads U       | Е              | -              | -              | BusRd         | Memory              |
| P3 reads U       | Sc             | -              | Sc             | BusRd         | Memory              |
| P3 writes U      | Sc             | -              | Sm             | BusUpd        | P3 cache            |
| P1 reads U       | Sc             | -              | Sm             | -             | -                   |
| P2 reads U       | Sc             | Sc             | Sm             | BusRd         | P3 cache            |



Lecture 3

| Paral   | llel  | Systems | 5 |
|---------|-------|---------|---|
| I ui ui | ii Ui |         | , |

|                | State | transition |        | adeoffs<br>00 data | -       | es -   |
|----------------|-------|------------|--------|--------------------|---------|--------|
| Barnes-<br>Hut |       |            |        | То                 |         |        |
|                |       | NP         | I      | E                  | S       | М      |
|                | NP    | 0          | 0      | 0,0011             | 0.0362  | 0,0035 |
| Fro<br>m       | I     | 0,0201     | 0      | 0,0001             | 0,1856  | 0,0010 |
|                | E     | 0          | 0      | 0,0153             | 0,0002  | 0,0010 |
|                | S     | 0,0029     | 0,2130 | 0                  | 97,1712 | 0,1253 |
|                | М     | 0,0013     | 0,0010 | 0                  | 0,1277  | 902,78 |
| Ravt           | race  | NP         |        | E                  | S       | _ м    |
|                | NP    | 0          | 0      | 1,3358             | 0,1549  | 0,0026 |
| Fro<br>m       | I     | 0,0242     | 0      | 0                  | 0,3403  | 0      |
|                | E     | 0,8664     | 0      | 29,02              | 0,3639  | 0,0175 |
|                | S     | 1,1181     | 0,3750 | 0                  | 310,95  | 0,2898 |
|                | М     | 0,0559     | 0,0001 | 0                  | 0,2970  | 661,01 |

## Protocol Tradeoffs

- Cache misses -

Cache misses in uniprocessor context

Compulsory (cold start) misses

» First reference to a memory block by a processor

Capacity misses:

» All blocks that are referenced by a processor during the execution of a program do not fit in the cache

#### Conflict (collision) misses:

» Occurs when the collection of blocks referenced by a program maps to a single cache set does not fit in the set

#### □ Tradeoff examples

» Conflict misses can be reduced by *reducing* the block size

» Cold start misses can be reduced by increasing the block size

# Protocol Tradeoffs - Cache misses [cont.] -> Coherence misses:

- Occurs when blocks of data are shared among multiple caches
- □ *True sharing*: A data word produced (written) by one processor is used (read or written) by another.
- □ False sharing: Independent data words accessed by different processors happen to be placed in the same memory (cache) block
- » False sharing is an example of artifactual communication
- » Increases with larger block size

## Summary

> Memory hierarchies and cache coherence problem

- Cache coherence through snooping protocols
   Invalidation-based protocols
  - » Simple protocol for write-through caches
  - » MSI & MESI for write-back caches

#### Update-based protocol

Coherence protocol tradeoffs
 Frequency analysis of state transitions
 Tradeoffs in cache block sizes