#### Lecture 12: Caches (cont'd)

CS 105

Spring 2024

#### **Review: The CPU-Memory Gap**



## Review: Principle of Locality

Programs tend to use data and instructions with addresses near or equal to those they have used recently

#### Temporal locality:

 Recently referenced items are likely to be referenced again in the near future

#### Spatial locality:

 Items with nearby addresses tend to be referenced close together in time
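As a rough illustration of spatial locality, the sketch below counts how often consecutive accesses fall in different 8-byte blocks for two traversal orders of the same array. The 4x4 array of 4-byte values starting at address 0 and the helper name `block_changes` are assumptions made for this example only.

```python
def block_changes(addresses, block_size=8):
    """Count how often consecutive accesses land in different blocks."""
    blocks = [addr // block_size for addr in addresses]
    return sum(1 for prev, cur in zip(blocks, blocks[1:]) if prev != cur)

# Hypothetical 4x4 array of 4-byte values stored row by row at address 0.
ROWS, COLS, WORD = 4, 4, 4

# Row-major traversal visits neighboring addresses back to back.
row_major = [WORD * (i * COLS + j) for i in range(ROWS) for j in range(COLS)]
# Column-major traversal jumps a full row (16 bytes) between accesses.
col_major = [WORD * (i * COLS + j) for j in range(COLS) for i in range(ROWS)]

print(block_changes(row_major))  # 7
print(block_changes(col_major))  # 15
```

Fewer block changes means more accesses can be served from a block that was just loaded, which is what good spatial locality looks like.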







# Review: Handling Cache Miss

When a cache miss occurs, update the cache line at that index:

1. Set the valid bit to 1
2. Update the tag
3. Replace the data block with bytes from memory
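The three steps above can be sketched as a small function. This is a minimal sketch, assuming a cache line is modeled as a dictionary with `valid`, `tag`, and `data` fields; the name `handle_miss` and the placeholder values are hypothetical.

```python
def handle_miss(line, tag, block_from_memory):
    """Update one cache line after a miss on an address that maps to it."""
    line["valid"] = 1                       # 1. set the valid bit
    line["tag"] = tag                       # 2. update the tag
    line["data"] = list(block_from_memory)  # 3. replace the data block
    return line

# An initially invalid line holding stale placeholder data.
line = {"valid": 0, "tag": 0b0000, "data": [47, 48]}
handle_miss(line, 0b0110, [13, 14])
print(line)  # {'valid': 1, 'tag': 6, 'data': [13, 14]}
```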





### Review: Direct-mapped Cache

Memory:

| Address | Value |
|---------|-------|
| 0x74    | 18    |
| 0x70    | 17    |
| 0x6c    | 16    |
| 0x68    | 15    |
| 0x64    | 14    |
| 0x60    | 13    |



#### Assume 8 byte data blocks

Each row shows the cache after the access; time runs downward.

| Access    | tag  | idx | off | h/m  | Line 0 valid | Line 0 tag | Line 0 data | Line 1 valid | Line 1 tag | Line 1 data |
|-----------|------|-----|-----|------|--------------|------------|-------------|--------------|------------|-------------|
| (initial) |      |     |     |      | 0            | 0000       | 47 48       | 0            | 0000       | 47 48       |
| rd 0x60   | 0110 | 0   | 000 | Miss | 1            | 0110       | 13 14       | 0            | 0000       | 47 48       |
| rd 0x64   | 0110 | 0   | 100 | Hit  | 1            | 0110       | 13 14       | 0            | 0000       | 47 48       |
| rd 0x70   | 0111 | 0   | 000 | Miss | 1            | 0111       | 17 18       | 0            | 0000       | 47 48       |
| rd 0x64   | 0110 | 0   | 100 | Miss | 1            | 0110       | 13 14       | 0            | 0000       | 47 48       |
| rd 0x64   | 0110 | 0   | 100 | Hit  | 1            | 0110       | 13 14       | 0            | 0000       | 47 48       |
| rd 0x60   | 0110 | 0   | 000 | Hit  | 1            | 0110       | 13 14       | 0            | 0000       | 47 48       |
| rd 0x70   | 0111 | 0   | 000 | Miss | 1            | 0111       | 17 18       | 0            | 0000       | 47 48       |

How well does this take advantage of spatial locality? How well does this take advantage of temporal locality?
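The hit/miss column of the trace above can be checked with a small simulation. This is a minimal sketch, assuming 8-bit addresses split into a 4-bit tag, a 1-bit index, and a 3-bit offset, as in the table; the function names are hypothetical, and the sketch tracks only valid bits and tags, not the data.

```python
def split_address(addr, offset_bits=3, index_bits=1):
    """Split an address into (tag, index, offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

def simulate_direct_mapped(trace, offset_bits=3, index_bits=1):
    """Return 'hit' or 'miss' for each read in the trace."""
    lines = [{"valid": 0, "tag": 0} for _ in range(1 << index_bits)]
    results = []
    for addr in trace:
        tag, index, _ = split_address(addr, offset_bits, index_bits)
        line = lines[index]
        if line["valid"] and line["tag"] == tag:
            results.append("hit")
        else:
            results.append("miss")
            # On a miss, the block at this index is replaced (no choice).
            line["valid"], line["tag"] = 1, tag
    return results

trace = [0x60, 0x64, 0x70, 0x64, 0x64, 0x60, 0x70]
print(simulate_direct_mapped(trace))
# ['miss', 'hit', 'miss', 'miss', 'hit', 'hit', 'miss']
```

Every address in this trace has index 0, so blocks 0x60 and 0x70 keep evicting each other while Line 1 stays unused.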



#### Exercise: 2-way Set Associative Cache

Memory:

| Address | Value |
|---------|-------|
| 0x74    | 18    |
| 0x70    | 17    |
| 0x6c    | 16    |
| 0x68    | 15    |
| 0x64    | 14    |
| 0x60    | 13    |

Assume 8 byte data blocks

The cache has two sets (Set 0 and Set 1) with two lines each. Fill in one row per access; time runs downward. The first row gives the initial contents of each line as they appear on the slide.

| Access    | tag | idx | off | h/m | Set 0, Line 0 | Set 0, Line 1 | Set 1, Line 0 | Set 1, Line 1 |
|-----------|-----|-----|-----|-----|---------------|---------------|---------------|---------------|
| (initial) |     |     |     |     | 0, 0, 47 48   | 0, 1, 47 48   | 0, 0, 47 48   | 0, 1, 47 48   |
| rd 0x60   |     |     |     |     |               |               |               |               |
| rd 0x64   |     |     |     |     |               |               |               |               |
| rd 0x70   |     |     |     |     |               |               |               |               |
| rd 0x64   |     |     |     |     |               |               |               |               |
| rd 0x64   |     |     |     |     |               |               |               |               |
| rd 0x60   |     |     |     |     |               |               |               |               |
| rd 0x70   |     |     |     |     |               |               |               |               |

### **Eviction from the Cache**

On a cache miss, a new block is loaded into the cache

- Direct-mapped cache: A valid block at the same location must be evicted—no choice
- Associative cache: If all blocks in the set are valid, one must be evicted
  - Random policy
  - FIFO
  - LIFO
  - Least-recently used; requires extra data in each set
  - Most-recently used; requires extra data in each set
  - Most-frequently used; requires extra data in each set
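To make the least-recently-used policy concrete, here is a minimal sketch of a set-associative cache that evicts the LRU line when a set is full. It assumes 8-byte blocks and two sets by default, tracks only tags, and uses an `OrderedDict` per set as the "extra data" that records recency; the function name `simulate_lru` and the example trace are hypothetical.

```python
from collections import OrderedDict

def simulate_lru(trace, ways=2, index_bits=1, offset_bits=3):
    """Return 'hit' or 'miss' for each read, evicting by least-recently used."""
    # One ordered dict per set: keys are tags, ordered oldest use first.
    sets = [OrderedDict() for _ in range(1 << index_bits)]
    results = []
    for addr in trace:
        index = (addr >> offset_bits) & ((1 << index_bits) - 1)
        tag = addr >> (offset_bits + index_bits)
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)         # mark as most recently used
            results.append("hit")
        else:
            if len(lines) == ways:         # set is full: evict the LRU line
                lines.popitem(last=False)
            lines[tag] = True
            results.append("miss")
    return results

# Three blocks that all map to set 0 of a 2-way cache.
print(simulate_lru([0x00, 0x20, 0x00, 0x40, 0x20]))
# ['miss', 'miss', 'hit', 'miss', 'miss']
```

In this trace the read of 0x40 evicts the block holding 0x20, because 0x00 was touched more recently; a FIFO policy would have evicted 0x00 instead.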

#### Exercise: 2-way Set Associative Cache

Memory:

| Address | Value |
|---------|-------|
| 0x74    | 18    |
| 0x70    | 17    |
| 0x6c    | 16    |
| 0x68    | 15    |
| 0x64    | 14    |
| 0x60    | 13    |

Assume 8 byte data blocks

The cache has two sets (Set 0 and Set 1) with two lines each. Fill in one row per access; time runs downward. The first row gives the initial contents of each line as they appear on the slide.

| Access    | tag | idx | off | h/m | Set 0, Line 0 | Set 0, Line 1 | Set 1, Line 0 | Set 1, Line 1 |
|-----------|-----|-----|-----|-----|---------------|---------------|---------------|---------------|
| (initial) |     |     |     |     | 0, 0, 47 48   | 0, 1, 47 48   | 0, 0, 47 48   | 0, 1, 47 48   |
| rd 0x60   |     |     |     |     |               |               |               |               |
| rd 0x64   |     |     |     |     |               |               |               |               |
| rd 0x70   |     |     |     |     |               |               |               |               |
| rd 0x64   |     |     |     |     |               |               |               |               |
| rd 0x64   |     |     |     |     |               |               |               |               |
| rd 0x60   |     |     |     |     |               |               |               |               |
| rd 0x70   |     |     |     |     |               |               |               |               |
| rd 0x80   |     |     |     |     |               |               |               |               |

## Caching and Writes

- What to do on a write-hit?
  - Write-through: write immediately to memory
  - Write-back: defer write to memory until replacement of line
    - Need a dirty bit (line different from memory or not)
- What to do on a write-miss?
  - Write-allocate: load into cache, update line in cache
    - Good if more writes to the location follow
  - No-write-allocate: writes straight to memory, does not load into cache
- Typical
  - Write-through + No-write-allocate
  - Write-back + Write-allocate
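The two typical combinations can be compared by counting memory traffic for a short trace. This is a minimal sketch, assuming a direct-mapped cache with four lines and 4-byte blocks; the function name `count_memory_traffic` is hypothetical, and dirty lines still in the cache at the end are not flushed.

```python
def count_memory_traffic(trace, write_back, write_allocate,
                         index_bits=2, offset_bits=2):
    """Return (block reads from memory, writes to memory) for a trace."""
    lines = [{"valid": False, "dirty": False, "tag": None}
             for _ in range(1 << index_bits)]
    block_reads = mem_writes = 0
    for op, addr in trace:
        index = (addr >> offset_bits) & ((1 << index_bits) - 1)
        tag = addr >> (offset_bits + index_bits)
        line = lines[index]
        in_cache = line["valid"] and line["tag"] == tag
        # Read misses always allocate; write misses only with write-allocate.
        if not in_cache and (op == "rd" or write_allocate):
            if line["valid"] and line["dirty"]:
                mem_writes += 1            # write the evicted dirty block back
            block_reads += 1               # load the new block
            line.update(valid=True, dirty=False, tag=tag)
            in_cache = True
        if op == "wr":
            if in_cache and write_back:
                line["dirty"] = True       # defer the memory write
            else:
                mem_writes += 1            # write goes to memory now
    return block_reads, mem_writes

trace = [("rd", 0x10), ("wr", 0x10), ("wr", 0x24), ("rd", 0x24), ("rd", 0x20)]
print(count_memory_traffic(trace, write_back=True, write_allocate=True))    # (3, 1)
print(count_memory_traffic(trace, write_back=False, write_allocate=False))  # (3, 2)
```

With write-back, the write to 0x10 reaches memory only when its line is evicted by the read of 0x20, and the write to 0x24 is still pending in a dirty line when the trace ends.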

### Exercise: Write-back + Write-allocate

#### Memory

| Access    | tag | idx | off | h/m |
|-----------|-----|-----|-----|-----|
| rd 0x10   |     |     |     |     |
| wr 8,0x10 |     |     |     |     |
| wr 9,0x24 |     |     |     |     |
| rd 0x24   |     |     |     |     |
| rd 0x20   |     |     |     |     |

| Access    | tag  | idx | off | h/m |
|-----------|------|-----|-----|-----|
| rd 0x10   | 0001 | 00  | 00  | m   |
| wr 8,0x10 | 0001 | 00  | 00  | h   |
| wr 9,0x24 | 0010 | 01  | 00  | m   |
| rd 0x24   | 0010 | 01  | 00  | h   |
| rd 0x20   | 0010 | 00  | 00  | m   |



Assume 4 byte data blocks

Fill in the cache after each access. The first row gives the initial contents of each line as they appear on the slide.

| Access    | Line 0   | Line 1   | Line 2   | Line 3   |
|-----------|----------|----------|----------|----------|
| (initial) | 0, 0, 47 | 0, 1, 47 | 0, 2, 47 | 0, 3, 47 |
| rd 0x10   |          |          |          |          |
| wr 8,0x10 |          |          |          |          |
| wr 9,0x24 |          |          |          |          |
| rd 0x24   |          |          |          |          |
| rd 0x20   |          |          |          |          |

## Memory Hierarchy



## Typical Intel Core i7 Hierarchy

#### Processor package



- L1 d-cache and i-cache: 32 KB, 8-way; access: 4 cycles
- L2 unified cache: 256 KB, 8-way; access: 10 cycles
- L3 unified cache: 8 MB, 16-way; access: 40-75 cycles

Block size: 64 bytes for all caches.
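From these parameters, the number of sets in each cache follows from size = sets × ways × block size. The sketch below works through that arithmetic; the helper name `num_sets` is hypothetical, and the sizes are the ones listed above.

```python
def num_sets(size_bytes, ways, block_bytes=64):
    """Number of sets = cache size / (block size * lines per set)."""
    return size_bytes // (block_bytes * ways)

KB, MB = 1024, 1024 * 1024

print(num_sets(32 * KB, 8))    # L1: 64 sets
print(num_sets(256 * KB, 8))   # L2: 512 sets
print(num_sets(8 * MB, 16))    # L3: 8192 sets
```

With 64-byte blocks, the block offset takes 6 address bits, and the set index takes 6, 9, and 13 bits for L1, L2, and L3 respectively.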

## **Caching Organization Summarized**

- A cache consists of lines
- A line contains
  - A block of bytes, the data values from memory
  - A tag, indicating where in memory the values are from
  - A valid bit, indicating if the data are valid
- Lines are organized into sets
  - Direct-mapped cache: one line per set
  - k-way associative cache: k lines per set
  - Fully associative cache: all lines in one set
- Caches handle both reads and writes
  - write-through: write to both cache and memory
  - write-back: write only to cache, write to memory on eviction
  - write-allocate: allocate a line on any miss
  - no-write-allocate: allocate a line only on a read miss