Cache Part 3
Cache miss with all valid entries for direct-mapped vs set-associative cache
Set associative requires a replacement algorithm, since any line in the set is a candidate; direct mapped has only one possible location, so that line is simply replaced.
Why should you repeat references to variables?
Repeated references to the same variable hit in the cache, exploiting temporal locality.
step through columns in one row
accesses successive elements. Exploits spatial locality
Blocking (Matrix Multiplication)
Multiplying subblocks of the matrices so each subblock stays cache-resident, which is more efficient than operating on the full matrices at once.
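A minimal sketch of blocked multiplication in C (the matrix size N and tile size TILE are illustrative values, not from the card):

```c
#include <stddef.h>

#define N    64   /* matrix dimension (illustrative) */
#define TILE 16   /* tile size: pick so three TILE x TILE subblocks fit in cache */

/* c += a * b, computed subblock by subblock so each tile stays cache-resident */
void matmul_blocked(double a[N][N], double b[N][N], double c[N][N]) {
    for (size_t ii = 0; ii < N; ii += TILE)
        for (size_t kk = 0; kk < N; kk += TILE)
            for (size_t jj = 0; jj < N; jj += TILE)
                /* multiply one pair of TILE x TILE subblocks */
                for (size_t i = ii; i < ii + TILE; i++)
                    for (size_t k = kk; k < kk + TILE; k++) {
                        double aik = a[i][k];
                        for (size_t j = jj; j < jj + TILE; j++)
                            c[i][j] += aik * b[k][j];
                    }
}
```

Each tile is reused many times while it sits in cache, instead of streaming entire rows and columns through the cache on every pass.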
write-allocate
After a write miss, load the block into the cache, then follow the write-hit procedure.
no-write-allocate
After a write miss, write only to main memory; data is loaded into the cache only on a read miss.
NRU replacement
Not Recently Used: blocks are ranked by whether they have been referenced or modified during a recent time window; blocks that are neither referenced nor modified are evicted first.
Effect of Block size to consider
Larger blocks can be used to exploit spatial locality (more neighboring data per fill).
write-miss
data to be written into the cache is not present
write-hit
data to be written into the cache is present
write-back
Defer the write to main memory as long as possible, using a dirty bit to mark blocks that have been modified but not yet written back to main memory.
Row major order (C)
each row in contiguous memory locations
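In C this is why loop order matters; a sketch (array dimensions are arbitrary example values):

```c
#include <stddef.h>

#define ROWS 256
#define COLS 256

/* Row-major (C) layout: a[i][j] and a[i][j+1] are adjacent in memory, so the
 * inner loop over j is a stride-1 reference pattern with good spatial locality. */
long sum_row_major(int a[ROWS][COLS]) {
    long s = 0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)   /* walk along each row */
            s += a[i][j];
    return s;
}

/* Same result, but each inner-loop access jumps COLS * sizeof(int) bytes
 * (stride COLS), so it touches a different cache block nearly every time. */
long sum_col_major(int a[ROWS][COLS]) {
    long s = 0;
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)   /* walk down each column */
            s += a[i][j];
    return s;
}
```

Both functions compute the same sum; only the access order, and therefore the cache behavior, differs.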
fully associative cache
The entire cache is one set (a block can go in any line).
set selection of set associative cache
identical to direct-mapped cache
dirty bit
Indicates whether a cache block has been modified since it was loaded (and so must be written back on eviction).
strictly inclusive Cache
On a miss, install the block in both levels; when a block must be evicted from the lower level, it is also evicted from the higher level if present there (back invalidation).
Effect of total cache size to consider
keep the working set small to exploit temporal locality (e.g. using blocking)
LRU cache
Evict the least recently used block. Tracking exact recency can be overly expensive in hardware.
block offset size
b = log_2(bytes per cache block)
set index size
s = log_2(# of sets)
set associative cache
more than one line per set
Line matching and word selection
Compare the tag against each valid line in the selected set; on a match, use the block offset to select the word.
stride 1 reference pattern
reference array elements in succession, helps spatial locality
tag bits size
t = m - (s + b), where m = # of memory address bits, s = # of set index bits, b = # of block offset bits
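A sketch of how the three fields come apart with shifts and masks in C (the struct and function names are made up for illustration; s and b follow the convention on these cards):

```c
#include <stdint.h>

typedef struct { uint64_t tag, set, offset; } cache_addr;

/* Split an address into (tag, set index, block offset), given
 * s = # of set index bits and b = # of block offset bits. */
cache_addr split_address(uint64_t addr, unsigned s, unsigned b) {
    cache_addr r;
    r.offset = addr & ((1ULL << b) - 1);         /* lowest b bits              */
    r.set    = (addr >> b) & ((1ULL << s) - 1);  /* next s bits                */
    r.tag    = addr >> (s + b);                  /* remaining t = m-(s+b) bits */
    return r;
}
```

For addr = 0x1234 with s = 4 and b = 4, this yields offset 0x4, set 0x3, tag 0x12.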
Hit Time
Time to deliver a line from the cache to the processor, generally measured in clock cycles (e.g., ~1 cycle for L1, 3-8 cycles for L2).
Conflict Miss
too many blocks mapped to a single set (increase associativity to reduce)
write-through caches are typically...?
typically no-write-allocate
write-back caches are typically...?
typically write-allocate
write-through
Write to both the cache and main memory immediately.
Average Memory Access Time(AMAT)
AMAT = Hit Time + (Miss Rate x Miss Penalty)
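A one-liner in C plus a worked number (the cycle counts below are illustrative, roughly in the ranges these cards quote, not measured values):

```c
/* AMAT = hit time + miss rate * miss penalty (all times in cycles) */
double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}
```

With a 1-cycle hit time, a 5% miss rate, and a 50-cycle miss penalty: 1 + 0.05 * 50 = 3.5 cycles on average.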
Cold Miss
Accessing a location for the first time (can be reduced with larger blocks, which exploit spatial locality, at the cost of a larger miss penalty per block fill)
Miss Penalty
Additional time required due to a miss (e.g., 25-100 cycles for main memory)
Cache Size
C = S x E x B = 2^s x E x 2^b, where S = # of sets, E = lines per set, B = bytes per cache block
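Since the # of sets is 2^s and the block size is 2^b bytes, the total size can be computed directly; a small C sketch (the helper name is made up for illustration):

```c
/* Total cache size in bytes: S * E * B = 2^s * E * 2^b, where
 * s = set index bits, e_lines = lines per set (E), b = block offset bits. */
unsigned long cache_size_bytes(unsigned s, unsigned e_lines, unsigned b) {
    return (1UL << s) * e_lines * (1UL << b);
}
```

For example, s = 7 (128 sets), E = 8 lines per set, and b = 6 (64-byte blocks) gives 128 * 8 * 64 = 65536 bytes = 64 KiB.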
FIFO replacement
The first block brought into the cache is the first to be evicted.
Miss Rate
Fraction of mem references not found in cache
evicting blocks with write-back cache
If the set is full and the block chosen for eviction is dirty, write it back to the lower level before installing the new block.
mostly inclusive cache
Install in both levels on a miss, but back invalidation is not required when the lower level evicts.
Capacity Miss
More blocks are in the active working set than fit in the cache.
exclusive cache
Blocks are placed in the lower level only when they are evicted from the higher level.
Prefetch
Predicting which block will be needed next and fetching it early to reduce misses (improper use can worsen performance)