CS 354 Exam 2
How many blocks map to each set for a 32-bit AS, 32-byte blocks, and a cache with 1024 sets?
2^(32-5) / 2^10 = 2^27 / 2^10 = 2^17 blocks/set. This is 2^t, where the t tag bits of the address identify which block is in the set.
given pointer to the first block in heap, how is next block found
(void *)ptr + current block size (void * arithmetic is a GCC extension; portably, cast to char * instead)
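A minimal traversal sketch, assuming a header whose size field packs status bits into its low-order bits; blockHeader, STATUS_MASK, and next_block are illustrative names, not the course's exact definitions:

#include <stddef.h>

/* Hypothetical block header: the size field carries status flags in its
 * low-order bits (names and mask are illustrative). */
typedef struct {
    size_t size_status;   /* block size in bytes + status bits */
} blockHeader;

#define STATUS_MASK 0x7   /* sizes assumed to be multiples of 8 */

/* Advance from the current block's header to the next block's header.
 * Arithmetic is done on char * (1-byte units); adding a byte count to a
 * void * is only a GCC extension. */
blockHeader *next_block(blockHeader *cur) {
    size_t size = cur->size_status & ~STATUS_MASK;   /* strip status bits */
    return (blockHeader *)((char *)cur + size);
}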
2^10 bytes
1 kilobyte (KB)
2^20 bytes
1 megabyte (MB)
The caching system uses locality to predict what the CPU will need in the near future: 1. temporal 2. spatial
1. anticipates data will be reused so it copies values into cache 2. anticipates nearby data will be used so it copies a block
1. Eb 2. Imm 3. s
1. base register (starting address) 2. offset value 3. scale factor; can be 1, 2, 4, or 8
1. word offset 2. byte offset
1. identifies which word in block 2. identifies which byte in word
Rethinking Addressing: -An address identifies which byte in VAS to access -An address is divided into parts to access memory in steps Step 1: ? Step 2: ? Step 3: ?
1. identify which block in VAS 2. identify which word in block (3 bits) 3. identify which byte in word (2 bits)
The block number bits of an address are divided into 2 parts 1. set 2. tag
1. maps block to specific set in cache 2. uniquely identifies block in the set
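A small sketch of splitting a 32-bit address into its parts, assuming the geometry from the example above (32-byte blocks, so 5 offset bits; 1024 sets, so 10 set-index bits; 17 tag bits remain); the constants and the sample address are illustrative only:

#include <stdint.h>
#include <stdio.h>

#define B_BITS 5    /* block offset bits: 32 bytes/block = 2^5  */
#define S_BITS 10   /* set index bits:    1024 sets      = 2^10 */

int main(void) {
    uint32_t addr = 0x12345678;

    uint32_t byte_offset = addr & 0x3;                /* which byte in word (2 bits)      */
    uint32_t word_offset = (addr >> 2) & 0x7;         /* which word in 32B block (3 bits) */
    uint32_t set_index   = (addr >> B_BITS) & ((1u << S_BITS) - 1);
    uint32_t tag         = addr >> (B_BITS + S_BITS); /* remaining 17 bits                */

    printf("tag=0x%x set=%u word=%u byte=%u\n",
           (unsigned)tag, (unsigned)set_index,
           (unsigned)word_offset, (unsigned)byte_offset);
    return 0;
}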
Goals of Allocator design: 1.maximize throughput 2.maximize memory utilization
1. number of mallocs and frees handled per unit time 2. memory requested / heap allocated
Free List Ordering 1. address order 2. last-in order
1. order free list from low to high address (+) malloc with FF has better memory utilization (-) free is slower, O(N) where N is the # of free blocks 2. place most recently freed block at front of EFL (+) malloc with FF is fast for programs that request the same sizes (+) free is O(1), just link at head of EFL (+) O(1) coalescing with footers
Memory Units 1. word 2. block 3. page
1. unit used by the CPU (transfers between CPU and L1) 2. unit used by the cache (transfers between cache levels and MM) 3. unit used by MM (transfers between MM and secondary storage)
%ax
2 bytes (the lower 16 bits of %eax)
How many 32-byte blocks in a 32-bit address space?
2^32 / 2^5 = 2^27 = 2^7 * 2^20 = 128M blocks
1. word 2. double word
1. 4 bytes 2. 8 bytes
Cache Block: How big is a block? Let B be number of bytes per block, IA-32 is 32B/block
B = 2^b; for IA-32, 32 = 2^5, so b = 5. b is the number of address bits used to determine which byte in the block.
C = (S,E,B,m)
C (size of cache in bytes) = S x E x B; S = number of sets; E = number of lines per set; B = bytes per block; m = number of bits in an address (bits required to access all memory locations)
Explicit Free List Layout
Header = block size + status bits (p/a); pred = address of the previous free block on the list; succ = address of the next free block on the list; possibly more free words; Footer = block size only
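A sketch of this layout as a C struct, with illustrative field names (the footer sits at the end of the block, so it cannot appear at a fixed struct offset):

#include <stddef.h>

/* Hypothetical view of a free block on an explicit free list.
 * The header stores size + status bits; pred/succ overlay the (unused)
 * payload area of the free block; the footer repeats the size so the
 * previous block can be found when coalescing. */
typedef struct freeBlock {
    size_t size_status;          /* header: block size + p/a status bits   */
    struct freeBlock *pred;      /* address of previous block on free list */
    struct freeBlock *succ;      /* address of next block on free list     */
    /* ... possibly more free words of payload ...                         */
    /* footer (at the end of the block): block size only                   */
} freeBlock;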
placement policies
L1 - unrestricted L2 - restricted (block % 16)
Cache Block: How many bytes in an address space? The bits of an address are used to determine if block containing that addr is in cache. Let M be # of bytes in AS, IA-32 is 4GB
M = 2^m, where m is the number of bits in an address; 4GB = 2^32 bytes
void *realloc(void *ptr, size_t size)
Reallocates to size bytes a previously allocated block of heap memory pointed to by ptr, or returns NULL if reallocation fails.
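A common usage pattern (a sketch, not from the course notes): keep the old pointer until realloc succeeds, since a NULL return leaves the original block still allocated.

#include <stdlib.h>

int main(void) {
    size_t n = 100;
    int *arr = malloc(n * sizeof *arr);
    if (arr == NULL) return 1;

    /* Grow the array. Keep the old pointer until realloc succeeds;
     * assigning a NULL return directly to arr would leak the block. */
    int *tmp = realloc(arr, 2 * n * sizeof *arr);
    if (tmp == NULL) {
        free(arr);              /* original block is still valid */
        return 1;
    }
    arr = tmp;

    free(arr);
    return 0;
}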
Cache size =
S (number of sets) x E (lines per set) x B (bytes per block)
void *sbrk(intptr_t incr)
SAFER (than brk): attempts to change the program's top of heap by incr (+/-) bytes. Returns the old break if successful, else (void *) -1 and sets errno
Free List Segregation
Use an array of free lists segregated by size; malloc chooses the appropriate free list based on the required size. Simple: one free list per size. Fitted: one free list for each size range.
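A sketch of the "fitted" variant, assuming power-of-two size-class ranges; the seg_lists array, the class boundaries, and the function names are illustrative only:

#include <stddef.h>

#define NUM_CLASSES 8

/* One free-list head per size class (layout of the list nodes omitted). */
static void *seg_lists[NUM_CLASSES];

/* Map a requested size to a size-class index using power-of-two ranges:
 * {1-32}, {33-64}, {65-128}, ..., {everything larger}. */
static int size_class(size_t size) {
    int idx = 0;
    size_t limit = 32;
    while (idx < NUM_CLASSES - 1 && size > limit) {
        limit <<= 1;   /* next range is twice as large */
        idx++;
    }
    return idx;
}

/* malloc would start its search on the free list for this class. */
void **list_for(size_t size) {
    return &seg_lists[size_class(size)];
}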
fully associative cache
a cache having one set with E lines, where a memory block can be stored in any line; good for small caches
miss penalty
additional time to process a miss
which of the following contribute to external fragmentation? A. block padding B. block headers C. adjacent free blocks D. adjacent allocated blocks E. block payloads F. non-adjacent free blocks
C and F: adjacent free blocks, non-adjacent free blocks
void *calloc(size_t nItems, size_t size)
allocates, clears to 0, and returns a block of heap memory of nItems * size bytes, or returns NULL if allocation fails
cache hit
block is found in cache
cache miss
block not found in cache
victim block
cache block chosen to be replaced
direct mapped cache
a cache having S sets with one line per set, where each memory block maps to exactly one set; good for big caches
an allocator _____ reorder allocation requests to improve heap memory utilization
cannot
called coalescing
coalesce only if the user calls a coalesce function
delayed coalescing
coalesce only when needed by an alloc operation
immediate coalescing
coalesce with next and previous on free operation
unistd.h
collection of system call wrappers
Explicit Free List
data structure with a list of only the free blocks; the pred/succ pointers hold payload addresses of other free blocks (-) space (+) time
%eax
e = extended; 4 bytes
External Fragmentation
enough total free heap memory, but it is divided into free blocks that are individually too small
register
fastest memory, directly accessed by the ALU; can store 1, 2, or 4 bytes
Memory Hierarchy: CPU L0 -> registers L1 - L3 -> cache L4 -> main memory L5 -> Local Secondary Storage (SS) L6 -> Network Storage
gives illusion of having lots of fast memory
false fragmentation
enough contiguous free heap memory exists, but it is divided into adjacent free blocks that each look too small (fixed by coalescing)
set associative cache
a cache having S sets with E lines per set; a memory block maps to exactly 1 set and can be stored in any line within that set
Internal Fragmentation
heap memory within an allocated block that is used for overhead (header, padding) rather than payload
Cache Performance: More Lines
hit rate: better, more temporal locality; hit time: worse, slower line matching; miss penalty: worse, harder/slower to detect a miss. Therefore faster caches have fewer lines per set.
Cache Performance: Larger Blocks
hit rate: better, more spatial locality per block; hit time: same; miss penalty: worse, more time to transfer a larger block. Therefore block sizes are small, 32 bytes or 64 bytes.
Cache Performance: More Sets
hit rate: better, more temporal locality; hit time: worse, more sets slow set selection; miss penalty: same. Therefore faster caches have fewer sets.
Associativity of a set
how many lines per set (E)
%eip
instruction pointer; holds the address of the next instruction
DIY Heap via Posix Calls
int brk(void *addr); void *sbrk(intptr_t incr); errno
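A minimal sketch of growing the heap directly with sbrk and reporting errors via errno; the 4096-byte increment is arbitrary, and _DEFAULT_SOURCE is only there to expose the sbrk declaration on glibc:

#define _DEFAULT_SOURCE   /* expose sbrk on glibc */
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>

int main(void) {
    /* Ask the OS to grow the heap by 4096 bytes.  sbrk returns the old
     * break (start of the new region) on success, (void *)-1 on failure. */
    void *old_brk = sbrk(4096);
    if (old_brk == (void *) -1) {
        printf("sbrk failed: %s\n", strerror(errno));
        return 1;
    }
    printf("new heap region starts at %p\n", old_brk);

    sbrk(-4096);   /* a negative increment shrinks the heap back down */
    return 0;
}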
cache line
location in the cache that can store one block of memory; composed of storage for the block of data plus the info needed for cache operation
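A sketch of the bookkeeping a line carries, assuming 32-byte blocks as in the IA-32 examples above; the struct and field names are illustrative (real hardware keeps this in SRAM, not a C struct):

#include <stdint.h>

#define BLOCK_SIZE 32   /* bytes per block, as in the IA-32 examples */

/* One cache line: storage for a block plus the metadata the cache
 * needs to operate. */
typedef struct {
    uint8_t  valid;               /* v-bit: 1 if the line holds a block   */
    uint32_t tag;                 /* identifies which block is stored     */
    uint8_t  status_bits;         /* e.g. bits used by replacement policy */
    uint8_t  data[BLOCK_SIZE];    /* the cached block itself              */
} cacheLine;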
latency
memory access time (delay)
Stride Misses
miss rate = min(1, (wordsize * k) / B) * 100%, where k is the stride length in words and B is the block size in bytes
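A worked sketch of the formula, assuming a cold cache where only these spatial misses occur; the function and the sample values are illustrative:

#include <stdio.h>

/* Miss rate (%) for a strided scan: min(1, wordsize * k / B) * 100,
 * where k is the stride in words and B is the block size in bytes. */
double stride_miss_rate(int wordsize, int k, int B) {
    double ratio = (double)(wordsize * k) / B;
    if (ratio > 1.0) ratio = 1.0;
    return ratio * 100.0;
}

int main(void) {
    /* 4-byte words, 32-byte blocks: stride 1 -> 12.5%, stride 8 -> 100% */
    printf("%.1f%%\n", stride_miss_rate(4, 1, 32));
    printf("%.1f%%\n", stride_miss_rate(4, 8, 32));
    return 0;
}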
Least Frequently Used Replacement
must track how often a line is used; each line has a counter: zeroed when the line gets a new block, incremented when the line is accessed; if there is a tie, choose randomly for replacement
hit rate
number of hits / number of memory accesses
Write Hits
occur when writing to a block that is in this cache
Write Misses
occur when writing to a block that is not in this cache
Memory operand specifier: Imm
operand value: M[EffAddr] effective address: Imm Addressing mode: Absolute
Memory operand specifier: Imm(%Eb)
operand value: M[EffAddr] effective address: Imm + R[%Eb] addressing mode: base + offset
Memory operand specifier: Imm(%Eb,%Ei)
operand value: M[EffAddr] effective address: Imm + R[%Eb] + R[%Ei] addressing mode: indexed + offset
Memory operand specifier: Imm(%Eb,%Ei,s)
operand value: M[EffAddr] effective address: Imm + R[%Eb] + R[%Ei]*s addressing mode: scaled index
Memory operand specifier: (%Ea)
operand value: M[EffAddr] effective address: R[%Ea] Addressing mode: Indirect
Memory operand specifier: (%Eb,%Ei)
operand value: M[EffAddr] effective address: R[%Eb] + R[%Ei] addressing mode: indexed (base + index)
Memory operand specifier: (%Eb,%Ei,s)
operand value: M[EffAddr] effective address: R[%Eb] + R[%Ei]*s addressing mode: scaled index (no offset)
Write Allocate
read the block into this cache first, then write to it (-) must wait for the read from the lower level
word offset
the b-2 bits that remain after taking the 2 least significant bits as the byte offset; identifies which word in the block
cold miss
there is room in the cache, but the block has not been brought in yet (first access to the block)
errno
set by OS functions to communicate an error; #include <errno.h> (and <string.h> for strerror); printf("Error: %s\n", strerror(errno));
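A self-contained sketch of the pattern; opening a path assumed not to exist is just an easy way to force a call to fail and set errno:

#include <stdio.h>
#include <string.h>
#include <errno.h>

int main(void) {
    /* fopen fails on a nonexistent path and sets errno. */
    FILE *f = fopen("/no/such/file", "r");
    if (f == NULL) {
        printf("Error: %s\n", strerror(errno));  /* e.g. "No such file or directory" */
        return 1;
    }
    fclose(f);
    return 0;
}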
working set
set of blocks used during some interval of time
int brk(void *addr)
sets the top of heap to the specified address addr. Returns 0 if successful, else -1 and sets errno
temporal locality impacts:
size
cache
smaller, faster memory that acts as a staging area for data stored in a larger, slower memory
Memory Mountain
smaller size, smaller stride => faster throughput (MB/s)
immediate operand
specifies an operand value that's a constant Specifier = $Imm Operand Value = Imm
Register operand
specifies an operand value that's in a register Specifier = %Ea Operand Value = R[%Ea]
Memory operand
specifies an operand value that's in memory, located at the effective address; Operand Value = M[EffAddr]
destination (D)
specify location for destination (write)
source (S)
specify location of source (read)
Best-Fit
start from the beginning; scan to the END_MARK and choose the free block closest in size to the required size, or stop early on an exact match; fail if no free block is big enough
First-Fit
start from the beginning of the heap; stop at the first free block that is big enough; fail if the END_MARK is reached (sketched in code below)
Next-Fit
start from the block most recently allocated; stop at the first free block that is big enough; fail if we reach the first block we checked (wrapping around)
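A sketch of first-fit over an implicit free list with an END_MARK sentinel; the blockHeader layout, A_BIT encoding, and END_MARK value are assumptions for illustration, not the course's exact definitions:

#include <stddef.h>

typedef struct {
    size_t size_status;    /* block size + status bits (a-bit in bit 0) */
} blockHeader;

#define A_BIT        0x1   /* 1 = allocated, 0 = free (assumed encoding) */
#define STATUS_MASK  0x7
#define END_MARK     1     /* size_status value marking the end of the heap */

/* First-fit: scan from the start of the heap and return the first free
 * block whose size is large enough, or NULL if we hit the END_MARK. */
blockHeader *first_fit(blockHeader *heap_start, size_t need) {
    blockHeader *cur = heap_start;
    while (cur->size_status != END_MARK) {
        size_t size = cur->size_status & ~STATUS_MASK;
        if (!(cur->size_status & A_BIT) && size >= need)
            return cur;                                 /* first big-enough free block */
        cur = (blockHeader *)((char *)cur + size);      /* step to the next block */
    }
    return NULL;                                        /* reached END_MARK: fail */
}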
stride
step size, measured in words (4 bytes); good spatial locality when the stride is about 1 word
spatial locality impacts:
stride
hit time
time to determine cache hit
Least Recently Used Replacement
track when each line was last used; use an LRU queue: when a line is used, move it to the front; use status bits to track recency
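A sketch of one way to pick the LRU victim, approximating the queue with per-line "last used" timestamps kept in status bits; E = 4 and the field names are assumptions:

#include <stdint.h>

#define E 4   /* lines per set (assumed 4-way set associative) */

/* Per-line bookkeeping: a timestamp is recorded on every access,
 * and the line with the oldest timestamp is evicted. */
typedef struct {
    int      valid;
    uint64_t last_used;   /* status bits tracking recency */
} lineMeta;

int choose_victim_lru(const lineMeta set[E]) {
    int victim = 0;
    for (int i = 0; i < E; i++) {
        if (!set[i].valid) return i;                  /* empty line: fill it first    */
        if (set[i].last_used < set[victim].last_used)
            victim = i;                               /* least recently used so far   */
    }
    return victim;
}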
conflict miss
two or more blocks in use map to the same set/line and evict each other even though the cache has room
cache block
unit of memory transferred between main memory and cache level ex. 32 bytes/block in IA-32
%ah, %al
upper and lower 8-bit halves of the 16-bit %ax register, respectively.
Implicit Free List
use the heap blocks themselves (headers) to track size and status (+) space (-) time
how do you know if a line in the cache is used or not?
use a status bit, the v-bit: if v = 1, a memory block has been copied into the cache line
cpu cycles
used to measure time
temporal locality
when a recently accessed memory location is repeatedly accessed in the near future
spatial locality
when a recently accessed memory location is followed by nearby memory locations being accessed in the near future
capacity miss
when cache is too small for working set
set
the location in the cache to which a block is uniquely mapped
No Write Allocate
write directly to next lower level bypassing this cache
Write Back
write to this cache only; write to the next lower level only when a changed (dirty) block is evicted
Write Through
write to this cache and to the next lower level