Memory
address subdivision - the index needs to be ___ bits because 2^10 = 1024, the number of locations
10
considering cache misses is important - column-major versus row-major traversal of a __x__ matrix of words
16x16
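A minimal C sketch of the idea in the card above (illustrative, not from the original): the matrix is stored row-major, so the row-major loop touches consecutive words and benefits from spatial locality, while the column-major loop strides a full row between accesses and misses far more often on large matrices.

```c
#define N 16
int m[N][N];                  /* C stores 2-D arrays row-major */

long sum_row_major(void) {    /* consecutive addresses: good spatial locality */
    long s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += m[i][j];
    return s;
}

long sum_col_major(void) {    /* stride of N words between accesses: more misses */
    long s = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += m[i][j];
    return s;
}
```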
- in MIPS we have been using 32-bit addresses, where the high ___ bits select ___ and the lower __ bits select bytes _____ words
30 words 2 within
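A hedged C sketch of that split (function names are illustrative): the upper 30 bits of a 32-bit MIPS byte address form the word address, and the lower 2 bits select the byte within the word.

```c
#include <stdint.h>

uint32_t word_address(uint32_t addr) { return addr >> 2;  }  /* upper 30 bits */
uint32_t byte_in_word(uint32_t addr) { return addr & 0x3; }  /* lower 2 bits  */
```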
___ ___ data rate DRAM - transfer on rising and falling clock edges
DDR double
Caching gives us the illusion that we have ____, ____ memory
large fast
cache write policy: write through - a ____-_____ policy says we update the ____ and also ____
write-through cache memory
- what if there is no data in a location? - valid bit = __ if present, __ if empty
1 0
cache characteristics: - direct mapped, write-back, write allocate - block size: ___ words (16 bytes) - cache size: ___KB (_____ blocks) - ___-bit byte addresses - valid bit and dirty bit per ___ - blocking cache - CPU waits until access is ___
4 16 1024 32 block complete
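A minimal C sketch, assuming the parameters in the card above (4-word / 16-byte blocks, 1024 blocks, 32-bit byte addresses): 4 offset bits, 10 index bits, and the remaining 18 bits as the tag. The names are illustrative.

```c
#include <stdint.h>

#define OFFSET_BITS 4    /* 16-byte block: 2 byte-in-word + 2 word-in-block bits */
#define INDEX_BITS 10    /* 1024 blocks */

uint32_t block_offset(uint32_t addr) { return addr & ((1u << OFFSET_BITS) - 1); }
uint32_t cache_index(uint32_t addr)  { return (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1); }
uint32_t cache_tag(uint32_t addr)    { return addr >> (OFFSET_BITS + INDEX_BITS); }
```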
address subdivision each location holds one word, ___ bytes, so we don't need to index the lower 2 bits of the address (byte ______) the remaining upper part of the address will be the ___
4 offset tag
Memory Hierarchy from top to bottom
CPU register cache memory main memory (DRAM) storage device
____ - ___ ____ memory module - used in PCs, laptops, servers - plugs into the ________
DIMM dual inline motherboard
This type of memory has to be periodically refreshed in order to retain its contents.
DRAM
list the 4 major types of memory
DRAM SRAM flash memory magnetic disk
_____-____ ___ ____ ____ - bit stored as a charge in a capacitor - a ____ transistor can be used to access the charge - must be periodically refreshed or contents are lost - refreshing is done by reading/writing - denser and cheaper than _____
DRAM-dynamic random access memory single SRAM
many modern processors also have ___ cache
L3
___ ___ data rate DRAM - separate DDR input and outputs
QDR quad
This type of memory is used for cache.
SRAM
_____-____ ___ ___ ____ - volatile memory - keeps its contents as long as it has power, but minimal power needed - used for ____ - ~ 6-8 transistors per bit
SRAM-static random access memory cache
The maximum size of physical memory that a CPU can access is determined by the size of its ____ ___ - a 16-bit address bus can access up to 2^16 = 64K memory locations - a 32-bit address bus can access up to 2^32 = 4G locations - a 64-bit address bus can access up to 2^64 = 16E locations
address bus
in the memory hierarchy, data is only copied between ___ levels; it cannot ___ levels
adjacent skip
Block placement is determined by ____
associativity
cache write policy: write ___ - on data-write hit, just update the block in cache and keep track if it is inconsistent with a "____" bit - when a ____ block is replaced in cache, write it back to memory
back dirty dirty
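A minimal sketch of the write-back idea (the structure and names are assumptions for illustration): a write hit only updates the cached copy and sets the dirty bit; memory is updated only when a dirty block is evicted.

```c
#include <stdbool.h>
#include <stdint.h>

struct line { bool valid, dirty; uint32_t tag; uint8_t data[16]; };

/* write hit: update the cached copy and mark it inconsistent with memory */
void write_hit(struct line *l, int offset, uint8_t byte) {
    l->data[offset] = byte;
    l->dirty = true;
}

/* replacement: write the block back only if it was modified */
void evict(struct line *l, void (*write_back_to_memory)(const struct line *)) {
    if (l->valid && l->dirty)
        write_back_to_memory(l);
    l->valid = l->dirty = false;
}
```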
- DRAM is organized into ____ - each bank is a series of ____ - refreshing is done by reading/writing a row at a time - the __ ____ opens or closes a bank - Act provides the row address; multiple columns can then be read from the buffered row
banks rows pre signal
how do we know which particular block is stored in a location? - store the ____ ____ as well as the data - actually, we only need the ____-order bits, called the ___
block address high tag
DRAM ____ mode: supply successive words from a row with reduced ____
burst latency
Which entry to replace on a miss? _____: - LRU (___ ___ ____): complex and costly for ___ associativity - _____: easier to implement
cache least recently used high random
- memory closest to the ALU - very fast but expensive
cache memory
_____ across hierarchy At each level in the hierarchy: - block _____ - finding a ____ - replacement on a ___ - ____ policy
caching placement block miss write
_____ misses - due to ____ cache size - occur when a _____ block is later needed
capacity finite replaced
Cache _____: _____ cores (processors) on a single chip typically share the _____ ____ space, so cores may end up with different values for a given ____
coherence multiple physical address location
Sources of misses - the 3 Cs:
compulsory capacity conflict/collision
____ misses (cold start misses) occur on the ___ access to a block
compulsory first
____ misses (aka collision misses) - would NOT occur in a ____ ______ cache - due to _____ for the entries in a set
conflict fully associative competition
Cache size: - the size of the cache is a function of the ___ ____ plus the extra bits for the ___ field and the valid __ field - in MIPS, words are aligned on multiples of __ bytes, so the least significant __ bits of the address can be ignored
data storage tag bit 4 2
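A worked example built from the cache parameters given earlier (treat it as a sketch): with 4-word (128-bit) blocks, a 10-bit index, and 32-bit addresses, the tag is 32 - 10 - 4 = 18 bits, so each block stores 128 data bits plus an 18-bit tag, a valid bit, and a dirty bit, and the 1024-block cache needs about 1024 x 148 bits ≈ 18.5 KB of SRAM to hold 16 KB of data.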
- larger blocks will ____ miss rate due to ____ locality - but, larger blocks mean we have ___ of them and thus more competition for those blocks; this might ____ miss rate - larger blocks mean a larger miss penalty due to the increased ___ to copy a larger block
decrease spatial fewer increase time
Increased associativity ____ ____ rate but with diminishing returns
decreases miss
- location determined by address - ___ ____: only one choice - [block address] modulo [number of blocks in cache] - number of blocks is a power of __ - use the ___-order address bits
direct mapped 2 low
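A small worked example (the numbers are illustrative, not from the card): in a direct-mapped cache with 8 blocks, block address 13 maps to entry 13 mod 8 = 5; because 8 is a power of 2, that is just the low-order 3 bits of the block address (0b1101 -> 0b101).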
This type of cache organization provides the fastest access.
direct-mapped
three types of cache associativity:
direct-mapped fully associative n-way set associative
the faster the memory, the more ___
expensive
If a block of memory is not found in cache, the block can be copied directly into cache from wherever it is found in the memory hierarchy, including RAM or magnetic disk.
false
On page ____, the page must be fetched from ____ - takes ____ of clock cycles - handled by the ____ system
fault disk millions operating
If the page is not in memory (page _____) - the OS handles fetching the page and updating the page ______ - then restarts the faulting ____
fault table instruction
This type of memory can wear out after 1000s of accesses.
flash memory
- nonvolatile semiconductor storage - EEPROM (electrically erasable programmable read-only memory) - faster, requires less power, more robust than magnetic disk
flash storage
- allow a block to go to any free cache location - requires all entries to be searched
fully associative
This type of cache organization utilizes the most cache locations, keeping the cache fuller.
fully associative
Finding a block: _____ caches - the goal is to reduce ____s in order to reduce time and ____
hardware comparison cost
Memory _____: L1, L2 maybe L3 cache, DRAM (main memory), disk
hierarchy
- on a cache ___, CPU proceeds normally
hit
cache write policy: write through - on a data-write ___, we could just update the block in ___, but then cache and memory would be ______
hit cache inconsistent
cache write policy: write through - ________: use a write ____ to hold data to be written to memory; only ___ when the buffer is ___
improvement buffer stall full
miss rates tend to go up as the block size ____
increases
- when CPU performance ____, miss penalty becomes more significant - as base ____ decreases, a greater fraction of time is spent on memory stalls - as clock rates _____, memory stalls account for more ___ ___
increases CPI increase CPU cycles
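A hedged worked example (the numbers are assumed for illustration): with a base CPI of 1, one memory access per instruction, a 2% miss rate, and a 100-cycle miss penalty, the effective CPI is 1 + 0.02 x 100 = 3, so two thirds of the cycles are memory stalls; doubling the clock rate roughly doubles the miss penalty in cycles (DRAM latency in nanoseconds is unchanged), pushing the effective CPI toward 1 + 0.02 x 200 = 5.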
_____ cache miss requires a ____ of the instruction fetch - ____ cache miss requires ______ the processor until we have the data
instruction restart data stalling
To reduce the page fault rate, prefer LRU (___ ___ ___) _____ - _____ (use) bit in PTE set to __ on access to a page - periodically cleared to 0 by the OS - a page with reference bit = __ should be chosen to be ____
least recently used replacement reference 1 0 replaced
Cache: - relies on the principle of ____ to try to find the needed data in the ____ levels of the memory hierarchy - if the data is not there, it will be retrieved from memory ____ in the hierarchy - hit rates in modern computers are often >___%
locality highest lower 95
Principle of ____: programs use a ____ part of their ____ space and reuse this space frequently
locality small memory
- nonvolatile memory - high capacity (10s of terabytes) - slow access
magnetic disk
In the memory hierarchy, SRAM is the fastest and this type of memory is the slowest.
magnetic disk
DRAM - for ___ ___ (RAM) SRAM - for ___ flash memory - for BIOS and solid-state ___ magnetic disk - for large ___ drives
main memory cache drives storage
Misses depend on ____ ____ patterns - _____ behavior - compiler _____ for memory access
memory access algorithm optimization
Disk writes take ____ of cycles - blocks at a time, not individual locations, are copied - write-through is impractical, so write-back is used, setting a ____ bit in the PTE when the page is written
millions dirty
on a cache ___ - ___ the CPU pipeline - fetch ____ from the next level of the hierarchy
miss stall block
TLB _____: if the page is in memory - load the ____ from memory and retry - could be handled in hardware or ______
misses PTE software
Memory system design is critical for ______
multiprocessors
locating data - cache with 2^k blocks, each block has 2^___ bytes - k bits of the address select one of the 2^k cache blocks - the lowest n bits are a block ____ that decides which of the 2^n bytes in the cache block will store the data - the upper part of the address is the ___
n offset tag
- each set contains n entries - block number determines the set - search all entries in the set
n-way set associative
Replacement policy Direct mapped: __ ____ Set associative: - prefer a ___-____ entry - otherwise, choose among entries in the set Least recently used: LRU, choose the one that has been ____ for the longest time Random: gives about the ___ performance as LRU for high associativity
no choice non-valid unused same
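A minimal C sketch of LRU for a 2-way set (an assumption for illustration, not the original's design): with two ways, a single bit per set tracking the least recently used way implements LRU exactly; it is at higher associativity that LRU becomes complex and costly.

```c
#include <stdbool.h>

struct set2 { bool lru_way; };    /* which of the two ways was used least recently */

int  victim(const struct set2 *s)   { return s->lru_way; }   /* way to replace on a miss */
void touch(struct set2 *s, int way) { s->lru_way = !way; }    /* the other way is now LRU */
```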
If the page is ___ _____, the PTE can refer to a location in ____ space on ____
not present swap disk
- accessing memory would take ___ access to get the physical address and another access to get the data - to speed things up, a special cache - the ______ - keeps track of recently used address _____ to avoid accessing the page table, which is in _____
one TLB mappings RAM
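A minimal C sketch of a TLB lookup (the direct-mapped organization and size are assumptions for illustration): on a hit the physical page number comes straight from the TLB; on a miss the page table in RAM must be consulted.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 64   /* size chosen only for illustration */

struct tlb_entry { bool valid; uint32_t vpn, ppn; };
static struct tlb_entry tlb[TLB_ENTRIES];

/* returns true on a TLB hit and fills *ppn; on a miss, walk the page table */
bool tlb_lookup(uint32_t vpn, uint32_t *ppn) {
    struct tlb_entry *e = &tlb[vpn % TLB_ENTRIES];
    if (e->valid && e->vpn == vpn) { *ppn = e->ppn; return true; }
    return false;
}
```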
+ ___-of-___ CPUs can execute other instructions during a cache ___ - pending ___ stays in the load/store unit - dependent instructions wait in ____ stations + the effect of misses is hard to analyze, so system simulations are usually used
out-of-order miss store reservation
In virtual memory: - a "block" is called a "____" - a "miss" is called a "____ _____"
page page fault
Try to minimize the ____ ____ rate - fully _____ placement - smart ______ algorithms
page fault associative replacement
Stores placement information in an array of ___ ____ entries, indexed by virtual ___ _____
page table page numbers
If page is ____ in memory, the PTE (___ ___ ____) stores the physical page ____ plus other status bits (referenced, dirty, ...)
present page table entry number
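A hedged C sketch of what a page table entry might hold, matching the status bits named above; real layouts are machine-specific and these fields are illustrative.

```c
#include <stdint.h>

struct pte {
    uint32_t ppn        : 20;  /* physical page number, meaningful only if valid         */
    uint32_t valid      : 1;   /* page is present in physical memory                     */
    uint32_t dirty      : 1;   /* page has been written since it was brought in          */
    uint32_t referenced : 1;   /* set on access; periodically cleared to approximate LRU */
};
```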
Multilevel caches + _____ cache (L1) attached to CPU - small, but fast; goal: minimal hit time + L2 (level 2) cache services misses from the __ cache - ____, slower, but ____ than main memory - goal: low miss rate to avoid main memory access + Main memory services ___ cache misses + Some ____-___ systems include L3 cache
primary L1 larger faster L2 high-end
programs access a small portion of their address space at any time
principle of locality
Higher associativity ____ miss rate but increases ___, ___, and ___ time
reduces complexity cost access
Fast memories are ____, large memories are ___
small slow
- items near those recently accessed are likely to be accessed soon - ex: instructions in sequence, array data
spatial locality
Array elements are an example of: __________________
spatial locality
Instructions within a loop are an example of:
temporal locality
items accessed recently are likely to be accessed again soon - ex: instructions in a loop, induction variables
temporal locality
What does TLB stand for?
translation lookaside buffer
if data is present in the ___ level, it is a ___; if not, it is a ___
upper hit miss
Which entry to replace on a miss? ____ memory: - ____ approximation with ____ support
virtual LRU hardware
Finding a block: _____ memory - table lookup makes full ______ feasible and has the benefit of reducing ____ rate
virtual associativity miss
_____ memory uses main memory as a "_____" for secondary disk storage and is managed jointly by ____ hardware and the _____ system
virtual cache CPU operating
_____ memory - level of the memory hierarchy that manages _____ between the ____ memory (RAM) and _____ memory (disk)
virtual caching main secondary
The CPU and OS translate ____ addresses to ____ addresses
virtual physical
- A way to make it appear that the system has more memory than the actual physical memory.
virtual memory
______ ______: - allows a single program to expand its address space beyond physical memory - allows many programs to share physical memory space in a ____ manner
virtual memory protected
___ ___: - update upper level only - update lower level when block is replaced - need to keep more state information
write back
This type of cache write policy has the disadvantage of being slower than other policies.
write through
___ ____: - update both upper and lower levels - simplifies replacement, but may require a write buffer
write through
Write policy for Virtual memory: only ____-____ is feasible, given disk latency
write-back