OS
How does OS view physical memory?
An array of slots, tracking whether each one is free or in use. When a new process is created, the OS must find room for its address space and then mark that space as used.
Performance with an increasing number of CPUs?
As the number of CPUs grows, access to a synchronized shared data structure becomes quite slow.
Pros of dynamic relocation
Base & bounds registers. Efficient: it's quick to add the base and check that the address is within bounds. Flexible: it's easy to relocate processes to different parts of physical memory. Protection: no process can access memory outside its address space.
User mode
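Code running on the CPU is limited in what it can do (e.g., a process can't issue I/O requests or execute privileged instructions directly).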
Paging
Cutting space up into fixed-size pieces. Instead of splitting a process's address space into some number of variable-sized logical segments (e.g., code, heap, stack), we divide it into fixed-size units, each of which we call a page.
Pros of paging
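Flexibility: no assumptions are needed about how the process uses its address space (e.g., which way the heap and stack grow). Simplicity of free-space management: to place pages, the OS just grabs the first free frames it finds (e.g., off a free list).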
Where to store page table for each process?
In memory
Simple solution to reducing page table size
Increase the size of each page: fewer pages means fewer PTEs, but bigger pages cause a lot of internal fragmentation.
Calculating num bits for VA
log2(size of the virtual address space in bytes)
TLB valid bit
whether the entry has a valid translation or not
Dirty bit
whether the page has been modified since it was brought into memory.
Calculating page table size
(2^(number of VA bits) / 2^(number of offset bits)) * size of each page table entry, i.e., number of virtual pages * PTE size
Code of the program
The code (the instructions) has to live in memory somewhere, and thus it is in the address space.
Pros & cons of segmentation
Pros: - Supports sparse address spaces, avoiding the huge potential waste of memory between logical segments of the address space. - Code sharing. ---- Cons: - Allocating variable-sized memory spaces is hard: free memory gets chopped into odd-sized pieces, making it difficult to satisfy memory allocation requests (external fragmentation).
How to move a process' address space?
1) deschedule the process 2) copy the address space from the current location to the new location 3) update the saved base register (in the process structure) to point to the new location
Calculating total number of translations
2^(number of VPN bits), i.e., one translation per virtual page
Calculating PA using segmentation
Offset = the VA's offset into its segment (VA minus the starting virtual address of that segment). Bounds check: if offset < segment size, physical address = base of segment + offset, and that physical address is referenced; else the hardware traps to the OS, which terminates the process (segmentation fault).
How to reduce TLB flushing overhead?
Address space identifier (ASID): similar to a process ID, but with fewer bits. Identical VAs from different processes are told apart by their ASIDs, so the TLB can hold translations for different processes at the same time without confusion. On a context switch, the OS must set a privileged register to the ASID of the current process.
Explicit approach for getting memory
Use the top two bits of the VA to select the segment (e.g., 00 = code, 01 = heap), and thus which base/bounds register pair to use. The remaining bits are the offset into that segment. Add the base register to the offset to get the final physical address.
General rules
Split the VA into VPN & offset. Translate VPN --> PPN. Append the offset to the end of the PPN.
Placement of things in address space
Code is static (and thus easy to place in memory), so place it at the top of the address space; it won't need more space as the program runs. Heap goes underneath it (in the middle), stack at the bottom.
Hardware translation in UM
Does translation, checks if address is valid using bounds register & CPU circuitry
Stack memory
Allocations and deallocations are managed implicitly by the compiler for you.
Bus snooping
Each cache pays attention to memory updates by observing the bus that connects the caches to main memory. - When a cache sees an update for a data item it holds, it can either invalidate its copy or update it.
Multi queue scheduling
Each queue follows a particular scheduling discipline (e.g., round robin). When a job enters the system, it is placed on exactly one scheduling queue, according to some heuristic (e.g., random, or picking the queue with fewer jobs than the others). Queues are scheduled independently, thus avoiding the problems of information sharing and synchronization found in the single-queue approach.
Flush TLB on context switches
Empties the TLB before running the next process. With a software-managed TLB: done with an explicit (privileged) instruction in kernel mode. With a hardware-managed TLB: done when the page-table base register is changed (the OS must change the PTBR on a context switch anyway) --- Either way, flushing sets all valid bits to 0.
Random eviction policy
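Evict a TLB mapping at random -- simple -- avoids corner-case behaviors (e.g., LRU misses on every access when a program loops over n + 1 pages with an n-entry TLB).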
Calculating PA using segmentation (heap)
Find which bytes in this segment the VA references: offset = VA - starting virtual address of the segment. Then offset + base (physical) address = final physical address.
Work for paging
For every memory reference (whether an instruction fetch or an explicit load or store), paging requires us to perform one extra memory reference in order to first fetch the translation from the page table.
TLB Algorithm (generally)
Get the VPN from the VA and check whether the TLB holds a translation for that VPN. TLB hit: take the PPN from the TLB, concatenate it with the VA's offset to form the PA, and access memory. TLB miss: 1) go to the page table to grab the translation, 2) update the TLB with it, 3) the hardware retries the instruction, finds the translation in the TLB, and proceeds as above. Misses are expensive because extra memory references are needed to access the page table.
How to form the physical address
Get the physical address of the PTE, fetch the PTE from memory, extract the PFN, and concatenate it with the offset from the virtual address to form the desired physical address --- offset = VirtualAddress & OFFSET_MASK; PhysAddr = (PFN << SHIFT) | offset
Isolation
Given by protection: each process should run in its own isolated cocoon, safe from the ravages of other faulty or even malicious processes.
Stack translation
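The stack grows backwards (toward lower addresses), so its addresses must be translated differently. Besides base/bounds registers, the hardware needs extra support (e.g., a bit per segment) to know whether a segment grows upwards or downwards.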
Internal fragmentation
Handing out more memory than a process needs: the unused space inside the allocated region is wasted.
Segmentation
Have a base & bounds register pair per logical segment of the address space, so each segment can be placed in a different part of physical memory without filling physical memory with unused virtual address space. Only used memory is allocated space in physical memory, so large address spaces with large amounts of unused address space can be accommodated.
cost of TLB flushing
High: after every process switch, the new process incurs TLB misses as it touches its pages again; this is especially costly when switching processes often.
When TLB miss happens
This is where paging is costly: the page table must be accessed for the translation, incurring the cost of an extra memory reference. With a software-managed TLB: the hardware raises an exception (pausing the current instruction stream), raises the privilege level to kernel mode, and jumps to a trap handler. The trap handler looks up the translation in the page table, uses privileged instructions to update the TLB, and returns from the trap. The hardware then retries the instruction --> TLB hit.
Implicit approach for calculating PA
If the address was generated from the PC (i.e., an instruction fetch), it's in the code segment; if it's based on the stack or base pointer, it's in the stack segment; else it's in the heap.
Sbrk
Like brk, but takes an increment (added to the current break) instead of a new break address.
Efficiency
Make virtualization time-efficient (i.e., not making programs run much more slowly) & space-efficient (not using too much memory for the structures needed to support virtualization) --- relies on hardware support such as TLBs to do this.
Do you translate offset?
No; the offset tells us which byte within the page we want, so it is carried over unchanged.
Kernel Mode
OS can access the whole machine
What does OS do when a context switch happens?
OS must save and restore the base-and-bounds pair when it switches between processes
TLB
Part of the MMU: a hardware cache of popular virtual-to-physical address translations --- On each virtual memory reference, the hardware first checks the TLB to see if the desired translation is held there; if so, the translation is performed (quickly) without having to consult the page table (which has all translations).
Memory Management Unit
Part of processor that helps with address translation
modifying base & bounds registers
Done with privileged instructions, which allow the OS (and only the OS, in kernel mode) to change the registers when different programs run.
Virtualizing memory
A running program thinks it is loaded into memory at a particular address (say 0) and has a potentially very large address space (say 32 or 64 bits); in reality, neither is true.
What does OS do when it stops running a process?
Save the values of the base and bounds registers to memory, in the process control block (PCB) --- While a process is stopped, the OS can move its address space from one location in memory to another rather easily.
Exceptions in address translation
Should happen when a user program tries to access memory illegally (with an address that is "out of bounds"): the CPU should stop executing the user program and arrange for the OS "out-of-bounds" exception handler to run. -- If a user program tries to change the values of the (privileged) base and bounds registers, the CPU should raise an exception and run the "tried to execute a privileged operation while in user mode" handler.
Inefficiencies of dynamic relocation
Space between stack & heap is wasted, because the entire address space of each process must be placed in physical memory -- and there's no guarantee that physical memory can hold the full address space at all.
Segmented page table
Still have base & bounds/limit registers, one pair per segment. The base register holds the physical address of that segment's page table; the bounds register indicates the end of the page table (i.e., how many valid pages it has). Unallocated pages between the stack and the heap no longer take up space in a page table (just to mark them as not valid).
Translating stack address to PA
After lopping off the top two bits (the segment number), subtract the maximum segment size from the remaining offset to get a negative offset. Add the negative offset to the base register to get the physical address. Do bounds checking by making sure the absolute value of the negative offset is <= the segment's current size.
TLBs & Context switches
TLB contains virtual-to-physical translations that are only valid for the currently running process - not meaningful for other processes --- Solve this by flushing the TLB
Transparency
The OS should implement virtual memory in a way that is invisible to the running program. The program shouldn't be aware that memory is virtualized; it should behave as if it has its own private physical memory.
Problem with page tables
Too big: they consume too much memory (see the flashcard on calculating page table size, then multiply that PER PROCESS) --> gigantic!
Dynamic relocation
Translating virtual addresses to physical addresses at runtime
Goals of VM
Transparency, efficiency, protection
Page tables
Used to map virtual addresses to physical addresses. The OS indexes the array by the virtual page number (VPN) and looks up the page-table entry (PTE) at that index in order to find the desired physical frame number (PFN).
Finding location of desired PTE
VPN = (VirtualAddress & VPN_MASK) >> SHIFT; PTEAddr = PageTableBaseRegister + (VPN * sizeof(PTE)) ---------- i.e., use the VPN as an index into the array of PTEs pointed to by the page table base register; the result is the physical address of the PTE.
Calculating VPN
VPN = VA >> (number of offset bits), i.e., (VA - offset) / page size
Bits in TLB
VPN | PPN | other bits, where the other bits include: valid bit, protection bits, address-space identifier, dirty bit
Contents of each page table entry
Valid bit, protection bits, present bit, dirty bit, reference (accessed) bit, read/write bit, user/supervisor bit (determines if user-mode processes can access the page), page frame number/physical page number
mmap()
A way to get memory from the OS: create an anonymous memory region within your program (a region not associated with any particular file but rather with swap space), which can then be treated like a heap. --- By calling mmap() on an already opened file descriptor, a process is returned a pointer to the beginning of a region of virtual memory where the contents of the file seem to be located. Using that pointer, the process can access any part of the file with a simple pointer dereference.
Stack
What a running program uses to keep track of where it is in the function call chain, to allocate local variables, and to pass parameters and return values to and from routines.
TLB valid bit vs page table valid bit
Page table valid bit: when invalid, the page has not been allocated by the process and should not be accessed by a correctly-working program; if it is accessed, the OS typically responds by killing the process. --- TLB valid bit: whether a TLB entry holds a valid translation. By setting all TLB entries to invalid, the system can ensure that the about-to-be-run process does not accidentally use a virtual-to-physical translation from a previous process --> helpful for context switches.
calloc()
Allocates memory and also zeroes it before returning; this prevents some errors where you assume that memory is zeroed and forget to initialize it yourself.
Swapping
allows the OS to free up physical memory by moving rarely-used pages to disk.
Segment
A contiguous portion of the address space of a particular length -- 3 segments: code, heap & stack. Need a set of 3 base/bounds register pairs to support this model.
TLB protection bit
determine how a page can be accessed (as in the page table). For example, code pages might be marked read and execute, whereas heap pages might be marked read and write
Heap
Dynamically-allocated, user-managed memory (e.g., from malloc() calls). (Statically-initialized variables also live in the address space, but not in the heap.)
Bounds/limit register
Ensures that addresses generated by the process stay within the confines of its address space --- the processor always checks that each memory reference is within bounds and therefore legal.
address translation
hardware transforms each memory access (e.g., an instruction fetch, load, or store), changing the virtual address provided by the instruction to a physical address where the desired information is actually located.
Spatial locality
if a program accesses a data item at address x, it is likely to access data items near x as well
Issues with segmented page tables
If we have a large but sparsely-used heap, for example, we can still end up with a lot of page table waste. It also causes external fragmentation to arise again: while most of memory is managed in page-sized units, page tables can now be of arbitrary size (in multiples of PTEs), so finding free space for them in memory is more complicated.
valid bit
Indicates whether the particular translation is valid. A program starts with code & heap at one end of its address space and the stack at the other; the unused space in between is marked invalid. If the program tries to access invalid memory, it generates a trap to the OS, which will likely kill the process.
Present bit
Indicates whether this page is in physical memory or on disk (i.e., it has been swapped out).
Protection bits
indicating whether the page could be read from, written to, or executed from. Accessing a page not allowed by those bits generates a trap to the OS
Calculating num bits for offset
log2(page size)
realloc()
Makes a new, larger region of memory, copies the old region into it, and returns a pointer to the new region (the allocator may also grow the block in place). Useful when you've allocated space for something (say, an array) and then need to add something to it.
Advantage of MQMS vs SQMS
More scalable, and provides cache affinity --- Problem: load imbalance (e.g., one CPU's queue can be empty while another CPU has 3 jobs that it runs round robin, leaving the first CPU idle).
TLB hit rate
number of hits divided by the total number of accesses
Accessing memory from paging code
// Extract the VPN from the virtual address
VPN = (VirtualAddress & VPN_MASK) >> SHIFT
// Form the address of the page-table entry (PTE)
PTEAddr = PTBR + (VPN * sizeof(PTE))
// Fetch the PTE
PTE = AccessMemory(PTEAddr)
// Check if process can access the page
if (PTE.Valid == False)
    RaiseException(SEGMENTATION_FAULT)
else if (CanAccess(PTE.ProtectBits) == False)
    RaiseException(PROTECTION_FAULT)
else
    // Access is OK: form physical address and fetch it
    offset = VirtualAddress & OFFSET_MASK
    PhysAddr = (PTE.PFN << PFN_SHIFT) | offset
    Register = AccessMemory(PhysAddr)
How stack & heap grow
Opposite directions. Heap: grows downwards (toward higher addresses, with address 0 drawn at the top). Stack: grows upwards (toward lower addresses).
Translation equation
physical address = virtual address + base
Page frames
Physical memory viewed as an array of fixed-size slots; each frame can hold a single virtual memory page.
Protection
Protect processes from one another, as well as the OS itself from processes. --- When one process performs a load, a store, or an instruction fetch, it should not be able to access or affect in any way the memory contents of anything outside its address space.
Single queue multi scheduling
Put all jobs that need to be scheduled into a single queue. Pro: it doesn't take much work to take an existing policy that picks the best job to run next and adapt it to work on more than one CPU. Cons: it does not scale well (due to synchronization overheads), and it does not readily preserve cache affinity.
What does OS do when process = terminated?
Reclaims all of its memory for use by other processes or the OS: puts its memory back on the free list and cleans up any associated data structures.
Page table
A per-process structure that records where each virtual page of the address space is placed in physical memory --- stores address translations for each of the virtual pages of the address space, letting us know where in physical memory each page resides.
Address space
The running program's view of memory in the system: the memory that the process can address, which is part of the process.
Caches
small, fast memories that (in general) hold copies of popular data that is found in the main memory of the system
Program's perspective of address space
Starts at address 0 and extends up to some maximum size (e.g., 16 KB).
Why is valid bit crucial?
It supports sparse address spaces: by marking all the unused pages in the address space invalid, the OS avoids allocating physical frames for those pages, saving a great deal of memory.
LRU
Takes advantage of locality in the memory-reference stream, assuming that an entry that has not been used recently is a good candidate for eviction.
External fragmentation
The free space gets chopped into little pieces of different sizes and is thus fragmented; subsequent requests may fail because there is no single contiguous space that can satisfy the request, even though the total amount of free space exceeds the size of the request.
Reference/accessed bit
track whether a page has been accessed, and is useful in determining which pages are popular and thus should be kept in memory
brk sys call
Used to change the location of the program's break: the location of the end of the heap -- takes the address of the new break and either increases or decreases the size of the heap, depending on whether the new break is larger or smaller than the current break. NEVER call it directly (use malloc() & free() instead)!
Base register
Used to transform virtual addresses (generated by the program) into physical addresses.
Temporal locality
When a piece of data is accessed, it is likely to be accessed again in the near future.
Cache affinity
A process, when run on a particular CPU, builds up a fair bit of state in the caches (and TLBs) of that CPU - so the scheduler should try to keep a process on the same CPU if at all possible.