Operating Systems: Multiple Processor Systems
Master-Slave multiprocessors
1 copy of OS runs on the master CPU; all system calls are redirected there. with too many CPUs, the master becomes a bottleneck
3 parts of gang scheduling
1. groups of related threads are scheduled as a unit, a gang 2. all members of a gang run simultaneously, on different timeshared CPUs 3. all gang members start and end their time slices together
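The three rules above can be sketched as a scheduling table. This is a minimal sketch, not a real scheduler; the gang and CPU names are made up:

```python
# Sketch of gang scheduling: each time slot runs exactly one gang,
# with all of that gang's threads on different CPUs simultaneously.

def gang_schedule(gangs, num_cpus):
    """Assign one gang per time slot; table[slot][cpu] = thread or None."""
    table = []
    for gang in gangs:
        assert len(gang) <= num_cpus, "a gang must fit on the CPUs"
        slot = [gang[i] if i < len(gang) else None for i in range(num_cpus)]
        table.append(slot)
    return table

# Hypothetical gangs of related threads.
gangs = [["A0", "A1", "A2"], ["B0", "B1"], ["C0", "C1", "C2"]]
table = gang_schedule(gangs, num_cpus=3)
# In slot 0 all of gang A runs at once; B and C wait for later slots,
# so gang members start and end their time slices together.
```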
Three System Types
1. shared-memory multiprocessor 2. message-passing multicomputer 3. wide-area distributed system
characteristics of NUMA machine
1. single address space visible to all cpus 2. access to remote memory is via LOAD and STORE instructions 3. access to remote memory is slower than access to local memory
UMA w. single bus
2+ CPUs and 1+ memory modules all use same bus. limited by bandwidth of bus
UMA w. caching and private memory
Each CPU has a cache and local private memory. The compiler places all program text, strings, constants, and other read-only data, plus stacks and local variables, in the private memories; the shared memory is used only for writable shared variables
node peripherals b.w 3 MP, multicomputer, & distributed system
MP - all shared MC - shared, except maybe disk DS - full set per node
node configuration b.w 3 MP, multicomputer, & distributed system
MP - cpu MC - cpu, ram, net interface DS - complete computer
administration b.w 3 MP, multicomputer, & distributed system
MP - one organization MC - one organization DS - many organizations
file systems b.w 3 MP, multicomputer, & distributed system
MP - one shared MC - one, shared DS - each node has its own
operating systems b.w 3 MP, multicomputer, & distributed system
MP - one, shared MC - multiple, same DS - possibly all different
location b.w 3 MP, multicomputer, & distributed system
MP - same rack MC - same room DS - possibly worldwide
internode communication b.w 3 MP, multicomputer, & distributed system
MP - shared ram MC - dedicated interconnect DS - traditional network
remote procedure call
allows programs to call procedures located on other CPUs
shared memory multiprocessor (MP)
computer system in which 2+ CPUs share full access to a common RAM
processor allocation algorithms
decision about which process should go on which node
distributed shared memory (DSM)
each machine has its own virtual memory and page tables; a CPU traps to the OS when it tries to use a page it doesn't have, and the OS locates the page and fetches it
Symmetric MP
eliminates asymmetry - 1 copy of OS in memory, but any CPU can run it, so the master is no longer a bottleneck. associates a mutex (lock) w. the OS: any CPU can run the OS, but only 1 at a time (big kernel lock). can allow more parallelism by splitting the OS into independent critical regions, each with its own mutex
nonblocking call
if send is nonblocking, it returns control to the caller immediately, before the message is sent adv: sender can compute in parallel with the transmission disadv: sender cannot safely modify the message buffer, since it doesn't know when the transmission is done
UMA w. caching
less traffic on bus, each block marked as read only or read-write
4 parts of multistage switching network message
module - which memory module to use; address - specifies an address w.in the module; opcode - gives the operation (READ or WRITE); value - optional, may contain an operand
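A minimal sketch of the four fields and the routing they enable. The field names come from the card; the bit-per-stage routing rule is how an omega-style multistage network uses the module number (the stage count here is an assumption):

```python
from collections import namedtuple

# The four message fields named above.
Message = namedtuple("Message", ["module", "address", "opcode", "value"])

def route(module, stages):
    """Return the output port (0 = upper, 1 = lower) chosen at each
    switching stage: each stage consumes one bit of the module
    number, high-order bit first."""
    bits = format(module, "0{}b".format(stages))
    return [int(b) for b in bits]

msg = Message(module=5, address=0x1C, opcode="READ", value=None)
ports = route(msg.module, stages=3)   # 5 = 0b101 -> lower, upper, lower
```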
headless workstation
a node of a multicomputer: a CPU, memory, a network interface, and sometimes a hard disk, but no keyboard, monitor, or mouse (hence "headless")
NUMA
non-uniform memory access - not every memory word can be read as fast as every other memory word
unique features of multiprocessor OS
process synchronization, resource management, scheduling
nonblocking network
property of a crossbar switch: no CPU is ever denied the connection it needs because some crosspoint or line is already occupied. the number of crosspoints grows as n^2
test and set lock
reads out a memory word into a register while simultaneously writing a 1 (or other nonzero value) into the memory word. the CPU must first lock the bus so the read and the write are indivisible
client stub
represents the server procedure in the client's address space
space sharing
scheduling multiple threads at the same time across multiple CPUs. Useful if threads of a process communicate a lot
send and receive
send - sends the message pointed to by mptr to a process identified by dest and blocks the caller until the message is sent receive - the message is copied to the buffer pointed to by mptr and the caller is unblocked
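The blocking semantics can be sketched with two threads and a bounded queue. `queue.Queue` stands in for the kernel's message buffer; the `mptr`/`dest` parameters are only loosely mirrored here:

```python
import queue
import threading

channel = queue.Queue(maxsize=1)   # stands in for dest's mailbox

def send(ch, msg):
    ch.put(msg)        # blocks the caller if the mailbox is full

def receive(ch):
    return ch.get()    # blocks the caller until a message arrives

received = []
t = threading.Thread(target=lambda: received.append(receive(channel)))
t.start()              # receiver blocks in receive() until send runs
send(channel, b"hello")
t.join()               # receiver has been unblocked and stored the message
```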
server stub
procedure bound with the server; it unpacks the incoming request message and calls the actual server procedure
circuit switching
switching regime: the first switch establishes a path through all the switches to the destination switch; then bits are pumped all the way from the source to the destination
store and forward packet switching
switching scheme: the packet is injected into the first switch by the source node's network interface board; each switch stores the complete packet before forwarding it to the next switch, until it reaches the destination
architecture diff b.w symmetric and asymmetric
symmetric: all processors have the same architecture asymmetric: processors may have the same or different architectures
communication diff b.w symmetric and asymmetric
symmetric: all processors communicate via a shared memory asymmetric: processors do not communicate with each other; they are controlled by the master
easy diff b.w symmetric and asymmetric
symmetric: complex - processors need to be synchronized to maintain load balance asymmetric: simple - only the master accesses the OS data structures
Basic difference b.w symmetric and asymmetric
symmetric: each processor runs tasks in the OS asymmetric: only master processor runs the tasks of the OS
failure diff b.w symmetric and asymmetric
symmetric: if a processor fails, the system's capacity is reduced asymmetric: if the master fails, a slave is turned into the master; if a slave fails, its task is switched to other processors
Process diff b.w symmetric and asymmetric
symmetric: processor takes processes from a queue (common or private) asymmetric: master processor assigns processes to slave processors
smart scheduling
a thread acquiring a spin lock sets a process-wide flag to show that it currently holds a spin lock, so the scheduler avoids preempting it until the lock is released
multicomputers
tightly coupled CPUs that do not share memory; also called cluster computers
UMA
uniform memory access - every memory word can be read as fast as every other memory word
false sharing
in DSM, when a nonlocal memory word is referenced, the whole page containing the word is fetched and moved to the referencing machine. if two machines each use an unrelated variable that happens to lie on the same page, the page constantly travels b.w the two machines even though no data is actually shared
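A toy simulation of the ping-ponging (page size, addresses, and the one-owner-per-page rule are simplifying assumptions): two DSM machines each update only their own variable, but both variables lie on the same page, so ownership of the page bounces on every access:

```python
PAGE_SIZE = 4096

class DSM:
    """Toy DSM: one owner per page; a non-owner's access moves the page."""
    def __init__(self):
        self.owner = {}      # page number -> machine id
        self.transfers = 0

    def access(self, machine, addr):
        page = addr // PAGE_SIZE
        if self.owner.get(page) != machine:
            if page in self.owner:
                self.transfers += 1   # page travels between machines
            self.owner[page] = machine

dsm = DSM()
A, B = 0, 8          # two unrelated variables, same page
for _ in range(100):
    dsm.access(machine=1, addr=A)   # machine 1 only touches A
    dsm.access(machine=2, addr=B)   # machine 2 only touches B
# Every access after the first moves the page: 199 transfers
# for 200 accesses, even though A and B are never truly shared.
```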
two level scheduling algorithm
when a thread is created, it is assigned to a CPU; this assignment of threads to CPUs is the top level of the algorithm. then each CPU schedules from its own collection of threads, which is the bottom level
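A sketch of the two levels. The round-robin rules at both levels are assumptions for illustration; the card does not fix a placement or dispatch policy:

```python
from collections import deque

def assign(threads, num_cpus):
    """Top level: pin each new thread to a CPU (round-robin here)."""
    runqueues = [deque() for _ in range(num_cpus)]
    for i, t in enumerate(threads):
        runqueues[i % num_cpus].append(t)
    return runqueues

def schedule_next(runqueue):
    """Bottom level: each CPU picks from its own collection only."""
    t = runqueue.popleft()
    runqueue.append(t)   # simple round-robin within the CPU too
    return t

rqs = assign(["T0", "T1", "T2", "T3", "T4"], num_cpus=2)
# CPU 0 owns T0, T2, T4; CPU 1 owns T1, T3 - no cross-CPU stealing.
first = schedule_next(rqs[0])
```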
thread affinity
where a thread has a preferred processor for scheduling on a multiprocessor machine; increases the probability of its being scheduled on the same CPU, so its cached data can be reused
packet
a chunk of a message of some maximum length; messages are broken up into packets for transmission
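Fragmentation and reassembly can be sketched in a few lines (the maximum length here is chosen arbitrarily):

```python
def packetize(message, max_len):
    """Break a message into packets of at most max_len bytes each."""
    return [message[i:i + max_len] for i in range(0, len(message), max_len)]

def reassemble(packets):
    """Concatenate packets back into the original message."""
    return b"".join(packets)

msg = b"store-and-forward packet switching"
packets = packetize(msg, max_len=8)
assert all(len(p) <= 8 for p in packets)
assert reassemble(packets) == msg
```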