241 Full Review
a data area shared by hardware devices or program processes that operate at different speeds or with different sets of priorities. The buffer allows each device or process to operate without being held up by the other. ... This term is used both in programming and in hardware
buffering
an inode stores all the information about a file except its name and its actual data - so the size shown by "ls -l" comes from the file's inode, not from the directory
"ls -l" shows the size of each file in a directory. Is the size stored in the directory or in the file's inode?
until the memory is explicitly released with free() or the process exits - heap allocations are not tied to the scope of any function
Heap memory lifetime
For every possible address (all 4 billion of them) we will store the 'real', i.e. physical, address. Each physical address will need 4 bytes (to hold the 32 bits). This scheme would require 16 billion bytes to store all of the entries. Oops - our lookup scheme would consume all of the memory that we could possibly buy for our 4GB machine. We need to do better than this. Our lookup table had better be smaller than the memory we have, otherwise we will have no space left for our actual programs and operating system data. The solution is to chunk memory into small regions called 'pages' and 'frames' and use a lookup table that maps pages to frames.
Address Translation
&
Address-of operator
our multi-threaded ring buffer attempt, which blocks a thread until there is space (to enqueue) or there is at least one item to remove (to dequeue).
Analyzing multi-threaded code
An operation (or set of operations) is atomic or uninterruptible if it appears to the rest of the system to occur instantaneously. Without locks, only simple CPU instructions ("read this byte from memory") are atomic (indivisible). On a single CPU system, one could temporarily disable interrupts (so a sequence of operations cannot be interrupted) but in practice atomicity is achieved by using synchronization primitives, typically a mutex lock.
Atomic operations
Hold and wait
Be able to identify when Dining Philosophers code causes a deadlock (or not). For example, if you saw the following code snippet which Coffman condition is not satisfied? // Get both locks or none. pthread_mutex_lock( a ); if( pthread_mutex_trylock( b ) ) { /*failed*/ pthread_mutex_unlock( a ); ... }
best-fit at first glance appears to be an excellent strategy; however, unless we can find a perfectly sized hole, this placement creates many tiny unusable holes, leading to high fragmentation. It also requires a scan of all possible holes.
Best Fit
to be able to coalesce a free block with a previous free block we will also need to find the previous block, so we store the block's size at the end of the block, too. These are called "boundary tags"
Boundary Tag
A segregated allocator is one that divides the heap into different areas that are handled by different sub-allocators dependent on the size of the allocation request. Sizes are grouped into classes (e.g. powers of two) and each size is handled by a different sub-allocator and each size maintains its own free list. A well known allocator of this type is the buddy allocator. We'll discuss the binary buddy allocator which splits allocation into blocks of size 2^n (n = 1, 2, 3, ...) times some base unit number of bytes, but others also exist (e.g. Fibonacci split - can you see why it's named?). The basic concept is simple: If there are no free blocks of size 2^n, go to the next level and steal that block and split it into two. If two neighboring blocks of the same size become unallocated, they can be coalesced back together into a single large block of twice the size. Buddy allocators are fast because the neighboring blocks to coalesce with can be calculated from the free'd block's address, rather than traversing the size tags. Ultimate performance often requires a small amount of assembler code to use a specialized CPU instruction to find the lowest non-zero bit.
Buddy Allocator
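A minimal sketch of the "compute the buddy from the address" idea mentioned above, assuming block offsets are measured from the start of the managed region and that block sizes are powers of two; the function name is illustrative.

#include <stdio.h>
#include <stddef.h>

/* In a binary buddy scheme the buddy of a block is found by flipping the bit
 * corresponding to the block size: buddy_offset = offset XOR block_size. */
static size_t buddy_of(size_t offset, size_t block_size) {
    return offset ^ block_size;
}

int main(void) {
    /* A 1KB block at offset 2048 has its buddy at offset 3072, and vice versa. */
    printf("buddy of 2048 (1KB block) = %zu\n", buddy_of(2048, 1024)); /* 3072 */
    printf("buddy of 3072 (1KB block) = %zu\n", buddy_of(3072, 1024)); /* 2048 */
    /* When both become free they can be coalesced into one 2KB block at 2048. */
    return 0;
}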
At the beginning of this course we assumed that file streams are always line buffered, i.e. the C library will flush its buffer every time you send a newline character. Actually this is only true for terminal streams - for other file streams the C library attempts to improve performance by only flushing when its internal buffer is full or the file is closed. Pipe writes are atomic up to the size of the pipe's buffer (PIPE_BUF), meaning that if two processes try to write to the same pipe, the kernel has internal mutexes associated with the pipe that it will lock, do the write, and return. The only gotcha is when the pipe is about to become full. If two processes are trying to write and the pipe can only satisfy a partial write, that pipe write is not atomic -- be careful about that!
Buffer Size/Atomicity
points to the first character in the array
C Strings as pointers
an array of characters
C Strings representation
fprintf prints to a given file stream; printf prints to standard output (printf(...) is equivalent to fprintf(stdout, ...))
C io fprintf and printf
a code segment that accesses shared variables must execute as an atomic action; if multiple threads run it at the same time the program may no longer be correct, so it must be protected by a lock
CSP (critical section problems)
Remember our page table maps pages to frames, but each page is a block of contiguous addresses. How do we calculate which particular byte to use inside a particular frame? The solution is to re-use the lowest bits of the virtual memory address directly. For example, suppose our process is reading the following address- VirtualAddress = 11110000111100001111000010101010 (binary) On a machine with page size 256 Bytes, then the lowest 8 bits (10101010) will be used as the offset. The remaining upper bits will be the page number (111100001111000011110000).
Calculating offsets for multi-level page table
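A minimal sketch of extracting the page number and offset with bit operations, assuming the 256-byte pages (8 offset bits) used in the example above; the variable names are illustrative.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t vaddr  = 0xF0F0F0AA;      /* 11110000 11110000 11110000 10101010 */
    uint32_t offset = vaddr & 0xFF;    /* low 8 bits are the offset within the frame */
    uint32_t page   = vaddr >> 8;      /* remaining upper bits are the page number  */
    printf("page = 0x%x, offset = 0x%x\n", (unsigned)page, (unsigned)offset);
    /* prints: page = 0xf0f0f0, offset = 0xaa */
    return 0;
}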
malloc (also calloc and realloc); heap memory is released with free
Calls to heap allocation
No - each process runs in its own virtual address space, so it cannot read or modify another process's memory through normal means; the MMU only translates addresses that belong to the process itself.
Can one process alter another process's memory through normal means? Why?
To create a thread use the function pthread_create. This function takes four arguments: int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine) (void *), void *arg); The first is a pointer to a variable that will hold the id of the newly created thread. The second is a pointer to attributes that we can use to tweak and tune some of the advanced features of pthreads. The third is a pointer to the function that we want to run. The fourth is a pointer that will be passed to that function as its argument. The argument void *(*start_routine) (void *) is difficult to read! It means a pointer to a function that takes a void * pointer and returns a void * pointer. It looks like a function declaration except that the name of the function is wrapped with (* .... ). To capture what the thread later returns, keep the pthread_t and pass the address of a void * to pthread_join (see the sketch below).
Capturing return values from a thread
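A minimal sketch of creating a thread and capturing its return value with pthread_join; the worker function and its heap-allocated result are illustrative.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* The worker returns a heap-allocated result through the void * channel. */
void *worker(void *arg) {
    int *answer = malloc(sizeof(int));
    *answer = 2 * *(int *)arg;
    return answer;                       /* same effect as pthread_exit(answer) */
}

int main(void) {
    pthread_t tid;
    int input = 21;
    void *result;
    pthread_create(&tid, NULL, worker, &input);
    pthread_join(tid, &result);          /* copies the returned pointer into result */
    printf("worker returned %d\n", *(int *)result);
    free(result);
    return 0;
}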
If the current block and the next block (if it exists) are both free, we need to coalesce these blocks into a single block. Similarly, we also need to check the previous block: if that exists and represents unallocated memory, then we need to coalesce the blocks into a single large block.
Coalescing
while(1) fork();
Code me up a fork bomb in C (please don't run it).
There are four necessary and sufficient conditions for deadlock. These are known as the Coffman conditions. Mutual Exclusion Circular Wait Hold and Wait No pre-emption
Coffman Conditions
pid_t child_id = fork();
if (child_id == -1) { perror("fork"); exit(EXIT_FAILURE); }
if (child_id > 0) {
    // We have a child! Get their exit code
    int status;
    waitpid(child_id, &status, 0);
    // code not shown to get exit status from child
} else {
    // In child ...
    execl(...); // start calculation (arguments elided)
    exit(123);
}
Correct use of fork, exec and waitpid
A critical section is a section of code that can only be executed by one thread at a time if the program is to function correctly. If two threads (or processes) were to execute code inside the critical section at the same time, the program may no longer behave correctly.
Critical Section
a system called "DNS" (Domain Name Service) is used. If a machine does not hold the answer locally then it sends a UDP packet to a local DNS server. This server in turn may query other upstream DNS servers. DNS by itself is fast but not secure. DNS requests are not encrypted and susceptible to 'man-in-the-middle' attacks. For example, a coffee shop internet connection could easily subvert your DNS requests and send back different IP addresses for a particular domain
DNS
sometimes called a physical record, is a sequence of bytes or bits, usually containing some whole number of records, having a maximum length - the block size. Data thus structured are said to be blocked.
Data Block
*
Dereferencing pointers
The Dining Philosophers problem is a classic synchronization problem. Imagine I invite N (let's say 5) philosophers to a meal. We will sit them at a table with 5 chopsticks (one between each pair of philosophers). A philosopher alternates between wanting to eat and wanting to think. To eat, the philosopher must pick up the two chopsticks on either side of their position (the original problem required each philosopher to have two forks). However, these chopsticks are shared with their neighbors.
Dining Philosophers
pid_t parent = getppid(); kill(parent, SIGSTOP);
Don't you hate it when your parents tell you that you can't do something? Write me a program that sends SIGSTOP to your parent.
In a multi-threaded program, there are multiple stacks but only one address space
Each thread has a stack
The various free-space holes that are generated in either your memory or disk space. Externally fragmented blocks are available for allocation, but may be too small to be of any use.
External Fragmentation
Left-Right Deadlock: each philosopher picks up their left chopstick and then waits for their right. If every philosopher grabs their left at the same time, no right chopstick can ever be acquired and they all deadlock.
Failed DP Solutions
summarizes basic information about data, which can make finding and working with particular instances of data easier. For example, author, date created and date modified and file size are examples of very basic document metadata
File Metadata
has the advantage that it will not evaluate all possible placements and therefore be faster.
First Fit
A frame (or sometimes called a 'page frame') is a block of physical memory or RAM (Random Access Memory). This kind of memory is occasionally called 'primary storage' (and contrasted with slower, secondary storage such as spinning disks that have higher access times). A frame is the same number of bytes as a virtual page. If a 32-bit machine has 2^32 bytes (4GB) of RAM, then physical memory is as large as the machine's addressable space (with 4KB pages that is 2^20 frames). It's unlikely that a 64-bit machine will ever have 2^64 bytes of RAM. A page is a block of virtual memory. A typical block size on the Linux operating system is 4KB (i.e. 2^12 addresses), though you can find examples of larger blocks. So rather than talking about individual bytes we can talk about blocks of 4KB; each block is called a page. We can also number our pages ("Page 0", "Page 1", etc.)
Frames/Pages
The function getaddrinfo can convert a human readable domain name (e.g. www.illinois.edu) into an IPv4 and IPv6 address. In fact it will return a linked-list of addrinfo structs: struct addrinfo { int ai_flags; int ai_family; int ai_socktype; int ai_protocol; socklen_t ai_addrlen; struct sockaddr *ai_addr; char *ai_canonname; struct addrinfo *ai_next; };
Get address info
Two students need a pen and paper: The students share a pen and paper. Deadlock is avoided because Mutual Exclusion was not required. The students both agree to grab the pen before grabbing the paper. Deadlock is avoided because there cannot be a circular wait. The students grab both the pen and paper in one operation ("Get both or get none"). Deadlock is avoided because there is no Hold and Wait The students are friends and will ask each other to give up a held resource. Deadlock is avoided because pre-emption is allowed.
Give a real life example of breaking each Coffman condition in turn. A situation to consider: pen and paper etc. How would you assure that work would get done?
#include <unistd.h> extern char **environ; int execl(const char *path, const char *arg, ... /* (char *) NULL */); int execlp(const char *file, const char *arg, ... /* (char *) NULL */); int execle(const char *path, const char *arg, ... /*, (char *) NULL, char * const envp[] */); int execv(const char *path, char *const argv[]); int execvp(const char *file, char *const argv[]); int execvpe(const char *file, char *const argv[], char *const envp[]);
How do you pass in command line arguments to execl*? How about execv*? What should be the first command line argument by convention?
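A minimal sketch contrasting the two argument-passing styles, assuming /bin/ls exists; by convention argv[0] is the program's own name, and the execl* argument list must end with a NULL sentinel.

#include <unistd.h>
#include <stdio.h>

int main(void) {
    /* execv*: arguments packed into a NULL-terminated array; argv[0] = program name */
    char *argv[] = { "ls", "-l", "/tmp", NULL };
    execv("/bin/ls", argv);

    /* If execv failed, try execl*: each argument is a separate parameter,
     * searching PATH for "ls" and again ending with a NULL sentinel. */
    execlp("ls", "ls", "-l", "/tmp", (char *)NULL);

    /* exec only returns on failure */
    perror("exec");
    return 1;
}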
int writers;  // Number of writer threads that want to enter the critical section (some or all of these may be blocked)
int writing;  // Number of threads that are actually writing inside the C.S. (can only be zero or one)
int reading;  // Number of threads that are actually reading inside the C.S.
// if writing != 0 then reading must be zero (and vice versa)

reader() {
    lock(&m)
    while (writers)
        cond_wait(&turn, &m)
    // No need to wait while(writing) here because we can only exit the above loop
    // when writing is zero
    reading++
    unlock(&m)

    // perform reading here

    lock(&m)
    reading--
    cond_broadcast(&turn)
    unlock(&m)
}

writer() {
    lock(&m)
    writers++
    while (reading || writing)
        cond_wait(&turn, &m)
    writing++
    unlock(&m)

    // perform writing here

    lock(&m)
    writing--
    writers--
    cond_broadcast(&turn)
    unlock(&m)
}
Give me an implementation of a reader-writer lock with condition variables, make a struct with whatever you need, it just needs to be able to support the following functions void reader_lock(rw_lock_t* lck); void writer_lock(rw_lock_t* lck); void reader_unlock(rw_lock_t* lck); void writer_unlock(rw_lock_t* lck); The only specification is that in between reader_lock and reader_unlock, no writers can write. In between the writer locks, only one writer may be writing at a time.
all of its threads have finished (when the last thread terminates, the process exits); any thread calls exit() (or main returns); the process is forced to stop by an unhandled terminating signal such as SIGKILL. Another: any thread calls exec, which replaces the entire process image.
Give me three conditions under which a multithreaded process will exit. Can you think of any more?
You already know one way to send a SIGINT: just type CTRL-C. From the shell you can use kill (if you know the process id) and killall (if you know the process name). # First let's use ps and grep to find the process we want to send a signal to $ ps au | grep myprogram angrave 4409 0.0 0.0 2434892 512 s004 R+ 2:42PM 0:00.00 myprogram 1 2 3 # Send SIGINT signal to process 4409 (equivalent of `CTRL-C`) $ kill -SIGINT 4409 # Send SIGKILL (terminate the process) $ kill -SIGKILL 4409 $ kill -9 4409 killall is similar except that it matches by program name. The next two examples send a SIGINT and then a SIGKILL to terminate the processes that are running myprogram # Send SIGINT (SIGINT can be ignored) $ killall -SIGINT myprogram # SIGKILL (-9) cannot be ignored! $ killall -9 myprogram
How are signals served under UNIX? (Bonus: How about Windows?)
Install a signal handler to asynchronously handle signals use sigaction (or, for simple examples, signal ). To synchronously catch a pending signal use sigwait (which blocks until a signal is delivered) or signalfd (which also blocks and provides a file descriptor that can be read() to retrieve pending signals).
How do I asynchronously and synchronously catch a signal?
Use sigprocmask! With sigprocmask you can set the new mask, add new signals to be blocked to the process mask, and unblock currently blocked signals. You can also determine the existing mask (and use it for later) by passing in a non-null value for oldset. Blocking signals is similar in multi-threaded programs to single-threaded programs: Use pthread_sigmask instead of sigprocmask Block a signal in all threads to prevent its asynchronous delivery The easiest method to ensure a signal is blocked in all threads is to set the signal mask in the main thread before new threads are created
How do I change the signal disposition in a single threaded program? How about multithreaded?
/dev/random - '.' stays in the same directory and '..' moves up one level, so /./proc/../dev/./random/ collapses to /dev/random
How do I simplify /./proc/../dev/./random/?
Boundary tags store the block's size (and whether it is free) at both ends of the block, so the allocator can locate the neighboring blocks and decide whether to coalesce with them, and can tell whether a free block is big enough to split for a new allocation.
How do boundary tags work? How can they be used to coalesce or split?
set the pointer to NULL immediately after freeing it - free(NULL) is a no-op, so an accidental second free does nothing (see the example below)
How do we prevent double free errors?
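A minimal sketch of the "NULL after free" habit:

#include <stdlib.h>

int main(void) {
    char *p = malloc(16);
    free(p);
    p = NULL;      /* defend against double free / use after free */
    free(p);       /* safe: free(NULL) does nothing */
    return 0;
}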
The frame size is 2KB. Assuming memory is byte-addressable, the offset must be able to reach each of the 2048 bytes in a frame. 2048 = 2^11, so we need 11 bits for the frame offset.
How do you determine how many bits are used in the page offset?
Call pthread_join(tid, &result): it copies the value the thread returned (or passed to pthread_exit) into *result. A thread sets that value either by returning a void * from its start function or by calling pthread_exit(value) - the two are equivalent. If you discard the return value (pass NULL to pthread_join), the value is simply lost, and if it pointed to heap memory that nobody else frees, that memory leaks.
How do you get a return value given a pthread_t? What are the ways a thread can set that return value? What happens if you discard the return value?
They return -1: fork returns -1 to the parent on failure, and exec only ever returns on failure (also returning -1); both set errno.
How do you know if exec or fork failed?
chmod - change file mode bits chmod [OPTION]... MODE[,MODE]... FILE... chmod [OPTION]... OCTAL-MODE FILE... chmod [OPTION]... --reference=RFILE FILE... Change the mode of each FILE to MODE. With --reference, change the mode of each FILE to that of RFILE. -c, --changes like verbose but report only when a change is made -f, --silent, --quiet suppress most error messages -v, --verbose output a diagnostic for every file processed --no-preserve-root do not treat '/' specially (the default) --preserve-root fail to operate recursively on '/' --reference=RFILE use RFILE's mode instead of MODE values -R, --recursive change files and directories recursively --help display this help and exit --version output version information and exit Each MODE is of the form '[ugoa]*([-+=]([rwxXst]*|[ugo]))+|[-+=][0-7]+'.
How do you use chmod to set user/group/owner read/write/execute permissions?
Four are true - all except "There can be multiple active writers": there can be multiple active readers; when there is an active writer the number of active readers must be zero; if there is an active reader the number of active writers must be zero; and a writer must wait until the current active readers have finished.
How many of the following statements are true? There can be multiple active readers There can be multiple active writers When there is an active writer the number of active readers must be zero If there is an active reader the number of active writers must be zero A writer must wait until the current active readers have finished
None are deadlocked. R4 is not held by anyone, so P1 can acquire it and complete; when P1 releases R1 and R3, P2, P3 and P5 can proceed in turn, and P3 releasing R5 lets P4 finish. There is no cycle in the resource allocation graph, so every process eventually completes.
How many processes are blocked? As usual assume that a process is able to complete if it is able to acquire all of the resources listed below. P1 acquires R1 P2 acquires R2 P1 acquires R3 P2 waits for R3 P3 acquires R5 P1 waits for R4 P3 waits for R1 P4 waits for R5 P5 waits for R1 (Draw out the resource graph!)
As tight as possible: lock only around the statements that actually touch shared state (often just one line) and do everything else outside the lock, so other threads are blocked for as short a time as possible.
How tight can you make the critical section?
WIFEXITED(wstatus) returns true if the child terminated normally, that is, by calling exit(3) or _exit(2), or by returning from main().
How to use the WAIT exit status macros WIFEXITED etc.
"IP4", or more precisely, "IPv4" is version 4 of the Internet Protocol that describes how to send packets of information across a network from one machine to another . Roughly 95% of all packets on the Internet today are IPv4 packets. A significant limitation of IPv4 is that source and destination addresses are limited to 32 bits (IPv4 was designed at a time when the idea of 4 billion devices connected to the same network was unthinkable - or at least not worth making the packet size larger) A newer packet protocol "IPv6" solves many of the limitations of IPv4 (e.g. makes routing tables simpler and 128 bit addresses) however less than 5% of web traffic is IPv6 based.
IPv4 vs IPv6
Essentially the entire body of both functions: every access to ll->head, ll->size and the node links touches shared state, so all of add_linked_list and (after the initial new_node call) all of pop_elem must run under the same lock.
Identify the critical section here
struct linked_list;
struct node;
void add_linked_list(linked_list *ll, void* elem){
    node* packaged = new_node(elem);
    packaged->next = ll->head;
    ll->head = packaged;
    ll->size++;
}
void* pop_elem(linked_list *ll, size_t index){
    if(index >= ll->size) return NULL;
    node *i, *prev = NULL;
    for(i = ll->head; i && index; i = i->next, index--){
        prev = i;
    }
    //i points to the element we need to pop, prev to the node before it (NULL when popping the head)
    if(prev) prev->next = i->next;
    else ll->head = i->next;
    ll->size--;
    void* elem = i->elem;
    destroy_node(i);
    return elem;
}
Deadlock occurs: each thread holds one mutex and blocks waiting for the mutex held by the other, so neither can ever proceed. A third thread calling pthread_mutex_lock(m1) will also block forever, because m1 is held by the first (deadlocked) thread and will never be released.
If one thread calls pthread_mutex_lock(m1) // success pthread_mutex_lock(m2) // blocks and another threads calls pthread_mutex_lock(m2) // success pthread_mutex_lock(m1) // blocks What happens and why? What happens if a third thread calls pthread_mutex_lock(m1) ?
Suppose we wanted to perform a multi-threaded calculation that has two stages, but we don't want to advance to the second stage until the first stage is completed. We could use a synchronization method called a barrier. When a thread reaches a barrier, it will wait at the barrier until all the threads reach the barrier, and then they'll all proceed together.
Implementing a barrier
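A minimal single-use barrier sketch built from a mutex and a condition variable (POSIX also provides pthread_barrier_t); the type and function names are illustrative.

#include <pthread.h>

typedef struct {
    pthread_mutex_t m;
    pthread_cond_t cv;
    int remaining;               /* threads that have not yet arrived */
} barrier_t;

void barrier_init(barrier_t *b, int nthreads) {
    pthread_mutex_init(&b->m, NULL);
    pthread_cond_init(&b->cv, NULL);
    b->remaining = nthreads;
}

void barrier_wait(barrier_t *b) {
    pthread_mutex_lock(&b->m);
    if (--b->remaining == 0) {
        pthread_cond_broadcast(&b->cv);       /* last thread wakes everyone */
    } else {
        while (b->remaining > 0)
            pthread_cond_wait(&b->cv, &b->m); /* sleep until all have arrived */
    }
    pthread_mutex_unlock(&b->m);
}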
A simple (single-threaded) implementation is shown below. Note enqueue and dequeue do not guard against underflow or overflow - it's possible to add an item when the queue is full and possible to remove an item when the queue is empty. For example if we added 20 integers (1,2,3...) to the queue and did not dequeue any items then values 17,18,19,20 would overwrite the 1,2,3,4. We won't fix this problem right now; instead, when we create the multi-threaded version we will ensure enqueue-ing and dequeue-ing threads are blocked while the ring buffer is full or empty respectively.

void *buffer[16];
int in = 0, out = 0;

void enqueue(void *value) { /* Add one item to the front of the queue */
    buffer[in] = value;
    in++;                 /* Advance the index for next time */
    if (in == 16) in = 0; /* Wrap around! */
}

void *dequeue() { /* Remove one item from the end of the queue */
    void *result = buffer[out];
    out++;
    if (out == 16) out = 0;
    return result;
}
Implementing a ring buffer
The producer consumer problem is the multi-threaded version of the ring buffer above: producer threads enqueue items and consumer threads dequeue them; producers must block while the buffer is full, consumers must block while it is empty, and all access to the buffer and its indices must be protected with a mutex plus condition variables (or counting semaphores). A sketch follows below.
Implementing producer consumer
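A minimal sketch, assuming the 16-slot ring buffer above, using one mutex and two condition variables; counting semaphores would work equally well.

#include <pthread.h>

#define N 16
void *buffer[N];
int in = 0, out = 0, count = 0;
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

void enqueue(void *value) {                 /* producer */
    pthread_mutex_lock(&m);
    while (count == N)                      /* block while the buffer is full */
        pthread_cond_wait(&not_full, &m);
    buffer[in] = value;
    in = (in + 1) % N;
    count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&m);
}

void *dequeue(void) {                       /* consumer */
    pthread_mutex_lock(&m);
    while (count == 0)                      /* block while the buffer is empty */
        pthread_cond_wait(&not_empty, &m);
    void *result = buffer[out];
    out = (out + 1) % N;
    count--;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&m);
    return result;
}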
In ext2 a directory entry stores just a file name and the inode number it refers to; the inode stores everything else about the file - type, permissions, owner, size, timestamps and the locations of its data blocks (but not the name). (Related trivia: ext4 normally allows an inode no more than 65,000 hard links, which limits a directory to 64,998 subdirectories, because '.', '..' and the entry in the parent directory each count as a link.)
In ext2, what is stored in an inode, and what is stored in a directory entry?
In a Unix-style file system, an index node, informally referred to as an inode, is a data structure used to represent a filesystem object, which can be one of various things including a file or a directory. Each inode stores the attributes and disk block location(s) of the filesystem object's data. Filesystem object attributes may include manipulation metadata (e.g. change, access, modify time), as well as owner and permission data (e.g. group-id, user-id, permissions).
Inode
Internal fragmentation happens when the block you give the program is larger than its allocation size. Let's say that we have a free block of size 16B (not including metadata). If the program allocates 7 bytes, you may want to round up to 16B and just return the entire block. This gets very sinister when you implement coalescing and splitting (next section). If you don't implement either, then you may end up returning a block of size 64B for a 7B allocation! There is a lot of overhead for that allocation, which is what we are trying to avoid.
Internal Fragmentation
Yes - writes of up to PIPE_BUF bytes are atomic; the kernel locks the pipe internally, so two threads (or processes) writing at the same time will not interleave their data (unless the pipe is nearly full).
Is a pipe thread safe?
Yes. output is declared static, so it lives in the data segment rather than on the stack; it outlives the call, so returning a pointer to it is valid (though each call overwrites the same buffer).
Is the following code valid? If so, why? Where is output located? char *foo(int var){ static char output[20]; snprintf(output, 20, "%d", var); return output; }
Round the request up to the next power of two: 1.5KB needs a 2KB block. There is no free 2KB block, so the allocator repeatedly splits: 64KB into two 32KB, one 32KB into two 16KB, one 16KB into two 8KB, one 8KB into two 4KB, one 4KB into two 2KB, and one of the 2KB buddies is returned (the unused 0.5KB inside it is internal fragmentation). In general: if there are no free blocks of size 2^n, go to the next level, steal that block and split it into two; if two neighboring buddies of the same size become unallocated, they can be coalesced back into a single block of twice the size.
Let's say that we are using a buddy allocator with a new slab of 64KB. How does it go about allocating 1.5KB?
What if all the philosophers pick up their left at the same time, try to grab their right, put their left down, pick up their left, try to grab their right.... We have now livelocked our solution! Our poor philosophers are still starving, so let's give them some proper solutions. Livelock occurs when a process continues to execute but is unable to make progress. In practice livelock may occur because the programmer has taken steps to avoid deadlock. In the above example, in a busy system, a philosopher will continually release their first chopstick because they are never able to obtain the second. The system is not deadlocked (the processes are still executing) but it is not making any progress either.
Livelocking DP Solutions
The Memory Management Unit is part of the CPU. It converts a virtual memory address into a physical address. The MMU may also interrupt the CPU if there is currently no mapping from a particular virtual address to a physical address, or if the current CPU instruction attempts to write to a location for which the process only has read access. To overcome the overhead of page table lookups, the MMU includes an associative cache of recently used virtual-page-to-frame lookups. This cache is called the TLB ("translation lookaside buffer"). Every time a virtual address needs to be translated into a physical memory location, the TLB is queried in parallel to the page table. For most memory accesses of most programs, there is a significant chance that the TLB has cached the result. However if a program does not have good locality (for example it reads from random memory locations on many different pages) then the TLB will not have the result cached, and the MMU must use the much slower page table to determine the physical frame.
MMU/TLB
start_time is the wall-clock time when the CPU starts working on the process
end_time is the wall-clock time when the CPU finishes the process
run_time is the total amount of CPU time required
arrival_time is the time the process enters the scheduler (the CPU may not start working on it yet)
turnaround_time = end_time - arrival_time
response_time = start_time - arrival_time
wait_time = (end_time - arrival_time) - run_time
Measures of Efficiency
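For example, a process that arrives at t = 0, first gets the CPU at t = 2, needs 3 units of CPU time and finishes at t = 10 has response_time = 2 - 0 = 2, turnaround_time = 10 - 0 = 10, and wait_time = (10 - 0) - 3 = 7.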
From the shell: kill -SIGQUIT 1337 (or kill -3 1337). In C: kill(1337, SIGQUIT);
My terminal is anchored to PID = 1337 and has just become unresponsive. Write me the terminal command and the C code to send SIGQUIT to it.
From the command line: mkfifo. From C: int mkfifo(const char *pathname, mode_t mode); You give it the path name and the operation mode, and it will be ready to go! Named pipes take up no space on the disk - the name is just an entry in the filesystem; when processes open it, the operating system creates an ordinary in-kernel pipe that the name refers to, and that's it! There is no additional magic. Named pipes exist for programming convenience when processes are started without forking (meaning there would be no way to pass the file descriptor of an unnamed pipe to the other process). Unnamed pipes (the kind we've seen up to this point) live in memory (they do not take up any disk space) and are a simple and efficient form of inter-process communication (IPC) that is useful for streaming data and simple messages. Once all processes have closed their ends, the pipe's resources are freed.
Named pipe and Unnamed Pipes
"naturally aligned" means that any item is aligned to at least a multiple of its own size. For example, a 4-byte object is aligned to an address that's a multiple of 4, an 8-byte object to an address that's a multiple of 8, etc. For an array, you don't normally look at the size of the whole array but at the size of its elements.
Natural Alignment
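A minimal sketch of how natural alignment shows up as struct padding, assuming a typical platform where int is 4 bytes (the exact sizes are an assumption).

#include <stdio.h>
#include <stddef.h>

struct example {
    char c;     /* offset 0 */
    int  i;     /* offset 4 after 3 bytes of padding, not offset 1 */
};

int main(void) {
    printf("offsetof(i) = %zu, sizeof(struct example) = %zu\n",
           offsetof(struct example, i), sizeof(struct example));
    /* typically prints: offsetof(i) = 4, sizeof(struct example) = 8 */
    return 0;
}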
Mutex locks, or semaphores
Once you have identified a critical section, what is one way of assuring that only one thread will be in the section at a time?
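A minimal sketch of wrapping a critical section in a pthread mutex; the shared counter is illustrative.

#include <pthread.h>

int counter = 0;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *increment(void *arg) {
    pthread_mutex_lock(&lock);
    counter++;                  /* only one thread at a time executes this line */
    pthread_mutex_unlock(&lock);
    return NULL;
}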
Packet loss occurs when one or more packets of data travelling across a computer network fail to reach their destination. Packet loss is typically caused by network congestion. A connection-oriented transport layer protocol, such as TCP, may be based on a connectionless network layer protocol (such as IP), but still achieves in-order delivery of a byte-stream, by means of segment sequence numbering on the sender side, packet buffering and data packet reordering on the receiver side.
Packet Loss/Connection Based
A page fault is when a running program tries to access some virtual memory in its address space that is not currently mapped to physical memory. Page faults are commonly classified as minor (the mapping just needs to be created), major (the page contents must first be loaded from disk) and invalid (the access is illegal, e.g. writing to a read-only page, typically raising SIGSEGV).
Page Faults
A page table is a mapping between a page to the frame. For example Page 1 might be mapped to frame 45, page 2 mapped to frame 30. Other frames might be currently unused or assigned to other running processes, or used internally by the operating system.
Page Table
The child process inherits a copy of the parent's signal dispositions. In other words, if you have installed a SIGINT handler before forking, then the child process will also call the handler if a SIGINT is delivered to the child. Note pending signals for the child are not inherited during forking.
Pending Signals when Forking/Exec
Every file and directory has a set of 9 permission bits and a type field:
r - permission to read the file
w - permission to write to the file
x - permission to execute the file
Permission Bits
These file descriptors can be used with read - // To read... char buffer[80]; int bytesread = read(filedes[0], buffer, sizeof(buffer)); And write - write(filedes[1], "Go!", 4);
Pipe read write ends
A POSIX pipe is almost like its real counterpart - you can stuff bytes down one end and they will appear at the other end in the same order. Unlike real pipes however, the flow is always in the same direction, one file descriptor is used for reading and the other for writing. The pipe system call is used to create a pipe.
Pipes
p + 1 advances the address by sizeof(*p) - the size of the type pointed to - not by the size of the pointer itself
Pointer arithmetic
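A minimal sketch demonstrating that pointer arithmetic scales by the pointed-to type's size:

#include <stdio.h>

int main(void) {
    int arr[3] = { 10, 20, 30 };
    int *p = arr;
    printf("*(p + 1) = %d\n", *(p + 1));               /* 20 */
    printf("byte difference = %zu\n",
           (size_t)((char *)(p + 1) - (char *)p));     /* sizeof(int), e.g. 4 */
    return 0;
}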
Processes are very powerful but they are isolated! That means that by default, no process can communicate with another process. This is very important because if you have a large system (let's say EWS) then you want some processes to have higher privileges (monitoring, admin) than your average user, and one certainly doesn't want the average user to be able to bring down the entire system, either on purpose or accidentally, by modifying a process.
Process memory isolation
When a process starts, it gets its own address space, meaning that each process gets:
A Stack. The stack is the place where automatic variables and function call return addresses are stored. Every time a new variable is declared, the program moves the stack pointer down to reserve space for the variable. This segment is Writable but not Executable. If the stack grows too far (meaning that it either grows beyond a preset boundary or intersects the heap) you will get a stack overflow, most likely resulting in a SEGFAULT or something similar. The stack is fixed in size by default, meaning that there is only a certain amount of space to which one can write.
A Heap. The heap is an expanding region of memory. If you want to allocate a large object, it goes here. The heap starts at the end of the data segment and grows upward (meaning sometimes when you call malloc it asks the operating system to push the heap boundary upward). This area is also Writable but not Executable. One can run out of heap memory if the system is constrained or if you run out of addresses (more common on a 32-bit system).
A Data Segment. This contains all of your globals. This section starts at the end of the text segment and is static in size because the amount of globals is known at compile time. This section is Writable but not Executable and there isn't anything else too fancy here.
A Text Segment. This is, arguably, the most important section of the address space. This is where all your code is stored. Since assembly compiles to 1's and 0's, this is where the 1's and 0's get stored. The program counter moves through this segment, executing instructions and moving on to the next instruction. It is important to note that this is the only Executable section of the address space. If you try to change the code while it's running, most likely you will segfault (there are ways around it but just assume that it segfaults).
Process memory layout (where is the heap, stack etc; invalid memory addresses)
Imagine producer threads that generate items and consumer threads that remove and use them, sharing a fixed-size buffer (like the ring buffer above). Producers must not add items while the buffer is full, consumers must not remove items while it is empty, and all concurrent access to the buffer must be synchronized. This is the Producer Consumer problem: how can we efficiently coordinate producers and consumers so that neither overruns the buffer and no updates are lost or corrupted?
Producer Consumer Problem
RAID-3 uses parity codes instead of mirroring the data. For each N bits written we write one extra bit, the 'parity bit', which ensures the total number of 1s written is even. The parity bit is written to an additional disk. If any one disk (including the parity disk) is lost, then its contents can still be computed using the contents of the other disks. RAID-5 is similar to RAID-3 except that the check block (parity information) is assigned to different disks for different blocks - the check block is 'rotated' through the disk array. RAID-5 provides better read and write performance than RAID-3 because there is no longer the bottleneck of the single parity disk. The drawbacks are that you need more disks for this setup and more complicated algorithms are needed.
RAID
The rpc file contains user-readable names that can be used in place of RPC program numbers. Each line has the following information: · name of the server for the RPC program · RPC program number · aliases. Items are separated by any number of blanks and/or tab characters. A '#' indicates the beginning of a comment; characters from the '#' to the end of the line are not interpreted by routines which search the file.
RPC
Relative paths are paths that start from your current position in the tree.
Relative Path
Most filesystems cache significant amounts of disk data in physical memory. Linux, in this respect, is particularly extreme: All unused memory is used as a giant disk cache.
Reliable File Systems
An orphan is a child whose parent has exited; it is re-parented to init (pid 1). A zombie is a child that has terminated but has not yet been waited on, so it still occupies a slot in the kernel process table. Be a good parent by calling wait or waitpid on your children so their exit status is collected and the slot is freed.
What is an orphan? How does it become a zombie? How do I be a good parent?
A resource allocation graph tracks which resource is held by which process and which process is waiting for a resource of a particular type. It is a simple and very powerful tool to illustrate how interacting processes can deadlock. If a process is using a resource, an arrow is drawn from the resource node to the process node. If a process is requesting a resource, an arrow is drawn from the process node to the resource node.
Resource Allocation Graphs
SIGINT - interrupt from keyboard (CTRL-C); default action is to terminate, but it can be caught, blocked or ignored. SIGKILL - kill signal; terminates the process and cannot be caught, blocked or ignored. SIGSTOP - stops (suspends) the process; it also cannot be caught, blocked or ignored.
SIGKILL vs SIGSTOP vs SIGINT.
Shortest Job First (SJF) Preemptive Shortest Job First (PSJF) First Come First Served (FCFS) Round Robin (RR)
Scheduling Algorithms
the total memory overhead is 4MB (for the single-level implementation). Multi-level page tables are one solution to the page table size issue for 64-bit architectures. We'll look at the simplest implementation - a two-level page table. Each table is a list of pointers that point to the next level of tables; not all sub-tables need to exist. For a single-level page table, our machine is now twice as slow! (Two memory accesses are required.) For a two-level page table, memory access is now three times as slow. (Three memory accesses are required.)
Single level vs multi level page table
The technique is used to retain allocated memory that contains a data object of a certain type for reuse upon subsequent allocations of objects of the same type. It is analogous to an object pool, but only applies to memory, not other resources.
Slab Allocation/Memory Pool
when a hole is larger than the request, the allocator splits it into two: one part satisfies the allocation and the remainder becomes a new, smaller hole on the free list
Splitting
This block contains metadata about the filesystem: how large it is, the last modified time, the journal, the number of inodes and where the inode table starts, and the number of data blocks and where the data blocks start.
Superblock
getaddrinfo, socket, connect, then read/write
TCP client calls
getaddrinfo, socket, bind, listen and accept (htons/ntohs handle port byte order); then read/write on the file descriptor returned by accept.
TCP server calls
TCP is a connection-based protocol that is built on top of IPv4 and IPv6 (and therefore can be described as "TCP/IP" or "TCP over IP"). TCP creates a pipe between two machines and abstracts away the low-level packet nature of the Internet: thus, under most conditions, bytes sent from one machine will eventually arrive at the other end without duplication or data loss. TCP will automatically manage resending packets, ignoring duplicate packets, re-arranging out-of-order packets and changing the rate at which packets are sent. TCP's three-way handshake is known as SYN, SYN-ACK, and ACK. Most services on the Internet today (e.g. a web service) use TCP because it hides the complexity of the lower, packet-level nature of the Internet. UDP is a connectionless protocol that is built on top of IPv4 and IPv6. It's very simple to use: decide the destination address and port and send your data packet! However the network makes no guarantee about whether the packets will arrive. Packets (aka datagrams) may be dropped if the network is congested. Packets may be duplicated or arrive out of order. Between two distant data-centers it's typical to see 3% packet loss. A typical use case for UDP is when receiving up-to-date data is more important than receiving all of the data. For example, a game may send continuous updates of player positions. A streaming video signal may send picture updates using UDP.
TCP vs UDP
If all file descriptors referring to the read end of a pipe have been closed, then a write(2) will cause a SIGPIPE signal to be generated for the calling process.
When is SIGPIPE delivered to a process?
exit(42) exits the entire process and sets the process's exit value. This is equivalent to return 42 in the main method. All threads inside the process are stopped. pthread_exit(void *) only stops the calling thread, i.e. the thread never returns after calling pthread_exit. The pthread library will automatically finish the process if there are no other threads running. pthread_exit(...) is equivalent to returning from the thread's function; both finish the thread and also set the return value (void * pointer) for the thread. Calling pthread_exit in the main thread is a common way for simple programs to ensure that all threads finish. A process will also exit if it receives an unhandled terminating signal.
Under what conditions will a process exit
The fork system call clones the current process to create a new process. It creates a new process (the child process) by duplicating the state of the existing process with a few minor differences (discussed below). The child process does not start from main. Instead it returns from fork() just as the parent process does. Use waitpid to wait for the child to finish. Use one of the exec functions after forking. The exec set of functions replaces the process image with the process image of what is being called. This means that any lines of code after the exec call are replaced. Any other work you want the child process to do should be done before the exec call.
Understanding what fork and exec and waitpid do. E.g. how to use their return values.
typedef struct sem_t {
    int count;
    pthread_mutex_t m;
    pthread_cond_t cv;
} sem_t;

int sem_init(sem_t *s, int pshared, int value) {
    if (pshared) { errno = ENOSYS; /* 'Not implemented' */ return -1; }
    s->count = value;
    pthread_mutex_init(&s->m, NULL);
    pthread_cond_init(&s->cv, NULL);
    return 0;
}

int sem_post(sem_t *s) {
    pthread_mutex_lock(&s->m);
    s->count++;
    pthread_cond_signal(&s->cv); /* See note */
    /* A woken thread must acquire the lock, so it will also have to wait until we call unlock */
    pthread_mutex_unlock(&s->m);
    return 0;
}

int sem_wait(sem_t *s) {
    pthread_mutex_lock(&s->m);
    while (s->count == 0) {
        pthread_cond_wait(&s->cv, &s->m); /* unlock mutex, wait, relock mutex */
    }
    s->count--;
    pthread_mutex_unlock(&s->m);
    return 0;
}
Use a counting semaphore to implement a barrier.
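A minimal sketch of a single-use barrier for nthreads threads built on the counting semaphore implemented above; the arrived counter, barrier_m and turnstile names are illustrative, and turnstile is assumed to have been initialized to 0 with sem_init before any thread calls barrier_wait.

#include <pthread.h>
/* relies on the sem_t, sem_post and sem_wait defined in the card above */

int arrived = 0;
pthread_mutex_t barrier_m = PTHREAD_MUTEX_INITIALIZER;
sem_t turnstile;                   /* sem_init(&turnstile, 0, 0) before use */

void barrier_wait(int nthreads) {
    pthread_mutex_lock(&barrier_m);
    arrived++;
    int last = (arrived == nthreads);
    pthread_mutex_unlock(&barrier_m);

    if (last) {
        /* the last thread to arrive releases every waiting thread */
        for (int i = 0; i < nthreads - 1; i++)
            sem_post(&turnstile);
    } else {
        sem_wait(&turnstile);      /* block until the last thread arrives */
    }
}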
Condition variables allow a set of threads to sleep until tickled! You can tickle one thread or all threads that are sleeping. If you only wake one thread then the operating system will decide which thread to wake up. You don't wake threads directly instead you 'signal' the condition variable, which then will wake up one (or all) threads that are sleeping inside the condition variable. Condition variables are used with a mutex and with a loop (to check a condition). Occasionally a waiting thread may appear to wake up for no reason (this is called a spurious wake)! This is not an issue because you always use wait inside a loop that tests a condition that must be true to continue. Threads sleeping inside a condition variable are woken up by calling pthread_cond_broadcast (wake up all) or pthread_cond_signal (wake up one). Note despite the function name, this has nothing to do with POSIX signals!
Using Condition Variables
We can implement a counting semaphore using condition variables. Each semaphore needs a count, a condition variable and a mutex: typedef struct sem_t { int count; pthread_mutex_t m; pthread_cond_t cv; } sem_t;
Using Counting Semaphore
The execlp(), execvp(), and execvpe() functions duplicate the actions of the shell in searching for an executable file if the specified filename does not contain a slash (/) character. The file is sought in the colon-separated list of directory pathnames specified in the PATH environment variable. If this variable isn't defined, the path list defaults to a list that includes the directories returned by confstr(_CS_PATH) (which typically returns the value "/bin:/usr/bin") and possibly also the current working directory; see NOTES for further details. If the specified filename includes a slash character, then PATH is ignored, and the file at the specified pathname is executed.
Using exec with a path
POSIX call, kill(child, SIGUSR1); // Send a user-defined signal kill(child, SIGSTOP); // Stop the child process (the child cannot prevent this) kill(child, SIGTERM); // Terminate the child process (the child can prevent this) kill(child, SIGINT); // Equivalent to CTRL-C (by default closes the process) As we saw above there is also a kill command available in the shell e.g. get a list of running processes and then terminate process 45 and process 46 ps kill -l kill -9 45 kill -s TERM 46
Using kill from the shell or the kill POSIX call.
The pthread_create() function shall create a new thread, with attributes specified by attr, within a process. If attr is NULL, the default attributes shall be used. If the attributes specified by attr are modified later, the thread's attributes shall not be affected. Upon successful completion, pthread_create() shall store the ID of the created thread in the location referenced by thread. The thread is created executing start_routine with arg as its sole argument. If the start_routine returns, the effect shall be as if there was an implicit call to pthread_exit() using the return value of start_routine as the exit status. Note that the thread in which main() was originally invoked differs from this. When it returns from main(), the effect shall be as if there was an implicit call to exit() using the return value of main() as the exit status.
Using pthread_create
The pthread_exit() function terminates the calling thread and returns a value via retval that (if the thread is joinable) is available to another thread in the same process that calls pthread_join(3). Any clean-up handlers established by pthread_cleanup_push(3) that have not yet been popped, are popped (in the reverse of the order in which they were pushed) and executed. If the thread has any thread-specific data, then, after the clean-up handlers have been executed, the corresponding destructor functions are called, in an unspecified order. When a thread terminates, process-shared resources (e.g., mutexes, condition variables, semaphores, and file descriptors) are not released, and functions registered using atexit(3) are not called. After the last thread in a process terminates, the process terminates as by calling exit(3) with an exit status of zero; thus, process-shared resources are released and functions registered using atexit(3) are called.
Using pthread_exit
The pthread_join() function waits for the thread specified by thread to terminate. If that thread has already terminated, then pthread_join() returns immediately. The thread specified by thread must be joinable. If retval is not NULL, then pthread_join() copies the exit status of the target thread (i.e., the value that the target thread supplied to pthread_exit(3)) into the location pointed to by retval. If the target thread was canceled, then PTHREAD_CANCELED is placed in the location pointed to by retval. If multiple threads simultaneously try to join with the same thread, the results are undefined. If the thread calling pthread_join() is canceled, then the target thread will remain joinable (i.e., it will not be detached).
Using pthread_join
pthread_mutex_lock, pthread_mutex_trylock, pthread_mutex_unlock — lock and unlock a mutex pthread_mutex_destroy, pthread_mutex_init — destroy and initialize a mutex
Using pthread_mutex
POSIX systems, such as Linux and Mac OSX (which is based on BSD) include several virtual filesystems that are mounted (available) as part of the file-system. Files inside these virtual filesystems do not exist on the disk; they are generated dynamically by the kernel when a process requests a directory listing. Linux provides 3 main virtual filesystems /dev - A list of physical and virtual devices (for example network card, cdrom, random number generator) /proc - A list of resources used by each process and (by tradition) set of system information /sys - An organized list of internal kernel entities
Virtual File System
In very simple embedded systems and early computers, processes directly access memory i.e. "Address 1234" corresponds to a particular byte stored in a particular part of physical memory. In modern systems, this is no longer the case. Instead each process is isolated; and there is a translation process between the address of a particular CPU instruction or piece of data of a process and the actual byte of physical memory ("RAM"). Memory addresses are no longer 'real'; the process runs inside virtual memory. Virtual memory not only keeps processes safe (because one process cannot directly read or modify another process's memory) it also allows the system to efficiently allocate and re-allocate portions of memory to different processes.
Virtual Memory
The MMU, sometimes called a paged memory management unit (PMMU), is a computer hardware unit through which all memory references pass, primarily performing the translation of virtual memory addresses to physical addresses.
What are the following and what is their purpose? Memory Management Unit.
/dev - A list of physical and virtual devices (for example network card, cdrom, random number generator) /proc - A list of resources used by each process and (by tradition) set of system information /sys - An organized list of internal kernel entities
What are /sys, /proc, /dev/random, and /dev/urandom?
the nine bits on every file and directory that grant or deny read, write and execute permission to the user (owner), the group and others
What are permission bits?
Atomic operations are slower than plain operations because they force synchronization with memory and other cores. Accumulating into a local variable and performing a single atomic update at the end would be faster than many atomic operations.
What are some downsides to atomic operations? What would be faster: keeping a local variable or many atomic operations?
#include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/types.h> #include <sys/socket.h> #include <netdb.h> #include <unistd.h> int main(int argc, char **argv) { int s; int sock_fd = socket(AF_INET, SOCK_STREAM, 0); struct addrinfo hints, *result; memset(&hints, 0, sizeof(struct addrinfo)); hints.ai_family = AF_INET; /* IPv4 only */ hints.ai_socktype = SOCK_STREAM; /* TCP */ s = getaddrinfo("www.illinois.edu", "80", &hints, &result); if (s != 0) { fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(s)); exit(1); } if(connect(sock_fd, result->ai_addr, result->ai_addrlen) == -1){ perror("connect"); exit(2); } char *buffer = "GET / HTTP/1.0\r\n\r\n"; printf("SENDING: %s", buffer); printf("===\n"); // For this trivial demo just assume write() sends all bytes in one go and is not interrupted write(sock_fd, buffer, strlen(buffer)); char resp[1000]; int len = read(sock_fd, resp, 999); resp[len] = '\0'; printf("%s\n", resp); return 0; }
What are the calls to set up a TCP client?
#include <string.h> #include <stdio.h> #include <stdlib.h> #include <sys/types.h> #include <sys/socket.h> #include <netdb.h> #include <unistd.h> #include <arpa/inet.h> int main(int argc, char **argv) { int s; int sock_fd = socket(AF_INET, SOCK_STREAM, 0); struct addrinfo hints, *result; memset(&hints, 0, sizeof(struct addrinfo)); hints.ai_family = AF_INET; hints.ai_socktype = SOCK_STREAM; hints.ai_flags = AI_PASSIVE; s = getaddrinfo(NULL, "1234", &hints, &result); if (s != 0) { fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(s)); exit(1); } if (bind(sock_fd, result->ai_addr, result->ai_addrlen) != 0) { perror("bind()"); exit(1); } if (listen(sock_fd, 10) != 0) { perror("listen()"); exit(1); } struct sockaddr_in *result_addr = (struct sockaddr_in *) result->ai_addr; printf("Listening on file descriptor %d, port %d\n", sock_fd, ntohs(result_addr->sin_port)); printf("Waiting for connection...\n"); int client_fd = accept(sock_fd, NULL, NULL); printf("Connection made: client_fd=%d\n", client_fd); char buffer[1000]; int len = read(client_fd, buffer, sizeof(buffer) - 1); buffer[len] = '\0'; printf("Read %d chars\n", len); printf("===\n"); printf("%s\n", buffer); return 0; }
What are the calls to set up a TCP server?
For a 32 bit machine with 4KB pages, each entry needs to hold a frame number - i.e. 20 bits because we calculated there are 2^20 frames.
What are the following and what is their purpose? Frame number.
Thus the total memory overhead for our multi-level page table has shrunk from 4MB (for the single level implementation) to 3 frames of memory (12KB) ! Here's why: We need at least one frame for the high level directory and two frames for just two sub-tables. One sub-table is necessary for the low addresses (program code, constants and possibly a tiny heap), the other sub-table is for higher addresses used by the environment and stack. In practice, real programs will likely need more sub-table entries, as each subtable can only reference 1024*4KB = 4MB of address space but the main point still stands - we have significantly reduced the memory overhead required to perform page table look ups.
What are the following and what is their purpose? Multilevel page table
Remember our page table maps pages to frames, but each page is a block of contiguous addresses. How do we calculate which particular byte to use inside a particular frame? The solution is to re-use the lowest bits of the virtual memory address directly. For example, suppose our process is reading the following address- VirtualAddress = 11110000111100001111000010101010 (binary) On a machine with page size 256 Bytes, then the lowest 8 bits (10101010) will be used as the offset. The remaining upper bits will be the page number (111100001111000011110000).
What are the following and what is their purpose? Page number and page offset.
For every possible address (all 4 billion of them) we will store the 'real' i.e. physical address. Each physical address will need 4 bytes (to hold the 32 bits).
What are the following and what is their purpose? Physical Address
The execution bit defines whether bytes in a page can be executed as CPU instructions. By disabling execution on a page, it prevents code that is maliciously stored in the process memory (e.g. by a stack overflow) from being easily executed. (further reading: http://en.wikipedia.org/wiki/NX_bit#Hardware_background)
What are the following and what is their purpose? The NX Bit
The dirty bit allows for a performance optimization. A page on disk that is paged in to physical memory, then read from, and subsequently paged out again does not need to be written back to disk, since the page hasn't changed. However, if the page was written to after it's paged in, its dirty bit will be set, indicating that the page must be written back to the backing store. This strategy requires that the backing store retain a copy of the page after it is paged in to memory. When a dirty bit is not used, the backing store need only be as large as the instantaneous total size of all paged-out pages at any moment. When a dirty bit is used, at all times some pages will exist in both physical memory and the backing store.
What are the following and what is their purpose? The dirty bit
the MMU includes an associative cache (the TLB) of recently used virtual-page-to-frame lookups; its purpose is to remove the overhead of consulting the page table on every memory access
What are the following and what is their purpose? Translation Lookaside Buffer
Mutual Exclusion: The resource cannot be shared Circular Wait: There exists a cycle in the Resource Allocation Graph. There exists a set of processes {P1,P2,...} such that P1 is waiting for resources held by P2, which is waiting for P3,..., which is waiting for P1. Hold and Wait: A process acquires an incomplete set of resources and holds onto them while waiting for the other resources. No pre-emption: Once a process has acquired a resource, the resource cannot be taken away from a process and the process will not voluntarily give up a resource. Two students need a pen and paper: The students share a pen and paper. Deadlock is avoided because Mutual Exclusion was not required. The students both agree to grab the pen before grabbing the paper. Deadlock is avoided because there cannot be a circular wait. The students grab both the pen and paper in one operation ("Get both or get none"). Deadlock is avoided because there is no Hold and Wait The students are friends and will ask each other to give up a held resource. Deadlock is avoided because pre-emption is allowed.
What do each of the Coffman conditions mean? (e.g. can you provide a definition of each one)
A top-level table (an array of pointers to sub-tables), where each existing entry points to a second-level table (an array of frame numbers); sub-tables that are not needed simply do not exist.
What does a multi-level table look like in memory?
A function is asynchronous-safe, or asynchronous-signal safe, if it can be called safely and without side effects from within a signal handler context. That is, it must be able to be interrupted at any point to run linearly out of sequence without causing an inconsistent state. It must also function properly when global data might itself be in an inconsistent state. Some asynchronous-safe operations are listed here: Call the signal() function to reinstall a signal handler Unconditionally modify a volatile sig_atomic_t variable (as modification to this type is atomic) Call the _Exit() function to immediately terminate program execution Invoke an asynchronous-safe function, as specified by the implementation
What does it mean that a function is signal handler safe?
pthread_exit(void *) only stops the calling thread, i.e. the thread never returns after calling pthread_exit. The pthread library will automatically finish the process if there are no other threads running. pthread_exit(...) is equivalent to returning from the thread's function; both finish the thread and also set the return value (void * pointer) for the thread. Calling pthread_exit in the main thread is a common way for simple programs to ensure that all threads finish. Along the way, any clean-up handlers pushed with pthread_cleanup_push are popped and executed and destructors for thread-specific data are called; if this was the last thread, the process then exits (as if exit had been called).
What does pthread_exit do under normal circumstances (ie you are not the last thread)? What other functions are called when you call pthread_exit?
socket() creates an endpoint for communication and returns a file descriptor that refers to that endpoint. The file descriptor returned by a successful call will be the lowest-numbered file descriptor not currently open for the process.
What does socket do?
copy bytes from one file to another Use the versatile dd command. For example, the following command copies 1 MB of data from the file /dev/urandom to the file /dev/null. The data is copied as 1024 blocks of blocksize 1024 bytes. $ dd if=/dev/urandom of=/dev/null bs=1k count=1024
What does the "dd" command do?
(fork) The child process inherits a copy of the parent process's signal dispositions and a copy of the parent's signal mask. For example, if SIGINT is blocked in the parent it will be blocked in the child too, and if the parent installed a handler (call-back function) for SIGINT then the child will perform the same behavior. (exec) The signal mask and the dispositions set to ignore or default carry over to the exec'd program, but dispositions set to a custom handler are reset to the default, because the handler code disappeared along with the old process image.
What happens to my signal disposition after I fork? Exec?
(fork) Pending signals are not inherited by the child. (exec) Pending signals are preserved, and the signal mask and signal dispositions carry over to the exec'd program, except that custom signal handlers are reset to the default because the original handler code has disappeared along with the old process.
What happens to pending signals after I fork? Exec?
a new thread of execution is created with its own stack (and its own registers and thread-local storage); it shares the process's address space, heap, global variables, and file descriptors with the other threads
What happens when a pthread gets created? (you don't need to go into super specifics)
Splitting increases fragmentation and is done to carve a new allocation out of a larger free block; coalescing decreases fragmentation and is done when two free blocks are adjacent in memory, merging them into one larger block.
What is Coalescing/Splitting? How do they increase/decrease fragmentation? When can you coalesce or split?
a system called "DNS" (Domain Name Service) is used. If a machine does not hold the answer locally then it sends a UDP packet to a local DNS server. This server in turn may query other upstream DNS servers.
What is DNS? What is the route that DNS takes?
When free memory is broken into many small, non-contiguous chunks (each too small to satisfy an allocation). It becomes an issue when a large request cannot be satisfied even though the total free space would be sufficient.
What is External Fragmentation? When does it become an issue?
When there is wasted space inside an allocated block, e.g. because the allocator rounds a request up to a larger block size. It becomes an issue when many allocations waste a significant fraction of their blocks.
What is Internal Fragmentation? When does it become an issue?
Best fit places the allocation in the smallest hole that is large enough. If a perfectly-sized hole cannot be found, the leftover slivers lead to high fragmentation. It also requires a scan of all possible holes, so it is O(n) in the number of holes unless the free list is kept sorted or in a tree.
What is a Best Fit placement strategy? How is it with External Fragmentation? Time Complexity?
A worst-fit strategy finds the largest hole that is of sufficient size. It is no better with external fragmentation: it quickly destroys the large holes that future big allocations would need. Like best fit, a naive implementation scans every hole, so it is O(n) in the number of holes.
What is a Worst Fit placement strategy? Is it any better with External Fragmentation? Time Complexity?
Condition variables allow a set of threads to sleep until tickled (signaled). Compared with spinning in a while loop, a waiting thread consumes no CPU, and because pthread_cond_wait atomically releases the mutex and sleeps, a wakeup cannot slip in between the check and the sleep; that safety is the advantage.
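A minimal sketch of the standard pattern, assuming a shared flag ready protected by mutex m (the names are illustrative):
#include <pthread.h>
#include <stdbool.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
bool ready = false;

void wait_until_ready(void) {
    pthread_mutex_lock(&m);
    while (!ready)                   /* while, not if: re-check after every wakeup */
        pthread_cond_wait(&cv, &m);  /* atomically unlocks m, sleeps, relocks m on wakeup */
    pthread_mutex_unlock(&m);
}

void set_ready(void) {
    pthread_mutex_lock(&m);
    ready = true;
    pthread_cond_signal(&cv);        /* wake one sleeping thread (broadcast wakes all) */
    pthread_mutex_unlock(&m);
}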
What is a condition variable? Why is there an advantage to using one over a while loop?
Conceptually, a semaphore is a nonnegative integer count. Semaphores are typically used to coordinate access to resources, with the semaphore count initialized to the number of free resources. The number of cookies in a cookie jar
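A sketch of the cookie-jar analogy using POSIX semaphores (the jar size and all names are illustrative):
#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>

#define COOKIES 4
sem_t jar;                            /* counts how many cookies remain */

void *hungry_thread(void *arg) {
    sem_wait(&jar);                   /* take a cookie; blocks if the jar is empty */
    printf("thread %ld got a cookie\n", (long) arg);
    /* ... eat ... */
    sem_post(&jar);                   /* put a cookie back (or a baker refills one slot) */
    return NULL;
}

int main(void) {
    sem_init(&jar, 0, COOKIES);       /* start with COOKIES free resources */
    pthread_t t[8];
    for (long i = 0; i < 8; i++) pthread_create(&t[i], NULL, hungry_thread, (void *) i);
    for (int i = 0; i < 8; i++)  pthread_join(t[i], NULL);
    sem_destroy(&jar);
    return 0;
}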
What is a counting semaphore? Give me an analogy to a cookie jar/pizza box/limited food item.
A 'fork bomb' is when you attempt to create an infinite number of processes (e.g. calling fork in an unbounded loop); it will typically bring the system to a near-standstill. A zombie is a child that has finished (or terminated) but still takes up a slot in the kernel process table; only when the child has been 'waited on' (with wait or waitpid) will the slot be available again. An orphan is a process whose parent exits before it does: once a process completes, any of its children are assigned to "init" - the first process, with pid 1 - so these children would see getppid() return a value of 1. These orphans will eventually finish and for a brief moment become zombies; fortunately, the init process automatically waits for all of its children, thus removing these zombies from the system.
What is a fork bomb, zombie and orphan? How to create/remove them
There are three types of page faults. Minor: there is no mapping yet for the page, but it is a valid address. This could be memory asked for by sbrk(2) but not written to yet, meaning that the operating system can wait for the first write before allocating space; the OS simply creates the page, loads it into memory, and moves on. Major: the mapping for the page is not in memory but on disk; the OS swaps the page into memory (and may swap another page out). If this happens frequently enough, your program is said to thrash. Invalid: you tried to write to a non-writable memory address or read from a non-readable memory address; the MMU generates an invalid fault and the OS will usually generate a SIGSEGV, meaning segmentation violation - you accessed memory outside a segment you are allowed to access.
What is a page fault? What are the types? When does it result in a segfault?
The pipe system call is used to create a pipe. int filedes[2]; pipe (filedes); printf("read from %d, write to %d\n", filedes[0], filedes[1]); These file descriptors can be used with read - // To read... char buffer[80]; int bytesread = read(filedes[0], buffer, sizeof(buffer)); And write - write(filedes[1], "Go!", 4);
What is a pipe? How do I create a pipe?
Signal dispositions Each signal has a current disposition, which determines how the process behaves when it is delivered the signal. ... A process can change the disposition of a signal using sigaction(2) or signal(2).
What is a process Signal Disposition?
the producer-consumer problem[1][2] (also known as the bounded-buffer problem) is a classic example of a multi-process synchronization problem. The problem describes two processes, the producer and the consumer, who share a common, fixed-size buffer used as a queue: one thread writes (produces) items into the buffer while another reads (consumes) them. It is closely related to the reader-writer problem, but in reader-writer many readers may proceed simultaneously because they do not modify the shared data, whereas producers and consumers both modify the buffer and so always need mutual exclusion.
What is a producer consumer problem? How might the above be a producer consumer problem be used in the above section? How is a producer consumer problem related to a reader writer problem?
Remote Procedure Call. RPC is the idea that we can execute a procedure (function) on a different machine. In practice the procedure may execute on the same machine, however it may be in a different context - for example under a different user with different permissions and different lifecycle.
What is a remote procedure call? When should I use it?
A signal is a construct provided to us by the kernel. It allows one process to asynchronously send a signal (think a message) to another process. If that process wants to accept the signal, it can, and then, for most signals, can decide what to do with that signal. (See the "Signals" entry later in this review for a short, non-comprehensive list.)
What is a signal?
Superblock: This block contains metadata about the filesystem: how large it is, the last modified time, a journal, the number of inodes and where the first inode starts, and the number of data blocks and where the first data block starts. Inode: This is the key abstraction. An inode is a file. Disk Blocks: These are where the data is stored - the actual contents of the file.
What is a superblock? Inode? Disk block?
a type of synchronization method. A barrier for a group of threads or processes in the source code means any thread/process must stop at this point and cannot proceed until all other threads/processes reach this barrier.
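A minimal pthread_barrier sketch, assuming N worker threads that must all finish phase 1 before any of them starts phase 2 (names and N are illustrative):
#include <pthread.h>
#include <stdio.h>

#define N 4
pthread_barrier_t barrier;

void *phase_worker(void *arg) {
    long id = (long) arg;
    printf("thread %ld finished phase 1\n", id);
    pthread_barrier_wait(&barrier);     /* nobody proceeds until all N threads arrive */
    printf("thread %ld starting phase 2\n", id);
    return NULL;
}

int main(void) {
    pthread_barrier_init(&barrier, NULL, N);
    pthread_t t[N];
    for (long i = 0; i < N; i++) pthread_create(&t[i], NULL, phase_worker, (void *) i);
    for (int i = 0; i < N; i++)  pthread_join(t[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}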
What is a thread barrier?
A problem is embarrassingly parallel when it can be split into sub-tasks that need little or no communication with each other, so each sub-task can simply be handed to its own thread. For example, the left and right halves of merge sort can be sorted independently (only the final merge needs both results):
void merge_sort(int *arr, size_t len) {
    if (len > 1) {
        // Mergesort the left half
        // Mergesort the right half
        // Merge the two halves
    }
}
What is an embarrassingly parallel problem?
Incrementing a variable (i++) is not atomic because it requires three distinct steps: Copying the bit pattern from memory into the CPU; performing a calculation using the CPU's registers; copying the bit pattern back to memory. During this increment sequence, another thread or process can still read the old value and other writes to the same memory would also be over-written when the increment sequence completes.
What is atomic operation?
The stub code is the necessary code to hide the complexity of performing a remote procedure call. One of the roles of the stub code is to marshall the necessary data into a format that can be sent as a byte stream to a remote server. The server stub code will receive the request, unmarshall the request into valid in-memory data, call the underlying implementation, and send the result back to the caller. HTTP by itself is not an RPC: it is a request-response protocol for transferring resources and does not define how procedure arguments and results are marshalled into a procedure-call abstraction (though many RPC systems are layered on top of HTTP).
What is marshalling/unmarshalling? Why is HTTP not an RPC?
First fit places the allocation in the first hole that is sufficiently large. It has the advantage that it will not evaluate all possible placements and is therefore faster: it stops at the first hole that fits, though the worst case is still a scan of all holes, O(n). It still suffers from external fragmentation, though in practice not noticeably worse than best fit.
What is the First Fit Placement strategy? It's a little bit better with Fragmentation, right? Expected Time Complexity?
There are critical parts of our code that can only be executed by one thread at a time; such a region is a critical section.
What is the critical section?
a hard link is a directory entry that associates a name with a file on a file system. All directory-based file systems must have (at least) one hard link giving the original name for each file. The term "hard link" is usually only used in file systems that allow more than one hard link for the same file. A symbolic link is created with int symlink(const char *target, const char *linkpath); or, in the shell, with ln -s. Symbolic links: can refer to files that don't exist yet; unlike hard links, can refer to directories as well as regular files; can refer to files (and directories) that exist outside of the current file system. Main disadvantage: slower than regular files and directories - when the link's contents are read, they must be interpreted as a new path to the target file.
What is the difference between a hard link and a symbolic link? Does the file need to exist?
The shutdown() function shall cause all or part of a full-duplex connection on the socket associated with the file descriptor socket to be shut down. The shutdown() function takes the following arguments: socket Specifies the file descriptor of the socket. how Specifies the type of shutdown. The values are as follows: SHUT_RD Disables further receive operations. SHUT_WR Disables further send operations. SHUT_RDWR Disables further send and receive operations. The shutdown() function disables subsequent send and/or receive operations on a socket, depending on the value of the how argument. #include <unistd.h> int close(int fildes); The close() function shall deallocate the file descriptor indicated by fildes. To deallocate means to make the file descriptor available for return by subsequent calls to open() or other functions that allocate file descriptors. All outstanding record locks owned by the process on the file associated with the file descriptor shall be removed (that is, unlocked). If fildes refers to a socket, close() shall cause the socket to be destroyed. If the socket is in connection-mode, and the SO_LINGER option is set for the socket with non-zero linger time, and the socket has untransmitted data, then close() shall block for up to the current linger interval until all data is transmitted. Use the shutdown call when you no longer need to read any more data from the socket, write more data, or have finished doing both. When you shutdown a socket for further writing (or reading) that information is also sent to the other end of the connection. For example if you shutdown the socket for further writing at the server end, then a moment later, a blocked read call could return 0 to indicate that no more bytes are expected. Use close when your process no longer needs the socket file descriptor.
What is the difference between a socket shutdown and closing?
The exec variants with a 'p' (e.g. execvp, execlp) search the directories listed in the PATH environment variable for the program; the variants without a 'p' require a full or relative path to the executable.
What is the difference between execs with a p and without a p? What does the operating system
All of these system calls are used to wait for state changes in a child of the calling process, and obtain information about the child whose state has changed. A state change is considered to be: the child terminated; the child was stopped by a signal; or the child was resumed by a signal. In the case of a terminated child, performing a wait allows the system to release the resources associated with the child; if a wait is not performed, then the terminated child remains in a "zombie" state (see NOTES below). If wstatus is not NULL, wait() and waitpid() store status information in the int to which it points. This integer can be inspected with the wait macros (which take the integer itself as an argument, not a pointer to it, as is done in wait() and waitpid()!). wait fails (returns -1) when the calling process has no unwaited-for children (errno is set to ECHILD) or when the call is interrupted by a signal (errno is set to EINTR).
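A small sketch showing the status macros and a failure check (the child here simply exits with status 42 for illustration):
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    pid_t child = fork();
    if (child == 0) {
        exit(42);                           /* child terminates with status 42 */
    }
    int status;
    pid_t reaped = waitpid(child, &status, 0);
    if (reaped == -1) {
        perror("waitpid");                  /* fails with ECHILD (no children) or EINTR */
    } else if (WIFEXITED(status)) {
        printf("child exited with %d\n", WEXITSTATUS(status));
    } else if (WIFSIGNALED(status)) {
        printf("child killed by signal %d\n", WTERMSIG(status));
    }
    return 0;
}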
What is the int *status pointer passed into wait? When does wait fail?
a memory management technique that provides an "idealized abstraction of the storage resources that are actually available on a given machine" which "creates the illusion to users of a very large memory."
What is virtual memory?
kill(child, SIGINT); // Equivalent to CTRL-C (by default closes the process)
What signal is sent when you press CTRL-C
The socket call creates an outgoing socket and returns a descriptor (sometimes called a 'file descriptor') that can be used with read and write etc. In this sense it is the network analog of open that opens a file stream - except that we haven't connected the socket to anything yet! The system calls send(), sendto(), and sendmsg() are used to transmit a message to another socket. The send() call may be used only when the socket is in a connected state (so that the intended recipient is known). The only difference between send() and write(2) is the presence of flags. With a zero flags argument, send() is equivalent to write(2). Also, the call send(sockfd, buf, len, flags); is equivalent to sendto(sockfd, buf, len, flags, NULL, 0); The argument sockfd is the file descriptor of the sending socket. In short: read/write (and recv/send) can be used once the socket is connected; sendto and recvfrom let you specify or capture the remote address for each message, which is what connectionless (UDP) sockets need.
When can you use read and write? How about recvfrom and sendto?
though calling sbrk(0) can be interesting because it tells you where your heap currently ends
When does the 5 line sbrk implementation of malloc have a use?
Each thread gets its own stack, allocated within the process's virtual address space (on Linux, pthread stacks are typically allocated with mmap, separate from the original/main thread's stack); all threads share the same heap, globals, and code.
Where is each thread's stack?
For typical layouts the text (code) segment is at low addresses, followed by the data segment, then the heap, which grows upward; the stack sits near the top of the address space and grows downward. You can write to the stack, heap, and data segments; the text segment is read-only. Invalid memory addresses include NULL, addresses in unmapped parts of the address space, and writes to read-only segments - accessing them typically results in a SIGSEGV.
Where is the heap, stack, data, and text segment? Which segments can you write to? What are invalid memory addresses?
UDP is connectionless; TCP is connection-based.
Which protocol is connection less and which one is connection based?
Wait for a thread to finish. Clean up thread resources. Grabs the return value of the thread. Finished threads that are never joined (and are not detached) will continue to consume resources; eventually, if enough threads are created, pthread_create will fail. In practice, this is only an issue for long-running processes, not for simple, short-lived processes, as all thread resources are automatically freed when the process exits. Both pthread_exit and pthread_join will let the other threads finish on their own (even if called in the main thread). However, only pthread_join will return to you when the specified thread finishes. pthread_exit does not wait and will immediately end your thread and give you no chance to continue executing.
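A minimal sketch of joining a thread and collecting its return value (the function and variable names are illustrative):
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

void *compute(void *arg) {
    int *result = malloc(sizeof(int));
    *result = 2 * *(int *) arg;
    return result;                      /* same as pthread_exit(result) */
}

int main(void) {
    pthread_t tid;
    int input = 21;
    pthread_create(&tid, NULL, compute, &input);

    void *ret;
    pthread_join(tid, &ret);            /* blocks until 'compute' finishes, reclaims its resources */
    printf("thread returned %d\n", *(int *) ret);
    free(ret);
    return 0;
}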
Why is pthread_join important (think stack space, registers, return values)?
Two problems: the shared flag not_ready is checked and pthread_cond_wait is called without the mutex being locked (the mutex must be held around both), and 'if' is used instead of 'while', so the condition is not re-checked after a spurious wakeup or after another thread changes it.
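A corrected sketch, assuming not_ready is shared state protected by mtx:
pthread_mutex_lock(&mtx);
while (not_ready)                  /* while, not if: recheck the condition on every wakeup */
    pthread_cond_wait(&cv, &mtx);  /* must be called with mtx held; it unlocks and relocks mtx */
pthread_mutex_unlock(&mtx);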
Why is this code dangerous? if(not_ready){ pthread_cond_wait(&cv, &mtx); }
This means sigaction can be called from any thread because you will be setting a signal handler for all threads in the process. You should use sigaction instead of signal because it has better-defined semantics: signal() behaves differently on different operating systems, which is bad, while sigaction is more portable and better defined for multi-threaded programs.
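A minimal sigaction sketch (the handler name and the SA_RESTART choice are illustrative):
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static void on_sigint(int sig) {
    (void) sig;
    /* only async-signal-safe work belongs here */
}

int main(void) {
    struct sigaction sa = {0};
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);      /* no extra signals blocked while the handler runs */
    sa.sa_flags = SA_RESTART;      /* restart interrupted system calls instead of failing with EINTR */
    if (sigaction(SIGINT, &sa, NULL) == -1) {
        perror("sigaction");
        return 1;
    }
    pause();                       /* wait for a signal to arrive */
    return 0;
}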
Why sigaction vs signal?
The plain size_t counter is updated with a non-atomic read-modify-write, so concurrent increments race and updates are lost; shared updates in parallel code need to be atomic (as in the second version) or protected with mutex locks.
Why will the following not work in parallel code //In the global section size_t a; //In pthread function for(int i = 0; i < 100000000; i++) a++; And this will? //In the global section atomic_size_t a; //In pthread function for(int i = 0; i < 100000000; i++) atomic_fetch_add(&a, 1);
Arbitrator (Naive and Advanced). Problems: these solutions are slow; they have a single point of failure - the arbitrator - which becomes a bottleneck; the arbitrator needs to be fair and, in the second solution, able to determine deadlock; in practical systems, the arbitrator tends to give the forks repeatedly to philosophers that just ate, because of process scheduling.
Leaving the Table (Stallings' Solution). Problems: the solution requires a lot of context switching, which is very expensive for the CPU; you need to know the number of resources beforehand in order to only let that many philosophers sit down; again, priority is given to the processes that have already eaten.
Partial Ordering (Dijkstra's Solution). Problems: the philosopher needs to know the set of resources in order before grabbing any of them; you need to define a partial order over all of the resources; it prioritizes philosophers who have already eaten.
There are many more advanced solutions; a non-exhaustive list includes Clean/Dirty Forks (the Chandy/Misra solution), the Actor Model (and other message passing models), and super arbitrators (complicated pipelines).
Working DP Solutions: Benefits/Drawbacks
A directory (or folder) is a file system structure in which to store computer files.
Working with Directories
Worst fit targets the largest unallocated space; it is a poor choice if large allocations will be required later.
Worst Fit
monitor ProducerConsumer {
    int itemCount = 0;
    condition full;
    condition empty;

    procedure add(item) {
        if (itemCount == BUFFER_SIZE) {
            wait(full);
        }
        putItemIntoBuffer(item);
        itemCount = itemCount + 1;
        if (itemCount == 1) {
            notify(empty);
        }
    }

    procedure remove() {
        if (itemCount == 0) {
            wait(empty);
        }
        item = removeItemFromBuffer();
        itemCount = itemCount - 1;
        if (itemCount == BUFFER_SIZE - 1) {
            notify(full);
        }
        return item;
    }
}

procedure producer() {
    while (true) {
        item = produceItem();
        ProducerConsumer.add(item);
    }
}

procedure consumer() {
    while (true) {
        item = ProducerConsumer.remove();
        consumeItem(item);
    }
}
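The monitor pseudocode above uses 'if' and notifies only on the empty-to-one and full-to-almost-full transitions, which is only safe under restrictive scheduling assumptions. With multiple threads calling enqueue and dequeue (as the prompt below asks), a sketch using a mutex, two condition variables, and 'while' loops is safer; the buffer layout and names here are illustrative:
#include <pthread.h>

#define BUFFER_SIZE 16

static int buffer[BUFFER_SIZE];
static int count = 0, head = 0, tail = 0;

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

void enqueue(int item) {
    pthread_mutex_lock(&m);
    while (count == BUFFER_SIZE)          /* while, not if: safe with many producers */
        pthread_cond_wait(&not_full, &m);
    buffer[tail] = item;
    tail = (tail + 1) % BUFFER_SIZE;
    count++;
    pthread_cond_signal(&not_empty);      /* signal every time, not only on the 0 -> 1 transition */
    pthread_mutex_unlock(&m);
}

int dequeue(void) {
    pthread_mutex_lock(&m);
    while (count == 0)
        pthread_cond_wait(&not_empty, &m);
    int item = buffer[head];
    head = (head + 1) % BUFFER_SIZE;
    count--;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&m);
    return item;
}
For a producer-consumer stack, replace the circular-queue indexing with a single top-of-stack index; the locking and waiting logic stays the same.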
Write code to implement a producer consumer using condition variables and a mutex. Assume there can be more than one thread calling enqueue and dequeue.
mutex buffer_mutex; // similar to "semaphore buffer_mutex = 1", but different (see notes below)
semaphore fillCount = 0;
semaphore emptyCount = BUFFER_SIZE;

procedure producer() {
    while (true) {
        item = produceItem();
        down(emptyCount);
        down(buffer_mutex);
        putItemIntoBuffer(item);
        up(buffer_mutex);
        up(fillCount);
    }
}

procedure consumer() {
    while (true) {
        down(fillCount);
        down(buffer_mutex);
        item = removeItemFromBuffer();
        up(buffer_mutex);
        up(emptyCount);
        consumeItem(item);
    }
}
Write up a Producer/Consumer queue, How about a producer consumer stack?
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

int main() {
    int fh[2];
    pipe(fh);
    FILE *reader = fdopen(fh[0], "r");
    FILE *writer = fdopen(fh[1], "w");
    pid_t p = fork();
    if (p > 0) {
        int score;
        fscanf(reader, "Score %d", &score);
        printf("The child says the score is %d\n", score);
    } else {
        fprintf(writer, "Score %d", 10 + 10);
        fflush(writer);
    }
    return 0;
}
If every write end of a pipe has been closed, reading from the pipe returns 0 (end of file). If every read end has been closed, writing to the pipe delivers SIGPIPE to the writer (and, if SIGPIPE is ignored or handled, the write fails with EPIPE).
Writing to a zero reader pipe Reading from a zero writer pipe
epoll is not part of POSIX, but it is supported by Linux. It is a more efficient way to wait for many file descriptors. It will tell you exactly which descriptors are ready. It even gives you a way to store a small amount of data with each descriptor, like an array index or a pointer, making it easier to access your data associated with that descriptor. Given three sets of file descriptors, select() will wait for any of those file descriptors to become 'ready'. readfds - a file descriptor in readfds is ready when there is data that can be read or EOF has been reached. writefds - a file descriptor in writefds is ready when a call to write() will succeed. exceptfds - system-specific, not well-defined. Just pass NULL for this. select() returns the total number of file descriptors that are ready. If none of them become ready during the time defined by timeout, it will return 0. After select() returns, the caller will need to loop through the file descriptors in readfds and/or writefds to see which ones are ready. As readfds and writefds act as both input and output parameters, when select() indicates that there are file descriptors which are ready, it would have overwritten them to reflect only the file descriptors which are ready. Unless it is the caller's intention to call select() only once, it would be a good idea to save a copy of readfds and writefds before calling it.
epoll vs select
getpid returns the process ID (pid) of the calling process; getppid returns the process ID of the calling process's parent.
getpid vs getppid
A thread is short for 'thread-of-execution'. It represents the sequence of instructions that the CPU has (and will) execute. To remember how to return from function calls, and to store the values of automatic variables and parameters a thread uses a stack.
pthread lifecycle
The recv(), recvfrom(), and recvmsg() calls are used to receive messages from a socket. They may be used to receive data on both connectionless and connection-oriented sockets. This page first describes common features of all three system calls, and then describes the differences between the calls. The only difference between recv() and read(2) is the presence of flags. With a zero flags argument, recv() is generally equivalent to read(2) (but see NOTES). Also, the following call recv(sockfd, buf, len, flags); is equivalent to recvfrom(sockfd, buf, len, flags, NULL, NULL);
recvfrom
By calling sbrk the C library can increase the size of the heap as your program demands more heap memory. As the heap and stack (one for each thread) need to grow, we put them at opposite ends of the address space. So for typical architectures the heap will grow upwards and the stack grows downwards. Modern operating system memory allocators no longer need sbrk - instead they can request independent regions of virtual memory and maintain multiple memory regions.
sbrk
Use the shutdown call when you no longer need to read any more data from the socket, write more data, or have finished doing both. When you shutdown a socket for further writing (or reading) that information is also sent to the other end of the connection. For example if you shutdown the socket for further writing at the server end, then a moment later, a blocked read call could return 0 to indicate that no more bytes are expected.
shutdown
Most programming languages offer buffered I/O features by default, since buffering makes generating output much more efficient. These buffered I/O facilities typically "Just Work" out of the box. But sometimes they don't: excess buffering occurs, causing data not to be printed in a timely manner. This is typically fixed by explicitly putting a "flush" call in the code, e.g. with something like sys.stdout.flush() in Python, fflush(3) in C, or std::flush in C++.
Buffering of stdout
one of the POSIX errors that you can get from different blocking functions (send, recv, poll, sem_wait etc.) The POSIX standard unhelpfully describes it as "Interrupted function."
EINTR
Hypertext Transfer Protocol is an application protocol for distributed, collaborative, and hypermedia information systems
HTTP
read() attempts to read up to count bytes from file descriptor fd into the buffer starting at buf. On files that support seeking, the read operation commences at the file offset, and the file offset is incremented by the number of bytes read. If the file offset is at or past the end of file, no bytes are read, and read() returns zero. If count is zero, read() may detect the errors described below. In the absence of any errors, or if read() does not check for errors, a read() with a count of 0 returns zero and has no other effects. According to POSIX.1, if count is greater than SSIZE_MAX, the result is implementation-defined; see NOTES for the upper limit on Linux. write() writes up to count bytes from the buffer starting at buf to the file referred to by the file descriptor fd. The number of bytes written may be less than count if, for example, there is insufficient space on the underlying physical medium, or the RLIMIT_FSIZE resource limit is encountered (see setrlimit(2)), or the call was interrupted by a signal handler after having written less than count bytes. The open() system call opens the file specified by pathname. If the specified file does not exist, it may optionally (if O_CREAT is specified in flags) be created by open(). The return value of open() is a file descriptor, a small, nonnegative integer that is used in subsequent system calls (read(2), write(2), lseek(2), fcntl(2), etc.) to refer to the open file. The file descriptor returned by a successful call will be the lowest-numbered file descriptor not currently open for the process.
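A short sketch using open, read, and write to copy a file, handling partial reads and writes (the file names are illustrative):
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(void) {
    int in  = open("input.txt", O_RDONLY);
    int out = open("output.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in == -1 || out == -1) { perror("open"); return 1; }

    char buf[4096];
    ssize_t n;
    while ((n = read(in, buf, sizeof buf)) > 0) {    /* read may return fewer bytes than asked for */
        ssize_t written = 0;
        while (written < n) {                        /* write may also be partial */
            ssize_t w = write(out, buf + written, n - written);
            if (w == -1) { perror("write"); return 1; }
            written += w;
        }
    }
    close(in);
    close(out);
    return 0;
}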
POSIX file IO (read, write, open)
Use raise or kill int raise(int sig); // Send a signal to myself! int kill(pid_t pid, int sig); // Send a signal to another process
Raising Signals in C
Use pthread_kill int pthread_kill(pthread_t thread, int sig) In the example below, the newly created thread executing func will be interrupted by SIGINT pthread_create(&tid, NULL, func, args); pthread_kill(tid, SIGINT); pthread_kill(pthread_self(), SIGKILL); // send SIGKILL to myself
Raising Signals in a multithreaded program
Terminate Process (Can be caught) Tell the process to stop nicely
SIGINT
means what action will occur when a signal is delivered to the process. For example, the default disposition of SIGINT is to terminate the process. The signal disposition can be changed by calling signal() (which is simple but not portable, as there are subtle variations in its implementation on different POSIX architectures, and it is also not recommended for multi-threaded programs) or sigaction (discussed later). You can imagine the process's disposition to all possible signals as a table of function pointers (one entry for each possible signal). The default disposition for a signal can be to ignore the signal, stop the process, continue a stopped process, terminate the process, or terminate the process and also dump a 'core' file. Note a core file is a representation of the process's memory state that can be inspected using a debugger.
Signal Disposition
Both the signal mask and the signal disposition carries over to the exec-ed program. https://www.gnu.org/software/libc/manual/html_node/Executing-a-File.html#Executing-a-File Pending signals are preserved as well. Signal handlers are reset, because the original handler code has disappeared along with the old process.
Signal Disposition when Forking/Exec
An async-signal-safe function is one that can be safely called from within a signal handler. Many functions are not async-signal-safe. In particular, nonreentrant functions are generally unsafe to call from a signal handler. A concept similar to thread safety is Async-Signal safety. Async-Signal-Safe operations are guaranteed not to interfere with operations that are being interrupted. The problem of Async-Signal safety arises when the actions of a signal handler can interfere with the operation that is being interrupted.
Signal Handler Safe
it is possible to have signals that are in a pending state. If a signal is pending, it means it has not yet been delivered to the process. The most common reason for a signal to be pending is that the process (or thread) has currently blocked that particular signal. If a particular signal, e.g. SIGINT, is pending then it is not possible to queue up the same signal again. It is possible to have more than one signal of a different type in a pending state. For example SIGINT and SIGTERM signals may be pending (i.e. not yet delivered to the target process)
Signal States
SIGINT - Terminate Process (Can be caught) - Tell the process to stop nicely
SIGQUIT - Terminate Process (Can be caught) - Tells the process to stop harshly
SIGSTOP - Stop Process (Cannot be caught) - Stops the process so it can be continued
SIGCONT - Continues a Process - Continues to run the process
SIGKILL - Terminate Process (Cannot be Ignored) - You want your process gone
Signals
The strcmp() function compares the two strings s1 and s2. It returns an integer less than, equal to, or greater than zero if s1 is found, respectively, to be less than, to match, or be greater than s2. The strncmp() function is similar, except it compares only the first (at most) n bytes of s1 and s2. The strcat() function appends the src string to the dest string, overwriting the terminating null byte ('\0') at the end of dest, and then adds a terminating null byte. The strings may not overlap, and the dest string must have enough space for the result. If dest is not large enough, program behavior is unpredictable; buffer overruns are a favorite avenue for attacking secure programs. The strncat() function is similar, except that * it will use at most n bytes from src; and * src does not need to be null-terminated if it contains n or more bytes. As with strcat(), the resulting string in dest is always null- terminated. If src contains n or more bytes, strncat() writes n+1 bytes to dest (n from src plus the terminating null byte). Therefore, the size of dest must be at least strlen(dest)+n+1. A simple implementation of strncat() might be: char * strncat(char *dest, const char *src, size_t n) { size_t dest_len = strlen(dest); size_t i; for (i = 0 ; i < n && src[i] != '\0' ; i++) dest[dest_len + i] = src[i]; dest[dest_len + i] = '\0'; return dest; } The strcpy() function copies the string pointed to by src, including the terminating null byte ('\0'), to the buffer pointed to by dest. The strings may not overlap, and the destination string dest must be large enough to receive the copy. Beware of buffer overruns! (See BUGS.) The strncpy() function is similar, except that at most n bytes of src are copied. Warning: If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated. If the length of src is less than n, strncpy() writes additional null bytes to dest to ensure that a total of n bytes are written. A simple implementation of strncpy() might be: char * strncpy(char *dest, const char *src, size_t n) { size_t i; for (i = 0; i < n && src[i] != '\0'; i++) dest[i] = src[i]; for ( ; i < n; i++) dest[i] = '\0'; return dest; }
Simple C string functions (strcmp, strcat, strcpy)
The strdup() function returns a pointer to a new string which is a duplicate of the string s. Memory for the new string is obtained with malloc(3), and can be freed with free(3). The strndup() function is similar, but copies at most n bytes. If s is longer than n, only n bytes are copied, and a terminating null byte ('\0') is added. strdupa() and strndupa() are similar, but use alloca(3) to allocate the buffer. They are available only when using the GNU GCC suite, and suffer from the same limitations described in alloca(3).
String duplication
The strncpy() and strncat() functions truncate: at most n bytes of src are copied or appended. Warning: if there is no null byte among the first n bytes of src, strncpy() leaves dest without a terminating null byte. strndup() truncates similarly: if s is longer than n, only n bytes are copied, and a terminating null byte ('\0') is always added.
String truncation
is a connection-based protocol that is built on top of IPv4 and IPv6 (and therefore can be described as "TCP/IP" or "TCP over IP"). TCP creates a pipe between two machines and abstracts away the low level packet-nature of the Internet: Thus, under most conditions, bytes sent from one machine will eventually arrive at the other end without duplication or data loss.
TCP
A translation lookaside buffer (TLB) is a memory cache that is used to reduce the time taken to access a user memory location. It is a part of the chip's memory-management unit (MMU). The TLB stores the recent translations of virtual memory to physical memory and can be called an address-translation cache.
TLB
a memory management technique that provides an "idealized abstraction of the storage resources that are actually available on a given machine" which "creates the illusion to users of a very large memory."
Virtual Memory
str1 is an array, so sizeof(str1) is the size of the whole thing: 7 bytes ("bhuvan" plus the terminating null byte). str2 is a pointer to a string literal, so sizeof(str2) is only the size of the pointer (8 bytes on a 64-bit system, 4 on 32-bit). The array's contents may be modified; the literal str2 points to may not.
What are the differences between the following two declarations? What does sizeof return for one of them? char str1[] = "bhuvan"; char *str2 = "another one";
Hello World, then It's a small place. stderr is unbuffered, so "Hello " and "World" appear immediately; stdout is buffered, so "It's a small place" only appears once the buffer is flushed (at the newline on a terminal, or at exit when redirected).
What does the following print out int main(){ fprintf(stderr, "Hello "); fprintf(stdout, "It's a small "); fprintf(stderr, "World\n"); fprintf(stdout, "place\n"); return 0; }
& is the address-of operator: &x evaluates to the memory address of x. For a struct bhuvan starting at 0x100, the addresses of its members are the base address plus each member's offset: &bhuvan // 0x100, &bhuvan.firstname // 0x100 = 0x100+0x00, &bhuvan.lastname // 0x114 = 0x100+0x14, &bhuvan.phone // 0x128 = 0x100+0x28. * is the dereference operator: applied to a pointer it accesses the value stored at that address (and in a declaration such as int *p it declares a pointer).
What is the & operator? How about *?
%s %d %c
What is the printf specifier to print a string, int, or char?
sizeof(ptr) is the size of a pointer: usually 8 bytes on a 64-bit system (4 bytes on 32-bit). sizeof(*ptr) is sizeof(int), usually 4 bytes.
What should the following usually return? int *ptr; sizeof(ptr); sizeof(*ptr);
The array name points to the first byte of the array. The array is mutable, so we can change its contents (be careful not to write bytes beyond the end of the array though).
arrays
When a thread reaches a barrier, it will wait at the barrier until all the threads reach the barrier, and then they'll all proceed together.
barrier
an array of characters terminated by a null byte ('\0')
c strings
array vs pointer: char p[] declares an array - p names the storage itself, sizeof(p) is the size of the whole array, and its contents may be modified. char *p declares a pointer - sizeof(p) is the size of a pointer, p can be reassigned to point elsewhere, and if it points to a string literal the characters must not be modified.
char p[]vs char* p
changes the file mode bits of each given file according to mode, which can be either a symbolic representation of changes to make, or an octal number representing the bit pattern for the new mode bits.
chmod
A computer network in which one centralized, powerful computer (called the server) is a hub to which many less powerful personal computers or workstations (called clients) are connected.
client/server
There are four necessary and sufficient conditions for deadlock. These are known as the Coffman conditions. Mutual Exclusion Circular Wait Hold and Wait No pre-emption If you break any of them, you cannot have deadlock!
coffman conditions
Condition variables allow a set of threads to sleep until tickled! You can tickle one thread or all threads that are sleeping. If you only wake one thread then the operating system will decide which thread to wake up. You don't wake threads directly instead you 'signal' the condition variable, which then will wake up one (or all) threads that are sleeping inside the condition variable. Condition variables are used with a mutex and with a loop (to check a condition).
condition variables
is the process of storing the state of a process or of a thread, so that it can be restored and execution resumed from the same point later. This allows multiple processes to share a single CPU, and is an essential feature of a multitasking operating system.
context switch
occurs when a process or thread enters a waiting state because a requested system resource is held by another waiting process, which in turn is waiting for another resource held by another waiting process.
deadlock
So you have your philosophers sitting around a table all wanting to eat some pasta (or whatever that is) and they are really hungry. Each of the philosophers is essentially the same, meaning that each philosopher runs the same instructions as the others, i.e. you can't tell every even philosopher to do one thing and every odd philosopher to do another thing.
dining philosophers
A double-free happens when free is called on memory (for example a string) that has already been freed. This is undefined behavior: it can corrupt the allocator's bookkeeping, and the program will often crash or abort.
double-free error
API performs a similar task to poll(2): monitoring multiple file descriptors to see if I/O is possible on any of them. The epoll API can be used either as an edge-triggered or a level-triggered interface and scales well to large numbers of watched file descriptors.
epoll
utility shall cause the shell to exit with the exit status specified by the unsigned decimal integer n. If n is specified, but its value is not between 0 and 255 inclusively, the exit status is undefined.
exit
Input/Output to and from files. File input and file output are essential in programming: most software involves more than keyboard input and screen user interfaces. Data needs to be stored somewhere when a program is not running, and that means writing data to disk.
file I/O
The design of a file system is difficult problem because there many high-level design goals that we'd like to satisfy. An incomplete list of ideal goals include: Reliable and robust (even with hardware failures or incomplete writes due to power loss) Access (security) controls Accounting and quotas Indexing and search Versioning and backup capabilities Encryption Automatic compression High performance (e.g. Caching in-memory) Efficient use of storage de-duplication Not all filesystems natively support all of these goals. For example, many filesystems do not automatically compress rarely-used files
file system representation
The C POSIX library is a specification of a C standard library for POSIX systems. It was developed at the same time as the ANSI C standard. Some effort was made to make POSIX compatible with standard C; POSIX includes additional functions to those introduced in standard C.
fileio POSIX vs. C library
A common programming pattern is to call fork followed by exec and wait. The original process calls fork, which creates a child process. The child process then uses exec to start execution of a new program. Meanwhile the parent uses wait (or waitpid) to wait for the child process to finish. See below for a complete code example.
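A sketch of the pattern described above (the program run by exec here, /bin/ls, is just an example):
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    pid_t child = fork();
    if (child == -1) {
        perror("fork");
        exit(1);
    }
    if (child == 0) {
        /* child: replace this process image with /bin/ls */
        execl("/bin/ls", "ls", "-l", (char *) NULL);
        perror("execl");          /* only reached if exec failed */
        exit(1);
    }
    int status;
    waitpid(child, &status, 0);   /* parent: wait for the child to finish */
    if (WIFEXITED(status))
        printf("child exited with status %d\n", WEXITSTATUS(status));
    return 0;
}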
fork/exec/wait
int fprintf(FILE *stream, const char *format, ...);
fprintf
function frees the memory space pointed to by ptr, which must have been returned by a previous call to malloc(), calloc(), or realloc().
free
malloc, calloc, realloc and free
heap allocator
Stack is used for static memory allocation and Heap for dynamic memory allocation, both stored in the computer's RAM. Variables allocated on the stack are stored directly in memory and access to this memory is very fast, and its allocation is dealt with when the program is compiled.
heap/stack
Files and directories are inodes. On Unix, the collection of data that makes up the contents of a directory or a file isn't stored under a name; the data is stored as part of a data structure called an "inode".
inode vs name
function allocates size bytes and returns a pointer to the allocated memory. The memory is not initialized. If size is 0, then malloc() returns either NULL, or a unique pointer value that can later be successfully passed to free().
malloc
Reading or writing outside the bounds of an allocation (for example past the end of an array or heap block, when there isn't room for what you're storing) is undefined behavior: it can silently corrupt neighboring data or allocator metadata, or crash with a segfault.
memory out of bounds errors
Create named pipes (FIFOs) with the given NAMEs. Mandatory arguments to long options are mandatory for short options too. -m, --mode=MODE set file permission bits to MODE, not a=rw - umask -Z set the SELinux security context to default type --context[=CTX] like -Z, or if CTX is specified then set the SELinux or SMACK security context to CTX --help display this help and exit --version output version information and exit
mkfifo
creates a new mapping in the virtual address space of the calling process. The starting address for the new mapping is specified in addr. The length argument specifies the length of the mapping (which must be greater than 0).
mmap
(short for Mutual Exclusion) pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER; // global variable pthread_mutex_lock(&m); // start of Critical Section pthread_mutex_unlock(&m); //end of Critical Section
mutexes
a process-specific or an application-specific software construct serving as a communication endpoint, which is used by the Transport Layer protocols of the Internet Protocol suite, such as the User Datagram Protocol (UDP) and the Transmission Control Protocol (TCP).
network ports
Assuming 4-byte pointers: ptr + 2 = 0x108, ptr + 4 = 0x110, ptr[0] + 4 = 0x204, and ptr[1] + 2000 = 0xAD0 (0x300 + 0x7D0). The last expression does not compile as written (an int cannot be dereferenced); if it was meant to be *((int *)(ptr + 1)) + 3, it reads ptr[1] (0x300) and adds 3, giving 0x303.
Pointer Arithmetic. Assume the following addresses. What are the following shifts? char** ptr = malloc(10); //0x100 ptr[0] = malloc(20); //0x200 ptr[1] = malloc(20); //0x300 * `ptr + 2` * `ptr + 4` * `ptr[0] + 4` * `ptr[1] + 2000` * `*((int)(ptr + 1)) + 3`
system call opens the file specified by pathname. If the specified file does not exist, it may optionally (if O_CREAT is specified in flags) be created by open(). The return value of open() is a file descriptor, a small, nonnegative integer that is used in subsequent system calls (read(2), write(2), lseek(2), fcntl(2), etc.) to refer to the open file. The file descriptor returned by a successful call will be the lowest-numbered file descriptor not currently open for the process. closes a file descriptor, so that it no longer refers to any file and may be reused. Any record locks (see fcntl(2)) held on the file it was associated with, and owned by the process, are removed (regardless of the file descriptor that was used to obtain the lock). If fd is the last file descriptor referring to the underlying open file description (see open(2)), the resources associated with the open file description are freed; if the file descriptor was the last reference to a file which has been removed using unlink(2), the file is deleted.
open/close
The software that supports a computer's basic functions, such as scheduling tasks, executing applications, and controlling peripherals.
operating system terms
(sometimes called #PF, PF or hard fault) is a type of exception raised by computer hardware when a running program accesses a memory page that is not currently mapped by the memory management unit (MMU) into the virtual address space of a process.
page fault
the data structure used by a virtual memory system in a computer operating system to store the mapping between virtual addresses and physical addresses. Virtual addresses are used by the accessing process, while physical addresses are used by the hardware, or more specifically, by the RAM subsystem.
page tables
In systems programming, a read or write call may return having transferred fewer bytes than requested ("partial data"), for example on sockets and pipes; robust code loops, tracking how many bytes have been transferred so far, until everything has been sent or received (or EOF / an error occurs).
partial data
Pipes and FIFOs (also known as named pipes) provide a unidirectional interprocess communication channel. A pipe has a read end and a write end. Data written to the write end of a pipe can be read from the read end of the pipe. A pipe is created using pipe(2), which creates a new pipe and returns two file descriptors, one referring to the read end of the pipe, the other referring to the write end. Pipes can be used to create a communication channel between related processes; see pipe(2) for an example.
pipes
You can add an integer to a pointer. However, the pointer type is used to determine how much to increment the pointer. For char pointers this is trivial because characters are always one byte: char *ptr = "Hello"; // ptr holds the memory location of 'H' ptr += 2; //ptr now points to the first'l'
pointer arithmetic
A pointer refers to a memory address. The type of the pointer is useful - it tells the compiler how many bytes need to be read/written. You can declare a pointer as follows. int *ptr1; char *ptr2;
pointers
The printf() function shall place output on the standard output stream stdout. int dprintf(int fildes, const char *restrict format, ...); int fprintf(FILE *restrict stream, const char *restrict format, ...); int printf(const char *restrict format, ...); int snprintf(char *restrict s, size_t n, const char *restrict format, ...); int sprintf(char *restrict s, const char *restrict format, ...);
printing (printf)
Producer-consumer problem. In computing, the producer-consumer problem (also known as the bounded-buffer problem) is a classic example of a multi-process synchronization problem. The problem describes two processes, the producer and the consumer, who share a common, fixed-size buffer used as a queue.
producer/consumer
When a mutex is already locked, the waiting thread either yields or blocks. Here to yield means to repeatedly poll whether progress can be made and, if not, temporarily yield the processor; to block means to give up the processor until the mutex permits progress.
progress/mutex
is the behavior of an electronic, software, or other system where the output is dependent on the sequence or timing of other uncontrollable events. It becomes a bug when events do not happen in the order the programmer intended.
race conditions
attempts to read up to count bytes from file descriptor fd into the buffer starting at buf. On files that support seeking, the read operation commences at the file offset, and the file offset is incremented by the number of bytes read. If the file offset is at or past the end of file, no bytes are read, and read() returns zero. writes up to count bytes from the buffer starting at buf to the file referred to by the file descriptor fd. The number of bytes written may be less than count if, for example, there is insufficient space on the underlying physical medium, or the RLIMIT_FSIZE resource limit is encountered (see setrlimit(2)), or the call was interrupted by a signal handler after having written less than count bytes. (See also pipe(7).)
read/write
Multiple threads should be able to look up (read) values at the same time provided the data structure is not being written to. The writers are not so gregarious - to avoid data corruption, only one thread at a time may modify (write) the data structure (and no readers may be reading at that time).
reader/writer
tracks which resource is held by which process and which process is waiting for a resource of a particular type. It is very powerful and simple tool to illustrate how interacting processes can deadlock. If a process is using a resource, an arrow is drawn from the resource node to the process node. If a process is requesting a resource, an arrow is drawn from the process node to the resource node.
resource allocation graphs
a simple, usually fixed-sized, storage mechanism where contiguous memory is treated as if it is circular, and two index counters keep track of the current beginning and end of the queue. As array indexing is not circular, the index counters must wrap around to zero when moved past the end of the array. As data is added (enqueued) to the front of the queue or removed (dequeued) from tail of the queue, the current items in the buffer form a train that appears to circle the track
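A minimal single-threaded ring buffer sketch showing the wrap-around indexing (the size and names are illustrative):
#include <stdbool.h>

#define RING_SIZE 8

typedef struct {
    int data[RING_SIZE];
    int start;   /* index of the oldest item */
    int count;   /* number of items currently stored */
} ring_t;

bool ring_put(ring_t *r, int value) {
    if (r->count == RING_SIZE) return false;             /* full */
    r->data[(r->start + r->count) % RING_SIZE] = value;  /* index wraps past the end of the array */
    r->count++;
    return true;
}

bool ring_get(ring_t *r, int *value) {
    if (r->count == 0) return false;                     /* empty */
    *value = r->data[r->start];
    r->start = (r->start + 1) % RING_SIZE;
    r->count--;
    return true;
}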
ring buffer
family of functions scans input according to format as described below. This format may contain conversion specifications; the results from such conversions, if any, are stored in the locations pointed to by the pointer arguments that follow format. Each pointer argument must be of a type that is appropriate for the value returned by the corresponding conversion specification. If the number of conversion specifications in format exceeds the number of pointer arguments, the results are undefined. If the number of pointer arguments exceeds the number of conversion specifications, then the excess pointer arguments are evaluated, but are otherwise ignored.
scanf
is the problem of efficiently selecting which process to run on a system's CPU cores. In a busy system there will be more ready-to-run processes than there are CPU cores, so the system kernel must evaluate which processes should be scheduled to run on the CPU and which processes should be placed in a ready queue to be executed later.
scheduling
allow a program to monitor multiple file descriptors, waiting until one or more of the file descriptors become "ready" for some class of I/O operation (e.g., input possible). A file descriptor is considered ready if it is possible to perform a corresponding I/O operation (e.g., read(2) without blocking, or a sufficiently small write(2)). select() can monitor only file descriptors numbers that are less than FD_SETSIZE; poll(2) does not have this limitation.
select
contains a value and supports two operations "wait" and "post". Post increments the semaphore and immediately returns. "wait" will wait if the count is zero. If the count is non-zero the semaphore decrements the count and immediately returns.
semaphores
a construct provided to us by the kernel. It allows one process to asynchronously send a signal (think a message) to another process. If that process wants to accept the signal, it can, and then, for most signals, can decide what to do with that signal. (See the "Signals" entry earlier in this review for a short, non-comprehensive list.)
signals
Returns size in bytes of the object representation of type. Returns size in bytes of the object representation of the type that would be returned by expression, if evaluated.
sizeof
1 byte
sizeof char
sizeof x depends on the type of x, but the size of a pointer (x *) is the same regardless of what it points to: usually 8 bytes on a 64-bit system (4 bytes on 32-bit).
sizeof x vs x*
These functions return information about a file, in the buffer pointed to by statbuf. No permissions are required on the file itself, but—in the case of stat(), fstatat(), and lstat()—execute (search) permission is required on all of the directories in pathname that lead to the file. stat() and fstatat() retrieve information about the file pointed to by pathname; the differences for fstatat() are described below.
stat
memory whose size is fixed ("set") at compile time - e.g. global and static variables - and which exists for the whole lifetime of the program
static memory
The input stream is referred to as "standard input"; the output stream is referred to as "standard output"; and the error stream is referred to as "standard error". These terms are abbreviated to form the symbols used to refer to these files, namely stdin, stdout, and stderr.
stderr/stdout
a symbolic link (also symlink or soft link) is the nickname for any file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution.
symlinks
function starts a new thread in the calling process. The new thread starts execution by invoking start_routine(); arg is passed as the sole argument of start_routine(). function waits for the thread specified by thread to terminate. If that thread has already terminated, then pthread_join() returns immediately. The thread specified by thread must be joinable. function terminates the calling thread and returns a value via retval that (if the thread is joinable) is available to another thread in the same process that calls pthread_join(3).
thread control (_create, _join, _exit)
pthread_cond_init(&cv, NULL); pthread_mutex_init(&m, NULL);
variable initializers
can be either of global or local scope. A global variable is a variable declared in the main body of the source code, outside all functions, while a local variable is one declared within the body of a function or a block.
variable scope
thrashing occurs when a computer's virtual memory subsystem is in a constant state of paging, rapidly exchanging data in memory for data on disk, to the exclusion of most application-level processing. This causes the performance of the computer to degrade or collapse
vm thrashing
If wstatus is not NULL, the wait macros are used to inspect the status integer filled in by wait()/waitpid(): WIFEXITED(wstatus) is true if the child terminated normally, and WEXITSTATUS(wstatus) then gives its exit status; WIFSIGNALED(wstatus) is true if the child was killed by a signal, and WTERMSIG(wstatus) gives that signal's number; WIFSTOPPED(wstatus) and WSTOPSIG(wstatus) report a child stopped by a signal.
wait macros
The value in errno is significant only when the return value of the call indicated an error (i.e., -1 from most system calls; -1 or NULL from most library functions); a function that succeeds is allowed to change errno. The value of errno is never set to zero by any system call or library function.
write/read with errno