Operating Systems (Educative)
system calls
Of course, in order to allow users to tell the OS what to do and thus make use of the features of the virtual machine (such as running a program, or allocating memory, or accessing a file), the OS also provides some interfaces (APIs) that you can call. A typical OS, in fact, exports a few hundred system calls that are available to applications.
interfaces (APIs)
Of course, to run programs, and stop them, and otherwise tell the OS which programs to run, there need to be some interfaces (APIs) that you can use to communicate your desires to the OS. We'll talk about these APIs throughout this course; indeed, they are the major ways in which most users interact with operating systems.
virtualizing the CPU
It turns out that the operating system, with some help from the hardware, is in charge of this illusion, i.e., the illusion that the system has a very large number of virtual CPUs. Turning a single CPU (or a small set of them) into a seemingly infinite number of CPUs and thus allowing many programs to seemingly run at once is what we call virtualizing the CPU, the focus of the first major part of this course.
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/virtualizing-the-cpu
concurrency
We use this conceptual term to refer to a host of problems that arise, and must be addressed, when working on many things at once (i.e., concurrently) in the same program. The problems of concurrency arose first within the operating system itself; as you can see in the examples on virtualization, the OS is juggling many things at once, first running one process, then another, and so forth. As it turns out, doing so leads to some deep and interesting problems.
So, what happens when a program runs?
Well, a running program does one very simple thing: it executes instructions. Many millions (and these days, even billions) of times every second, the processor fetches an instruction from memory, decodes it (i.e., figures out which instruction this is), and executes it (i.e., it does the thing that it is supposed to do, like add two numbers together, access memory, check a condition, jump to a function, and so forth). After it is done with this instruction, the processor moves on to the next instruction, and so on, and so on, until the program finally completes. Thus, we have just described the basics of the Von Neumann model of computing. Sounds simple, right? But in this course, we will be learning that while a program runs, a lot of other wild things are going on with the primary goal of making the system easy to use.
physical memory
Now, let's consider memory. The model of physical memory presented by modern machines is very simple. Memory is just an array of bytes; to read memory, one must specify an address to be able to access the data stored there; to write (or update) memory, one must also specify the data to be written to the given address. Memory is accessed all the time when a program is running. A program keeps all of its data structures in memory and accesses them through various instructions, like loads and stores or other explicit instructions that access memory in doing their work. Don't forget that each instruction of the program is in memory too; thus memory is accessed on each instruction fetch.
thread (initial conceptual definition)
You can think of a thread as a function running within the same memory space as other functions, with more than one of them active at a time. In this example, each thread starts running in a routine called worker(), in which it simply increments a counter in a loop for loops number of times.
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/concurrency
policies
On top of these mechanisms resides some of the intelligence in the OS, in the form of policies. Policies are algorithms for making some kind of decision within the OS. For example, given a number of possible programs to run on a CPU, which program should the OS run? A scheduling policy in the OS will make this decision, likely using historical information (e.g., which program has run more over the last minute?), workload knowledge (e.g., what types of programs are run), and performance metrics (e.g., is the system optimizing for interactive performance, or throughput?) to make its decision.
THE CRUX OF THE PROBLEM: HOW TO VIRTUALIZE RESOURCES
One central question we will answer in this course is quite simple: how does the operating system virtualize resources? This is the crux of our problem. Why the OS does this is not the main question, as the answer should be obvious: it makes the system easier to use. Thus, we focus on the how: what mechanisms and policies are implemented by the OS to attain virtualization? How does the OS do so efficiently? What hardware support is needed?
High performance: minimum cost
One goal in designing and implementing an operating system is to provide high performance; another way to say this is our goal is to minimize the overheads of the OS. Virtualization and making the system easy to use are well worth it, but not at any cost; thus, we must strive to provide virtualization and other OS features without excessive overheads. These overheads arise in a number of forms: extra time (more instructions) and extra space (in memory or on disk). We'll seek solutions that minimize one or the other or both, if possible. Perfection, however, is not always attainable, something we will learn to notice and (where appropriate) tolerate.
abstractions
One of the most basic goals is to build up some abstractions in order to make the system convenient and easy to use. Abstractions are fundamental to everything we do in computer science. Abstraction makes it possible to write a large program by dividing it into small and understandable pieces, to write such a program in a high-level language like C without thinking about assembly, to write code in assembly without thinking about logic gates, and to build a processor out of gates without thinking too much about transistors. Abstraction is so fundamental that sometimes we forget its importance, but we won't here; thus, in each section, we'll discuss some of the major abstractions that have developed over time, giving you a way to think about pieces of the OS.
THE CRUX OF THE PROBLEM: HOW TO BUILD CORRECT CONCURRENT PROGRAMS
When there are many concurrently executing threads within the same memory space, how can we build a correctly working program? What primitives are needed from the OS? What mechanisms should be provided by the hardware? How can we use them to solve the problems of concurrency?
policy of the OS
You might also notice that the ability to run multiple programs at once raises all sorts of new questions. For example, if two programs want to run at a particular time, which should run? This question is answered by a policy of the OS; policies are used in many different places within an OS to answer these types of questions.
This lesson focuses on what you can achieve by combining the system calls you just learned (fork, wait, exec). Shell; Running p4.c; Pipeline
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/why-motivating-the-api
THE CRUX OF THE PROBLEM: HOW TO PROVIDE THE ILLUSION OF MANY CPUs? Although there are only a few physical CPUs available, how can the OS provide the illusion of a nearly-endless supply of said CPUs?
time sharing
Protection
Another goal will be to provide protection between applications, as well as between the OS and applications. Because we wish to allow many programs to run at the same time, we want to make sure that the malicious or accidental bad behavior of one does not harm others; we certainly don't want an application to be able to harm the OS itself (as that would affect all programs running on the system). Protection is at the heart of one of the main principles underlying an operating system, which is that of isolation; isolating processes from one another is the key to protection and thus underlies much of what an OS must do.
software to be able to store data persistently
file system
In this lesson, you will learn about direct execution and its limitations. Direct execution protocol; limited direct execution
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/basic-technique-limited-direct-execution
In this lesson, you will learn how the OS keeps track of processes with the help of data structures. Information the OS tracks about a process: register context, zombie state, process list/task list, process control block (PCB)
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/data-structures
Exercise - Virtualization: Processes
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/exercise-39wn5KYljmM
Exercise - Virtualization: Direct Execution
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/exercise-N0y5pxLk99p
Exercise - Virtualization: Process API
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/exercise-Y58MmM40MMK
Finally, The exec() System Call Running p3.c TIP: GETTING IT RIGHT (LAMPSON'S LAW)
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/finally-the-exec-system-call
Introduction to Direct Execution THE CRUX: HOW TO EFFICIENTLY VIRTUALIZE THE CPU WITH CONTROL
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/introduction-to-direct-execution
Introduction to Process API CRUX: HOW TO CREATE AND CONTROL PROCESSES ASIDE: INTERLUDES
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/introduction-to-process-api
io.c
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/persistence
To accomplish this task, the program makes three calls into the operating system. The first, a call to open(), opens the file and creates it; the second, write(), writes some data to the file; the third, close(), simply closes the file, thus indicating the program won't be writing any more data to it. These system calls are routed to the part of the operating system called the file system, which then handles the requests and returns some kind of error code to the user.
standard library
Because the OS provides these (system) calls to run programs, access memory and devices, and other related actions, we also sometimes say that the OS provides a standard library to applications.
resource manager
Finally, because virtualization allows many programs to run (thus sharing the CPU), and many programs to concurrently access their own instructions and data (thus sharing memory), and many programs to access devices (thus sharing disks and so forth), the OS is sometimes known as a resource manager. Each of the CPU, memory, and disk is a resource of the system; it is thus the operating system's role to manage those resources, doing so efficiently or fairly or indeed with many other possible goals in mind.
virtualizing memory
Indeed, that is exactly what is happening here as the OS is virtualizing memory. Each process accesses its own private virtual address space (sometimes just called its address space), which the OS somehow maps onto the physical memory of the machine. A memory reference within one running program does not affect the address space of other processes (or the OS itself); as far as the running program is concerned, it has physical memory all to itself. The reality, however, is that physical memory is a shared resource, managed by the operating system. Exactly how all of this is accomplished is also the subject of the first part of this course, on the topic of virtualization.
process identifier
The process identifier (the PID) of the running program; this PID is unique per running process.
atomically
As it turns out, the reason for these odd and unusual outcomes relates to how instructions are executed: one at a time. Unfortunately, a key part of the program above, where the shared counter is incremented, takes three instructions: one to load the value of the counter from memory into a register, one to increment it, and one to store it back into memory. Because these three instructions do not execute atomically (all at once), strange things can happen.
Other goals
Other goals make sense: energy-efficiency is important in our increasingly green world; security (an extension of protection, really) against malicious applications is critical, especially in these highly-networked times; mobility is increasingly important as OSes are run on smaller and smaller devices. Depending on how the system is used, the OS will have different goals and thus likely be implemented in at least slightly different ways. However, as we will see, many of the principles we will present on how to build an OS are useful on a range of different devices.
design requirements
So, now you have some idea of what an OS actually does: it takes physical resources, such as a CPU, memory, or disk, and virtualizes them. It handles tough and tricky issues related to concurrency. And it stores files persistently, thus making them safe over the long-term. Given that we want to build such a system, we want to have some goals in mind to help focus our design and implementation and make trade-offs as necessary; finding the right set of trade-offs is a key to building systems.
time sharing
The OS creates this illusion by virtualizing the CPU. By running one process, then stopping it and running another, and so forth, the OS can promote the illusion that many virtual CPUs exist when in fact there is only one physical CPU (or a few). This basic technique, known as time sharing of the CPU, allows users to run as many concurrent processes as they would like; the potential cost is performance, as each will run more slowly if the CPU(s) must be shared. To implement the virtualization of the CPU, and to implement it well, the OS will need both some low-level machinery and some high-level intelligence. We call the low-level machinery mechanisms. Mechanisms are low-level methods or protocols that implement a needed piece of functionality. For example, we'll learn later how to implement a context switch, which gives the OS the ability to stop running one program and start running another on a given CPU; this time-sharing mechanism is employed by all modern OSes.
abstraction provided by the OS of a running program
The abstraction provided by the OS of a running program is something we will call a process. As we said above, a process is simply a running program; at any instant in time, we can summarize a process by taking an inventory of the different pieces of the system it accesses or affects during the course of its execution.
hardware to be able to store data persistently
The hardware comes in the form of some kind of input/output or I/O device; in modern systems, a hard drive is a common repository for long-lived information, although solid-state drives (SSDs) are making headway in this arena as well.
Reliability
The operating system must also run non-stop; when it fails, all applications running on the system fail as well. Because of this dependence, operating systems often strive to provide a high degree of reliability. As operating systems grow ever more complex (sometimes containing millions of lines of code), building a reliable operating system is quite a challenge — and indeed, much of the on-going research in the field (including some of our own work) focuses on this exact problem.
virtualization
The primary way the OS does this is through a general technique that we call virtualization. That is, the OS takes a physical resource (such as the processor, or memory, or a disk) and transforms it into a more general, powerful, and easy-to-use virtual form of itself. Thus, we sometimes refer to the operating system as a virtual machine.
file system
The software in the operating system that usually manages the disk is called the file system; it is thus responsible for storing any files the user creates in a reliable and efficient manner on the disks of the system. Unlike the abstractions provided by the OS for the CPU and memory, the OS does not create a private, virtualized disk for each application. Rather, it is assumed that oftentimes, users will want to share information that is in files. For example, when writing a C program, you might first use an editor (e.g., Emacs) to create and edit the C file (emacs -nw main.c). Once done, you might use the compiler to turn the source code into an executable (e.g., gcc -o main main.c). When you're finished, you might run the new executable (e.g., ./main). Thus, you can see how files are shared across different processes. First, Emacs creates a file that serves as input to the compiler; the compiler uses that input file to create a new executable file (in many steps — take a compiler course for details); finally, the new executable is then run. And thus, a new program is born!
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/persistence
persistence
The third major theme of the course is persistence. In system memory, data can be easily lost, as devices such as DRAM store values in a volatile manner; when the power goes away or the system crashes, any data in memory is lost. Thus, we need hardware and software to be able to store data persistently; such storage is thus critical to any system as users care a great deal about their data.
operating system (OS)
There is a body of software, in fact, that is responsible for making it easy to run programs (even allowing you to seemingly run many at the same time), allowing programs to share memory, enabling programs to interact with devices, and other fun stuff like that. That body of software is called the operating system (OS), as it is in charge of making sure the system operates correctly and efficiently in an easy-to-use manner.
file systems (cont.)
These system calls are routed to the part of the operating system called the file system, which then handles the requests and returns some kind of error code to the user. You might be wondering what the OS does in order to actually write to disk. We would show you but you'd have to promise to close your eyes first; it is that unpleasant. The file system has to do a fair bit of work: first figuring out where on disk this new data will reside, and then keeping track of it in various structures the file system maintains. Doing so requires issuing I/O requests to the underlying storage device, to either read existing structures or update (write) them. As anyone who has written a device driver knows, getting a device to do something on your behalf is an intricate and detailed process. It requires a deep knowledge of the low-level device interface and its exact semantics. Fortunately, the OS provides a standard and simple way to access devices through its system calls. Thus, the OS is sometimes seen as a standard library. For performance reasons, most file systems first delay such writes for a while, hoping to batch them into larger groups. To handle the problems of system crashes during writes, most file systems incorporate some kind of intricate write protocol, such as journaling or copy-on-write, carefully ordering writes to disk to ensure that if a failure occurs during the write sequence, the system can recover to a reasonable state afterward. To make different common operations efficient, file systems employ many different data structures and access methods, from simple lists to complex B-trees.
time sharing counterpart: space sharing
Time sharing is a basic technique used by an OS to share a resource. By allowing the resource to be used for a little while by one entity, and then a little while by another, and so forth, the resource in question (e.g., the CPU, or a network link) can be shared by many. The counterpart of time sharing is space sharing, where a resource is divided (in space) among those who wish to use it. For example, disk space is naturally a space-shared resource; once a block is assigned to a file, it is normally not assigned to another file until the user deletes the original file.
machine state
To understand what constitutes a process, we thus have to understand its machine state: what a program can read or update when it is running. At any given time, what parts of the machine are important to the execution of this program?
This lesson teaches you how the OS handles restricted operations in a process by shifting between kernel and user mode. THE CRUX: HOW TO PERFORM RESTRICTED OPERATIONS; Process modes: kernel mode; Executing system calls; TIP: USE PROTECTED CONTROL TRANSFER; Special trap instructions: trap, return-from-trap, trap table; ASIDE: WHY SYSTEM CALLS LOOK LIKE PROCEDURE CALLS; TIP: BE WARY OF USER INPUTS IN SECURE SYSTEMS; system-call number; privileged operation
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/problem-1-restricted-operations
In this lesson, let's discuss how the OS regains control of the CPU through different approaches when other processes are running, and how context switching takes place! THE CRUX: HOW TO REGAIN CONTROL OF THE CPU; A cooperative approach: wait for system calls (yield); A non-cooperative approach: the OS takes control; THE CRUX: HOW TO GAIN CONTROL WITHOUT COOPERATION; timer interrupt; TIP: DEALING WITH APPLICATION MISBEHAVIOR; Saving and restoring context: context switch; TIP: USE THE TIMER INTERRUPT TO REGAIN CONTROL; TIP: REBOOT IS USEFUL
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/problem-2-switching-between-processes
This lesson briefly discusses the APIs of a process that are available in an operating system. Create; Destroy; Wait; Miscellaneous control; Status
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/process-api
This lesson will introduce you to other system calls to manage and control processes. Sending signals to a process; process groups; Giving control of processes to users; SIGINT; ASIDE: RTFM — READ THE MAN PAGES
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/process-control-and-users
This lesson teaches the creation of a process comprehensively. Transforming a program into a process: load, disk, executable format, eagerly, lazily; Memory allocation: heap; I/O-related setup: file descriptors
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/process-creation-a-little-more-detail
In this lesson, you will learn about the different states of a process and how a process changes from one state to another. The three states of a process: running, ready, blocked; Transitioning from one state to another: scheduler
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/processes-states
Quiz- Virtualization: Direct Execution
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/quiz-on-direct-execution
Quiz on Process API
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/quiz-on-process-api
Quiz on Processes
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/quiz-on-processes
This lesson explains how to run the simulator which mimics some aspects of an operating system to help you solidify your understanding.
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/simulator-JQVy4Np98lo
OS History
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/some-history
Summary - Virtualization: Direct Execution ASIDE: HOW LONG CONTEXT SWITCHES TAKE; ASIDE: KEY CPU VIRTUALIZATION TERMS (MECHANISMS)
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/summary-q2RrBA9oE33
Summary of Virtualization: Processes
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/summary-qAy7rz5zxkp
Summary - Virtualization: Process API ASIDE: KEY PROCESS API TERMS
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/summary-xlPx64RnL9z
How does memory constitute a process? Registers: program counter (PC); TIP: SEPARATE POLICY AND MECHANISM; I/O information
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/the-abstraction-a-process
The fork() System Call Running p1.c Non-deterministic output
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/the-fork-system-call
The wait() System Call Running p2.c
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/the-wait-system-call
Let's learn about some useful command-line tools in this lesson! Command-line tools; MenuMeters; ASIDE: THE SUPERUSER (ROOT)
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/useful-tools
Running multiple instances of mem.c
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/virtualizing-memory
Now, we run multiple instances of this same program again to see what happens. We see from the example that each running program has allocated memory at the same address (0x200000), and yet each seems to be updating the value at 0x200000 independently! It is as if each running program has its own private memory, instead of sharing the same physical memory with other running programs.
mem.c
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/virtualizing-memory
The program does a couple of things. First, it allocates some memory (line a1). Then, it prints out the address of the memory (a2), and then puts the number zero (0 is passed to the program as argv[1]) into the first slot of the newly-allocated memory (a3). Finally, it loops, delaying for a second and incrementing the value stored at the address held in p. With every print statement, it also prints out what is called the process identifier (the PID) of the running program. This PID is unique per running process.
This lesson considers how the OS handles multiple interrupts occurring concurrently. Disabling interrupts; Locking schemes
https://www.educative.io/courses/operating-systems-virtualization-concurrency-persistence/worried-about-concurrency
The definition of a process
A process is a running program. The program itself is a lifeless thing: it just sits there on the disk, a bunch of instructions (and maybe some static data), waiting to spring into action. It is the operating system that takes these bytes and gets them running, transforming the program into something useful. It turns out that one often wants to run more than one program at once; for example, consider your desktop or laptop, where you might like to run a web browser, a mail program, a game, a music player, and so forth. In fact, a typical system may be seemingly running tens or even hundreds of processes at the same time. Doing so makes the system easy to use, as one never needs to be concerned with whether a CPU is available; one simply runs programs.
malloc()
program that allocates some memory
some idea of what the OS actually does
So, now you have some idea of what an OS actually does: it takes physical resources, such as a CPU, memory, or disk, and virtualizes them. It handles tough and tricky issues related to concurrency. And it stores files persistently, thus making them safe over the long-term.
