Distributed systems intro


Why is the importance of distributed systems growing?

- Geographically distributed environment: First, in many situations, the computing environment itself is geographically distributed. As an example, consider a banking network. Each bank is supposed to maintain the accounts of its customers. In addition, banks communicate with one another to monitor interbank transactions or record fund transfers from geographically dispersed automated teller machines (ATMs). Another common example of a geographically distributed computing environment is the Internet, which has deeply influenced our way of life. The mobility of users has added a new dimension to the geographic distribution.
- Speed-up: Second, there is the need for speeding up computation. The speed of computation in traditional uniprocessors is fast approaching the physical limit. While multicore, superscalar, and very large instruction word (VLIW) processors stretch the limit by introducing parallelism at the architectural level, these techniques do not scale well beyond a certain point. An alternative technique for deriving more computational power is to use multiple processors. Dividing a problem into smaller subproblems and assigning these subproblems to separate physical processors that can operate concurrently is a potentially attractive method of enhancing the speed of computation. Moreover, this approach promotes better scalability: users or administrators can incrementally increase the computational power by purchasing additional processing elements or resources. This concept is extensively used by social networking sites for the concurrent upload and download of the photos and videos of millions of customers.
- Resource sharing: Third, there is a need for resource sharing. Here, the term resource represents both hardware and software resources. The user of computer A may want to use a fancy laser printer connected to computer B, or the user of computer B may need some extra disk space available on computer C for storing a large file. In a network of workstations, workstation A may want to use the idle computing power of workstations B and C to enhance the speed of a particular computation. Through Google Docs, Google lets you share its software for word processing, spreadsheets, and presentations without installing anything on your machine. Cloud computing essentially outsources the computing infrastructure of a user or an organization to data centers; these centers allow thousands of clients to share their computing resources through the Internet for efficient computing at an affordable cost.
- Fault tolerance: Fourth, powerful uniprocessors or computing systems built around a single central node are prone to complete collapse when the processor fails. Many users consider this risk unacceptable. Distributed systems have the potential to remedy this by using appropriate fault-tolerance techniques: when a fraction of the processors fail, the remaining processes take over the tasks of the failed processors and keep the application running. For example, in a system with triple modular redundancy (TMR), three identical functional units perform the same computation, and the correct result is determined by a majority vote. In many fault-tolerant distributed systems, processors routinely check one another at predefined intervals, allowing for automatic failure detection, diagnosis, and eventual recovery. Some users of noncritical applications are willing to accept a partial degradation in system performance when a failure cripples a fraction of the processing elements or communication links of a distributed system. This is the essence of graceful degradation. A distributed system thus provides an excellent opportunity for incorporating fault tolerance and graceful degradation.
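The TMR majority vote mentioned above can be sketched in a few lines; the function name and error handling are my own, not from the text:

```python
def tmr_vote(results):
    """Majority vote over three replicated computations (TMR).

    Returns the value produced by at least two of the three units;
    a single faulty unit is masked by the other two."""
    a, b, c = results
    if a == b or a == c:
        return a
    if b == c:
        return b
    # More than one unit disagreeing means the fault assumption is violated.
    raise RuntimeError("no majority: more than one unit failed")

# One faulty unit (here the third) is outvoted by the two correct ones.
print(tmr_vote([42, 42, 17]))  # -> 42
```

The vote masks any single fault regardless of which unit fails, which is exactly why three (not two) replicas are needed.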

What are some of the important issues in the study of the computational models of distributed systems?

- Knowledge of a process: The knowledge of a process is local. No process is ordinarily expected to have global knowledge about either the network topology or the global state. Each process thus has a myopic view of the system. It is fair to expect that a process knows (1) its own identity, (2) its own state, and (3) the identities of its immediate neighbors. In some special cases, a process may also have exact or approximate knowledge of the size (i.e., the number of nodes) of the network. Any other knowledge that a process might need has to be acquired from time to time through appropriate algorithmic actions.
- Network topology: A network of processes may be either completely connected or sparsely connected. In a completely connected network, a channel (also called a link) exists between every pair of processes in the system. This condition does not hold for a sparsely connected topology, so message routing is an important activity there. A link between a pair of processes may be unidirectional or bidirectional. Examples of sparse topologies are trees, rings, arrays, and hypercubes.
- Degree of synchronization: Some of the deeper issues in distributed systems center around the notions of synchrony and asynchrony. Assume that each process in a distributed system has a local clock. If these clocks represent UTC (static differences due to time zones can easily be accounted for), then every process has a common notion of time, and the system can exhibit synchronous behavior by the simultaneous scheduling of actions. Unfortunately, in practical distributed systems, this is difficult to achieve, since the drift of local physical clocks is a fact of life. One approach is to use a time server that keeps all the local clocks synchronized with one another. This is good enough for some applications, but not all. The concept of a synchronous system has evolved over many years, and there are many facets of synchrony. One is the existence of an upper bound on the propagation delay of messages: if a message sent by process A is not received by process B within the expected interval of real time, then process B suspects some kind of failure. Another feature of a synchronous system is the first-in-first-out (FIFO) behavior of the channels connecting the processes. With these various possibilities, it seems prudent to use the attribute synchronous to separately characterize the behavior of clocks, communication, and channels. In a fully asynchronous system, not only is there clock drift, but there is also no upper bound on message propagation delays. Processes can be arbitrarily slow, and out-of-order message delivery between any pair of processes is considered feasible. In other words, such systems may completely disregard the rule of time, and processes schedule events at an arbitrary pace. The properties of a distributed system depend on the type of synchrony; results about one system often fall apart completely when the assumptions change from synchronous to asynchronous.
- Failures: The handling of failures is an important area of study in distributed systems. A failure occurs when a system as a whole, or one or more of its components, does not behave according to its specification. Numerous failure models have been studied. The most common failure model is the crash, where a process ceases to produce any output. In another case, a process does not stop but simply fails to send one or more messages or execute one or more steps; this is called an omission failure, and it includes the case where a message is sent but lost in transit. Sometimes, the failure of a process or a link may alter the topology by partitioning the network into disjoint subnetworks. In the byzantine failure model, a process may behave in a completely arbitrary manner: for example, it can send inconsistent or conflicting messages to its neighboring processes, or it may execute a program that is different from the designated one.
- Scalability: An implementation of a distributed system is considered scalable when its performance is not impaired regardless of the final scale of the system, and the need for additional resources to cope with the increased scale remains manageable. Scalability is an important issue, since many distributed systems have witnessed tremendous growth in size over the past decade; it is quite common for current social networks to have millions of registered users. Scalability suffers when the resource requirement grows alarmingly with the scale of the system. Some systems deliver the expected performance when the number of nodes is small but fail to deliver when the number of nodes increases. From an algorithmic perspective, scalability is excellent when the space or time complexity of a distributed algorithm is O(log n) or lower, where n is the number of processes in the system; when it is O(n) or higher, the scalability is considered poor. Well-designed distributed systems usually exhibit good scalability.

How does the link register model work?

In the link register model, each link or channel is a single-reader single-writer register. The sender writes into this register, and the receiver reads from it. To avoid additional complications, the link register model also assumes that all read and write operations on a link register are serialized, that is, write operations never overlap with read operations. A bidirectional link (represented by an undirected edge) consists of a pair of link registers.
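A minimal sketch of the link register model, using a lock to serialize reads and writes as the model assumes; the class and method names are illustrative:

```python
import threading

class LinkRegister:
    """Single-writer single-reader register modeling a unidirectional link.

    The lock serializes operations, matching the model's assumption that
    writes never overlap with reads."""
    def __init__(self):
        self._lock = threading.Lock()
        self._value = None

    def write(self, value):   # invoked only by the sender process
        with self._lock:
            self._value = value

    def read(self):           # invoked only by the receiver process
        with self._lock:
            return self._value

# A bidirectional link between processes i and j is a pair of registers.
i_to_j, j_to_i = LinkRegister(), LinkRegister()
i_to_j.write("hello from i")
print(i_to_j.read())  # -> hello from i
```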

How does the state-reading model work?

In the state-reading model, any process can read, in addition to its own state, the state of each neighbor that has a channel incident on it. However, such a process can update only its own state.
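A toy sketch of the state-reading model; the max-of-neighbors update rule is invented purely for illustration, and note that only the process's own state is ever written:

```python
class Process:
    """A process in the state-reading model: it may read the states of
    neighbors with a channel into it, but may update only its own state."""
    def __init__(self, name, state, neighbors=None):
        self.name = name
        self.state = state
        self.neighbors = neighbors or []   # processes with a channel into self

    def step(self):
        # Illustrative rule: adopt the maximum readable state.
        # Reads touch neighbor states; the write touches only self.state.
        readable = [self.state] + [n.state for n in self.neighbors]
        self.state = max(readable)

p = Process("p", 3)
q = Process("q", 7)
r = Process("r", 5, neighbors=[p, q])
r.step()
print(r.state)  # -> 7
```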

What are channels?

Messages propagate along directed edges called channels. Communications are assumed to be point to point—a multicast is a set of point-to-point messages originating from a designated sender process. Channels may be reliable or unreliable. In a reliable channel, the loss or corruption of messages is ruled out.

How to represent a distributed system?

A distributed system is represented by a graph G = (V, E), where V is a set of nodes and E is a set of edges joining pairs of nodes. Each node is a sequential process, and each edge corresponds to a communication channel between a pair of processes. Unless otherwise stated, we assume the graph to be directed; an edge from node i to node j is represented by the ordered pair (i, j). An undirected edge between a pair of processes i, j is equivalent to a pair of directed edges, one from i to j and the other from j to i.
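This representation can be sketched directly; the concrete node set and helper functions below are illustrative:

```python
# A distributed system as a directed graph G = (V, E): nodes are
# processes, and each ordered pair (i, j) is a channel from i to j.
V = {0, 1, 2}
E = {(0, 1), (1, 2), (2, 0), (0, 2)}  # (0, 2) and (2, 0) together model an undirected edge

def out_neighbors(i):
    """Processes to which node i can send messages directly."""
    return {j for (a, j) in E if a == i}

def undirected_edges(edges):
    """Pairs of opposite directed edges collapse into one undirected edge."""
    return {frozenset(e) for e in edges if (e[1], e[0]) in edges}

print(out_neighbors(0))     # -> {1, 2}
print(undirected_edges(E))  # -> {frozenset({0, 2})}
```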

How many classes can the actions of a node be divided into?

The actions of a node can be divided into the following four classes:

- Internal action: An action is an internal action when a process performs computations in its own address space, resulting in the modification of one or more of its local variables.
- Communication action: An action is a communication action when a process sends a message to another process or receives a message from another process.
- Input action: An action is an input action when a process reads data from sources external to the system. For example, in a process control system, one or more processes can input the values of environmental parameters monitored by sensors. These data can potentially influence the operation of the system under consideration.
- Output action: An action is an output action when it controls operations that are external to the system. An example is the setting of a flag or the raising of an alarm. For a given system, the part of the universe external to it is called its environment.
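The four action classes can be illustrated with a toy step function; every name below is made up for this sketch and belongs to no real framework:

```python
def step(local_vars, outbox, sensor, environment_flags):
    """One step of a toy process that performs all four action classes."""
    # Internal action: modify a local variable in the process's own address space.
    local_vars["count"] = local_vars.get("count", 0) + 1

    # Input action: read data from a source external to the system.
    reading = sensor()

    # Communication action: send a message to another process.
    outbox.append(("reading", reading))

    # Output action: act on the environment, e.g., raise an alarm.
    if reading > 100:
        environment_flags.add("alarm")

local_vars, outbox, flags = {}, [], set()
step(local_vars, outbox, lambda: 120, flags)
print(local_vars, outbox, flags)  # -> {'count': 1} [('reading', 120)] {'alarm'}
```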

What is another dimension of variability in distributed systems?

Another dimension of variability in distributed systems is synchrony and asynchrony.

What are the three axioms for a class of reliable channels?

- Axiom 1: Every message sent by a sender is received by the receiver, and every message received by a receiver is sent by some sender in the system.
- Axiom 2: Each message has an arbitrary but finite, nonzero propagation delay.
- Axiom 3: Each channel is a FIFO channel. Thus, if x and y are two messages sent by one process P to another process Q and x is sent before y, then x is also received by Q before y.
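A minimal sketch of a reliable FIFO channel satisfying these axioms; propagation delay is not modeled here (a deque abstracts it away), and the class name is illustrative:

```python
import collections

class FIFOChannel:
    """A reliable FIFO channel: every sent message is eventually received
    (Axiom 1), and delivery preserves the send order (Axiom 3). The finite
    delay of Axiom 2 is abstracted away by the in-memory queue."""
    def __init__(self):
        self._queue = collections.deque()

    def send(self, msg):
        self._queue.append(msg)

    def receive(self):
        return self._queue.popleft()   # oldest undelivered message first

ch = FIFOChannel()
ch.send("x")
ch.send("y")
print(ch.receive(), ch.receive())  # -> x y
```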

What are some common subproblems in distributed systems?

- Leader election: When a number of processes cooperate with one another to solve a problem, many implementations prefer to elect one of them as the leader and the remaining processes as followers. The leader assumes the role of a coordinator and runs a program different from that of the followers. If the leader crashes, then one of the followers is elected leader, after which the system runs as usual.
- Mutual exclusion: Access to certain hardware resources is restricted to one process at a time; an example is a printer. There are also software resources where concurrent accesses run the risk of producing inconsistent results: for example, multiple processes are not ordinarily allowed to update a shared data structure. Mutual exclusion guarantees that at most one process acquires the resource or performs a critical operation on shared data at any time, and concurrent attempts to access such resources are serialized.
- Time synchronization: Local clocks invariably drift and need periodic resynchronization to support a common notion of time across the entire distributed system.
- Global state: The global state of a distributed system consists of the local states of its component processes. Any computation that needs the global state at a given time has to read the local states of every component process at that time. However, given that local clocks are never perfectly synchronized and message propagation delays are finite, computation of the global state is a nontrivial problem.
- Multicasting: Sending given data to multiple processes in a distributed system is a common subtask in many applications. As an example, in group communication, one may want to send some breaking news to millions of members as quickly as possible. The important issues here are efficiency, reliability, and scalability.
- Replica management: To support fault tolerance and improve system availability, the use of process replicas is quite common. When the main server is down, one of the replica servers replaces it. Data replication (also known as caching) is widely used for conserving system bandwidth. However, replication requires that the replicas be appropriately updated. Since such updates can never be done instantaneously, the possibility of inconsistent replicas remains open. How should the replicas be updated, and what kind of response can a client expect from them? Are there different notions of consistency in replica management? How are these related to the cost of the update operation?
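As a sketch of the leader election subproblem, here is a round-based simulation in the spirit of maximum-identifier ring algorithms (e.g., Chang-Roberts); the synchronized rounds are a simplification of the actual message-passing protocol:

```python
def elect_leader(ids):
    """Ring election sketch: each process forwards the largest identifier
    it has seen to its clockwise neighbor; after a full trip around the
    ring, the maximum identifier is known to everyone and wins."""
    n = len(ids)
    seen = list(ids)   # largest id known to each process so far
    for _ in range(n):  # n hops carry the maximum all the way around
        seen = [max(seen[i], seen[(i - 1) % n]) for i in range(n)]
    leader = seen[0]
    assert all(s == leader for s in seen)  # every process agrees
    return leader

print(elect_leader([3, 17, 9, 5]))  # -> 17
```

A real implementation sends identifiers as messages and handles crashes; this sketch only captures the "largest identifier wins" idea.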

What are characteristics of distributed systems?

- Multiple processes: The system consists of more than one sequential process. These processes can be either system or user processes, but each process should have an independent thread of control, either explicit or implicit.
- Interprocess communication: Processes communicate with one another using messages that take a finite time to travel from one process to another. The actual nature and magnitude of the delay depend on the physical characteristics of the message links. These message links are also called channels.
- Disjoint address spaces: Processes have disjoint address spaces. We will thus not consider a shared-memory multiprocessor a true representation of a distributed computing system. Note that programmers often express interprocess communication using shared-memory primitives, but the abstraction of shared memory can be implemented using messages.
- Collective goal: Processes must interact with one another to meet a common goal. Consider two processes P and Q in a network of processes. If P computes f(x) = x^2 for a given set of values of x, and Q multiplies a set of numbers by π, then we hesitate to call (P, Q) a distributed system, since there is no interaction between P and Q. However, if P and Q cooperate with one another to compute the areas of a set of circles of radius x, then (P, Q) collectively represents a meaningful distributed system. Similarly, if a set of sellers advertise the cost of their products, and a set of buyers post the list of goods that they are interested in buying as well as the prices they are willing to pay, then individually neither the buyers nor the sellers form a meaningful distributed system; but when they are coupled into an auction system through the Internet, the result is a meaningful distributed system.

What are some of the behaviors characterizing a synchronous system?

- Synchronous clocks: In a system with synchronous clocks, the local clocks of every processor show the same time. The readings of a set of independent clocks tend to drift, and the difference grows over time. Even with atomic clocks, drift is possible, although its extent is much smaller than that between clocks built with ordinary electrical components. For less stringent applications, domestic power supply companies closely mimic the UTC standard, where a second equals the time for 60 oscillations of the alternating voltage entering our premises (50 oscillations under the European standard, which is also used in Australian and Asian countries). Since clocks can never be perfectly synchronized, a weaker notion of synchronized clocks is that the drift rate of local clocks from real time has a known upper bound.
- Synchronous processes: A system of synchronous processes takes actions in lockstep synchrony, that is, in each step, all processes execute an eligible action. In real life, however, every process running on a processor frequently stumbles because of interrupts. As a result, interrupt service routines introduce arbitrary amounts of delay between the executions of two consecutive instructions, making it appear to the outside world as if instruction execution speeds are unpredictable, with no obvious lower bound. Using a somewhat different characterization, a process is sometimes called synchronous when there is a known lower bound on its instruction execution speed. Even when processes are asynchronous, computations sometimes progress in phases or rounds: in each phase or round, every process does a predefined amount of work, and no process starts the (i + 1)th phase until all processes have completed their ith phases. The implementation of such phase-synchronous or round-synchronous behavior requires an appropriate phase synchronization protocol.
- Synchronous channels: A channel is called synchronous when there is a known upper bound on the message propagation delay along that channel. Such channels are also known as bounded-delay channels.
- Synchronous message order: The message order is synchronous when the receiver process receives messages in the same order in which the sender process sent them.
- Synchronous communication: In synchronous communication, a sender sends a message only when the receiver is ready to receive it, and vice versa. When the communication is asynchronous, there is no coordination between the sender and the receiver: the sender of message number i does not care whether the previous message (i − 1) has already been received. This type of send operation is also known as a nonblocking send. In a blocking send, a message is sent only when the receiver signals its readiness to receive it. Like send, receive operations can also be blocking or nonblocking. In a blocking receive, a process waits indefinitely to receive a message that it is expecting from a sender. If the receive operation is nonblocking, then a process moves on to the next task when the expected message has not (yet) arrived and attempts to receive it later. Synchronous communication involves a form of handshaking between the sender and the receiver processes. In real life, postal or email communication is a form of asynchronous communication, whereas a telephone conversation is a good example of synchronous communication.
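The distinction between blocking and nonblocking receive can be demonstrated with Python's standard `queue` and `threading` modules; the in-process queue here is a stand-in for a real message link:

```python
import queue
import threading

ch = queue.Queue()   # channel between a sender thread and the main thread

# Nonblocking receive: move on if no message has arrived yet.
try:
    first_attempt = ch.get_nowait()
except queue.Empty:
    first_attempt = None          # nothing there; try again later
print(first_attempt)              # -> None (the sender has not run yet)

sender = threading.Thread(target=lambda: ch.put("ping"))
sender.start()

# Blocking receive: wait until the expected message arrives.
msg = ch.get()
sender.join()
print(msg)                        # -> ping
```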

What are the most common problems in distributed systems?

- Time

What are two primary models that capture the essence of interprocess communication?

1. The message-passing model
2. The shared-memory model

What are two variations of the shared-memory model used in distributed algorithms?

1. The state-reading model (also known as the locally shared variable model)
2. The link register model

