Distributed Systems Implementation
node
Any computer or switching device attached to a network
communication subsystem
Collection of hardware and software components that provide the communication facilities for a distributed system.
concurrency
concurrent program execution
Distributed systems are undergoing a period of significant change as a result of:
the emergence of pervasive networking technology; the emergence of ubiquitous computing coupled with the desire to support user mobility in distributed systems; the increasing demand for multimedia services; and the view of distributed systems as a utility.
Denial of Service (DoS)
A form of attack in which the enemy interferes with the activities of authorized users by making excessive and pointless invocations on services or message transmission on a network resulting in overloading of physical resources.
threats to processes
A process handling incoming requests may not be able to identify the sender. Protocols like IP do include the address of the source computer in each message, but it is not difficult for an enemy to generate a message with a forged source address. This lack of reliable information about the source is a threat to both servers (is the request genuine?) and clients (is the result genuine?).
process
A process is one instance of a program while it is executing, that is, a program under execution in a virtual address space. The virtual address space may contain one or more threads.
internet
A single communication subsystem providing communication between all of the hosts that are connected to it.
subnet
A unit of routing. It is a collection of nodes that can all be reached on the same physical network. An internet consists of many subnets.
threats to communication channels
An enemy can copy, alter or inject messages as they travel across the network and its intervening gateways. Such attacks present a threat to the privacy and integrity of information as it travels over the network and to the integrity of the system. Example: an email message being read, or even altered, by someone other than its intended recipient.
sockets and ports
Both UDP and TCP use the socket abstraction. The socket abstraction provides an endpoint for communication between processes. Inter-process communication consists of transmitting a message between a socket in one process and a socket in another process. For a process to receive messages, its socket must be bound to a local port and one of the Internet addresses of the computer on which it runs. Messages sent to a particular Internet address and port number can only be received by a process whose socket is associated with that port number and Internet address.
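As a rough illustration of binding a socket to an address and port, here is a minimal Python sketch that passes one UDP datagram between two sockets on the loopback interface. The function name `send_between_sockets`, the loopback address and the 2-second timeout are assumptions made for the example, not part of these notes.

```python
import socket

def send_between_sockets(payload: bytes):
    """Send one UDP datagram between two sockets on the loopback interface."""
    # The receiver binds to an Internet address and a local port.
    # Port 0 asks the operating system to choose any free port.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(2.0)  # do not wait forever if the datagram is lost
        receiver_address = receiver.getsockname()

        # The sender addresses the message to the receiver's address and port.
        sender.sendto(payload, receiver_address)

        # recvfrom returns the data plus the (address, port) it came from.
        data, origin = receiver.recvfrom(1024)
        sender_port = sender.getsockname()[1]
        return data, origin, sender_port
    finally:
        sender.close()
        receiver.close()
```

The sender never calls `bind`; the operating system assigns it a port when it first sends, and that port appears in the `origin` reported to the receiver.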
CORBA (Common Object Request Broker Architecture)
CORBA is a middleware design. CORBA allows application processes to communicate with one another regardless of their programming languages, hardware platforms, software platforms, or the networks they communicate over.
hosts
Computers and other devices that use the network for communication purposes
cryptography
Cryptography is the science of keeping messages secure, and encryption is the process of scrambling a message in such a way as to hide its contents. Modern cryptography is based on encryption algorithms.
Remote Procedure Call (RPC) Design Issues
Design issues important to RPC are: (1) the style of programming promoted by RPC (programming with interfaces); (2) the call semantics associated with RPC; (3) transparency and how it relates to remote procedure calls.
masking failures
Distributed systems consist of components. It is possible to construct reliable distributed systems from components that exhibit failures. For example, mirrored servers can continue to provide a service when one of them crashes. Knowledge of the failure characteristics of components enables a new service to be designed to mask the failure of the components on which it depends. A service masks a failure either by hiding it altogether or by converting it into a more acceptable type of failure.
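One way to picture masking is a client-side wrapper that tries each mirror in turn. The Python sketch below is illustrative only: `call_with_failover` and the use of `ConnectionError` to stand for a crashed mirror are assumptions for the example.

```python
def call_with_failover(replicas, request):
    """Try each replica in order; hide individual crashes from the caller."""
    last_error = None
    for replica in replicas:
        try:
            # The first replica that answers masks the failure of the others.
            return replica(request)
        except ConnectionError as error:
            last_error = error  # remember the failure and try the next mirror
    # Every replica failed: convert the low-level failures into a single,
    # more acceptable failure type for the caller.
    raise RuntimeError("service unavailable: all replicas failed") from last_error
```

Here each replica is modelled as a callable. A crash of the first mirror is hidden altogether as long as another mirror answers; only when all mirrors fail does the caller see a single converted error.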
secure channel
Encryption and authentication are used to build secure channels as a service layer on top of existing communication services. A secure channel is a communication channel connecting a pair of processes, each of which acts on behalf of a principal.
linux command: #gcc
GNU project C and C++ compiler
HTTP
HTTP is implemented over TCP. In the original version of the protocol each client-server interaction consisted of the following steps: (1) the client requests, and the server accepts, a connection at the default server port or at a port specified in the URL; (2) the client sends a request message to the server; (3) the server sends a reply message to the client; (4) the connection is closed. However, establishing and closing a connection for each request-reply exchange is expensive: it overloads the server and adds to network traffic.
reliability
In distributed systems when errors do occur they are usually due to failures in the software at the sender and receiver or buffer overflow. These need to be taken care of to ensure reliability.
linux command: #ls
It lists the files and folders in the directory you specify
networking performance
Message Transmission Time = Latency + Length / Data Transfer Rate
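The formula can be evaluated directly. The short Python sketch below uses seconds, bits and bits per second; the function name and the sample figures are illustrative assumptions.

```python
def message_transmission_time(latency_s: float, length_bits: float,
                              data_transfer_rate_bps: float) -> float:
    """Message transmission time = latency + length / data transfer rate."""
    return latency_s + length_bits / data_transfer_rate_bps

# Example: 5 ms latency, a 10,000-bit message, a 10 Mbit/s link.
# The latency term contributes 0.005 s and the length term 0.001 s.
example = message_transmission_time(0.005, 10_000, 10_000_000)
```

For short messages the latency term dominates, which is why many small messages cost more than one large message carrying the same data.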
mobile code
Mobile code raises new and interesting security problems for any process that receives and executes program code from elsewhere
multicasting
Most communication in distributed systems is between pairs of processes (client and server). However, at times there is a need for one-to-many communication. To meet this requirement, many network technologies support the simultaneous transmission of messages to several recipients.
characteristics of a distributed system
No global clock, concurrency, independent failures
Quality of Service (QoS)
QoS is the ability to meet deadlines when transmitting and processing streams of real-time multimedia data. Applications that transmit multimedia data impose major new requirements on computer networks: they demand guaranteed bandwidth and bounded latencies for the communication channels they use. This is a demanding requirement for distributed systems.
Remote Method Invocation (RMI)
RMI is closely related to RPC but extended into the world of distributed objects. In RMI a calling object can invoke a method in a potentially remote object. The underlying details are generally hidden from the user.
Remote Procedure Call (RPC)
RPC makes the programming of distributed systems look similar, if not identical, to conventional programming. RPC allows for a high level of distribution transparency. RPC extends the abstraction of a procedure call to distributed environments. In RPC, procedures on remote machines can be called as if they were procedures in the local address space.
receive from any
The receive method does not specify an origin for messages. Instead an invocation of receive gets a message addressed to its socket from any origin. The receive method returns the Internet address and local port of the sender, allowing the recipient to check where the message came from.
message size
The receiving process needs to specify an array of bytes of a particular size in which to receive a message. If the message is too long for the array, the message is truncated on arrival.
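Truncation can be observed with a small Python experiment over loopback UDP. This is a sketch under assumptions: the function name is made up for the example, and the behaviour shown (silent truncation) is what Linux does; some platforms report an error instead.

```python
import socket

def receive_into_buffer(message: bytes, buffer_size: int) -> bytes:
    """Send `message` over loopback UDP and receive it into a fixed-size buffer."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)         # do not wait forever if the datagram is lost
        sender.sendto(message, receiver.getsockname())
        # buffer_size is the size of the byte array offered to receive.
        # Bytes of the datagram that do not fit are discarded.
        data, _origin = receiver.recvfrom(buffer_size)
        return data
    finally:
        sender.close()
        receiver.close()
```

Calling `receive_into_buffer(b"0123456789", 4)` on Linux yields only the first four bytes; the rest of the datagram cannot be recovered by a later receive.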
blocking
Sockets normally provide non-blocking sends and blocking receives for datagram communication (a non-blocking receive is an option in some implementations). The send operation returns when it has handed the message to the underlying UDP and IP protocols, which are then responsible for transmitting the message to its destination. On arrival, the message is placed in a queue for the socket that is bound to the destination port. The message is collected by a receive invocation on that socket. Messages are discarded if no process has a socket bound to the destination port.
data transfer rate
Speed at which data can be transferred between two computers in the network once transmission has begun. Base unit: bits/second. This is determined by the network's physical characteristics.
TCP/IP (Transmission Control Protocol/Internet Protocol)
Widespread adoption of TCP/IP has resulted in more than 60 million hosts. Many application services and application-level protocols now exist that are based on TCP/IP. TCP is a transport protocol. It can be used to support applications directly, or additional protocols can be layered on it to provide additional features.
latency
The delay that occurs after a send operation is executed and before data starts to arrive at the destination computer. This is determined primarily by software overheads, routing delays, and a load dependent statistical element. It is the time required to transfer an empty message. For this course we will only consider network latency.
scalability
The number of host computers and web servers on the Internet grows every day and is difficult to keep track of. The potential future size of the Internet is commensurate with the population of the planet: it could include several billion nodes and hundreds of millions of active hosts. The Internet was not originally designed to handle such loads, but so far it has coped. Work is in progress on addressing and routing mechanisms. Future traffic is expected to grow at least in proportion to the number of active users. The ability of the Internet's infrastructure to cope with this growth will depend upon the economics of use.
timeouts
A receive that blocks forever is suitable for a server that is waiting to receive requests from its clients. However, in some cases a process that has invoked a receive operation cannot wait indefinitely, because the sending process may have crashed or the expected message may have been lost. For such situations a timeout can be set on the socket. A suitable timeout value is hard to choose, but it should be considerably larger than the time required to transmit a message.
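A minimal Python sketch of a receive bounded by a timeout follows. The helper name `receive_with_timeout` and the choice to return `None` on expiry are assumptions made for the example.

```python
import socket

def receive_with_timeout(sock: socket.socket, seconds: float, buffer_size: int = 1024):
    """Wait at most `seconds` for a datagram; return None if none arrives."""
    sock.settimeout(seconds)  # bound the time that the next receive may block
    try:
        return sock.recvfrom(buffer_size)  # (data, sender_address)
    except socket.timeout:
        # The sender may have crashed or the message may have been lost.
        return None
```

A caller would typically retry or report a failure when `None` comes back, rather than blocking forever on a sender that may no longer exist.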
internet protocols
The success of TCP/IP is based on the protocols' independence from the underlying transmission technology, enabling internetworks to be built up from many heterogeneous networks. Users and application programs perceive a single virtual network supporting TCP and UDP. Implementers of TCP and UDP see a single virtual IP network hiding the diversity of the underlying transmission media.
authentication
The use of shared secrets and encryption provides the basis for the authentication of messages.
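As an illustration of authenticating a message with a shared secret, the Python sketch below uses a keyed hash (HMAC-SHA256) from the standard library. Note the substitution: it relies on a message authentication code rather than on encryption, and the function names are assumptions for the example.

```python
import hashlib
import hmac

def sign(shared_secret: bytes, message: bytes) -> bytes:
    """Compute an authentication tag that only holders of the secret can produce."""
    return hmac.new(shared_secret, message, hashlib.sha256).digest()

def verify(shared_secret: bytes, message: bytes, tag: bytes) -> bool:
    """Check that the tag matches the message under the shared secret."""
    expected = sign(shared_secret, message)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, tag)
```

The sender transmits the message together with its tag; a receiver holding the same secret recomputes the tag and rejects the message if the two differ, which also detects tampering in transit.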
timing failures
Failures that apply to synchronous distributed systems, in which time limits are set on process execution time, message delivery time and clock drift rate. A timing failure occurs when one of these limits is exceeded.
Omission Failures
This failure refers to cases when a process or communication fails to perform actions that it is supposed to do
XML
This is concerned with a textual format for representing structured data. It is used for self-describing structured data (such as Web documents) and for the messages exchanged by clients and servers in web services.
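To show what "self-describing" means in practice, here is a small Python sketch using the standard `xml.etree.ElementTree` module. The `person`, `name` and `year` element names are invented for the example.

```python
import xml.etree.ElementTree as ET

def person_to_xml(name: str, year: int) -> str:
    """Represent a small data structure as self-describing XML text."""
    person = ET.Element("person")
    ET.SubElement(person, "name").text = name
    ET.SubElement(person, "year").text = str(year)
    return ET.tostring(person, encoding="unicode")

def xml_to_person(text: str):
    """Rebuild the data items from their XML representation."""
    person = ET.fromstring(text)
    return person.findtext("name"), int(person.findtext("year"))
```

Because every value is wrapped in a named tag, a receiver can interpret the message without prior knowledge of field order, at the cost of a larger message than a binary format would need.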
CORBA's Common Data Representation (CDR)
This is concerned with an external representation for the structured and primitive types that can be passed as the arguments and results of remote method invocations in CORBA. It can be used by a variety of programming languages.
java's object serialization
This is concerned with the flattening and external data representation of any single object or tree of objects that may need to be transmitted in a message or stored on a disk. It is for use only by Java.
getRequest
This method is used by a server process to acquire service requests.
doOperation
This method is used by clients to invoke remote operations. The arguments for this primitive specify the remote server, operation to invoke, and additional arguments required by the operation.
sendReply
When the server has invoked the specified operation, it uses sendReply to send the reply message to the client.
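The three primitives getRequest, doOperation and sendReply can be sketched together in Python over loopback UDP. Everything concrete here is an assumption for illustration: the snake_case names, the one-byte operation identifier, the `OPERATIONS` table and the `demo` helper. A real protocol would also carry request identifiers and handle retransmission.

```python
import socket
import threading

# Hypothetical operation table: operation id -> function on the server.
OPERATIONS = {1: lambda text: text.upper()}

def get_request(server_socket):
    """Server: block until a request arrives; return it with the client's address."""
    return server_socket.recvfrom(1024)

def send_reply(server_socket, reply: bytes, client_address):
    """Server: send the reply message back to the client that made the request."""
    server_socket.sendto(reply, client_address)

def do_operation(server_address, operation_id: int, arguments: bytes,
                 timeout: float = 2.0) -> bytes:
    """Client: send a request message and block until the reply arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(timeout)
        client.sendto(bytes([operation_id]) + arguments, server_address)
        reply, _ = client.recvfrom(1024)
        return reply

def serve_one_request(server_socket):
    """Server loop body: acquire a request, invoke the operation, reply."""
    request, client_address = get_request(server_socket)
    operation_id, arguments = request[0], request[1:]
    result = OPERATIONS[operation_id](arguments.decode()).encode()
    send_reply(server_socket, result, client_address)

def demo() -> bytes:
    """Run one request-reply exchange between a server thread and a client."""
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server_socket.bind(("127.0.0.1", 0))
    server_socket.settimeout(2.0)
    worker = threading.Thread(target=serve_one_request, args=(server_socket,))
    worker.start()
    try:
        return do_operation(server_socket.getsockname(), 1, b"hello")
    finally:
        worker.join()
        server_socket.close()
```

In this sketch `demo()` starts a server thread that handles exactly one request, then plays the client role and returns the reply.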
Physical Model
a representation of the underlying hardware elements of a distributed system that abstracts away from specific details of the computer and networking technologies employed
fabrication
a situation in which additional data or activity is generated that would not normally exist
interception
a situation in which an unauthorized party has gained access to a service or data
interruption
a situation in which service or data become unavailable, unusable, destroyed
modification
a situation involving unauthorized changing of data or tampering with a service so that it no longer behaves according to its original specification
leakage
acquisition of information by unauthorized recipients
peer-to-peer (P2P)
all of the processes involved in a task or activity play similar roles, interacting cooperatively as peers; the participating processes run the same program and offer the same interfaces
Mobility transparency
allows the movement of resources and clients within a system without affecting the operation of users or programs
scaling transparency
allows the system and applications to expand in scale without change to the system structure or the application algorithms
performance transparency
allows the systems to be reconfigured to improve performance as loads vary
linux command: #mkdir
allows the user to create directories
linux command: #mv
allows the user to move or rename files
independent failures
computer systems can and will fail, and each component can fail independently of the others. The design needs to tolerate such failures
network virtualization
concerned with the construction of many different virtual networks over an existing network for example the Internet. Each virtual network can be designed to support a particular distributed application.
linux command: fork()
creates a new process with an execution environment copied from the caller
multithreaded
describes a program that is designed to have parts of its code execute concurrently
reflection
designed to support both introspection (the dynamic discovery of properties of the system) and intercession (the ability to dynamically modify structure or behavior)
brokerage
designed to support interoperability in potentially complex distributed infrastructures
proxy pattern
designed to support location transparency in remote procedure calls (RPC)
Distributed multimedia systems
distributed systems should be able to support a range of media types in an integrated manner (ex. storage, transmission and presentation of discrete media types like pictures and text messages). Distributed multimedia systems should be able to perform similar functions for continuous media types like audio and video (ex. store and locate, transmit, support the presentation, and share with different user groups)
three generations of distributed systems
early distributed systems, internet-scale distributed systems, contemporary distributed systems
Access Transparency
enables local and remote resources to be accessed using identical operations
replication transparency
enables multiple instances of resources to be used to increase reliability and performance without knowledge of the replicas by users or application programmers
location transparency
enables resources to be accessed without knowledge of their physical or network location
Concurrency Transparency
enables several processes to operate concurrently using shared resources without interference between them
failure transparency
enables the concealment of faults, allowing users and application programs to complete their task despite the failure of hardware or software components
client server (centralized)
historically the most important and most widely employed architecture. Client processes interact with individual server processes
vandalism
interference with the proper operation of a system without gain to the perpetrator
linux command: chmod
it stands for "change mode"; it changes a file's permission bits, which are called the mode of the file
linux command: #ssh
make a connection to a remote Linux computer and log into your account
middleware
middleware is represented by processes or objects in a set of computers that interact with each other to implement communication and resource-sharing support for distributed applications. It provides useful building blocks for the construction of software components that can work with one another in a distributed system
Process omission failures
a process crashes (halts). To provide dependable services we need processes that crash cleanly: they either function correctly or stop. Other processes recognize the crash by means of timeouts. The crash is called fail-stop if other processes are able to detect it
asynchronous Distributed systems
a system in which there are no bounds on process execution speeds, message transmission delays, or clock drift rates
Indirect communication
senders do not need to know who they are sending to. Senders and receivers do not need to exist at the same time. Key techniques include: group communication, publish subscribe systems, message queues, tuple spaces, and distributed shared memory.
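Of the techniques listed, publish-subscribe is perhaps the easiest to sketch. The in-process Python `Broker` below is a hypothetical illustration of space uncoupling only: publishers name a topic, never a receiver. It does not model distribution or persistence.

```python
from collections import defaultdict

class Broker:
    """Minimal in-process publish-subscribe broker (illustrative only)."""

    def __init__(self):
        # topic name -> list of subscriber callbacks
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register interest in a topic; the subscriber never learns who publishes."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        """Deliver an event to every current subscriber of the topic."""
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(event)
        return len(callbacks)  # number of subscribers notified
```

A publisher that calls `publish` on a topic with no subscribers simply notifies nobody, which shows that the sender does not depend on any particular receiver.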
linux command: #cat
stands for "concatenate"; it displays or joins the contents of files and, combined with output redirection, allows the user to create files
call by value
the method receives the actual value rather than a reference to the caller's variable. When you call a method, the method sees a copy of any primitives passed to it. Thus any changes it makes to those values have no effect on the caller's variables
thread
the basic unit of program execution. A process can have several threads running concurrently, each performing a different job, such as waiting for events or performing a time-consuming job that the program doesn't need to complete before going on. When a thread has finished its job, the thread is suspended or destroyed
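A short Python sketch of one process running several threads, each doing a different job, is given below. The `run_workers` helper and the summing task are assumptions chosen to keep the example small.

```python
import threading

def run_workers(jobs):
    """Run one thread per job; each thread sums its own list of numbers."""
    results = {}
    lock = threading.Lock()  # protects the shared results dictionary

    def worker(name, numbers):
        total = sum(numbers)  # the thread's own job
        with lock:
            results[name] = total

    threads = [threading.Thread(target=worker, args=item) for item in jobs.items()]
    for thread in threads:
        thread.start()  # all threads now run concurrently in the same process
    for thread in threads:
        thread.join()   # wait until each thread has finished its job
    return results
```

The threads share the process's address space, which is why they can all write into `results`, and also why the lock is needed.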
skeleton
the class of a remote object has a skeleton, which implements the methods in the remote interface
dispatcher
the dispatcher receives request messages from the communication module. It uses the operationId to select the appropriate method in the skeleton, passing on the request message. A server has one dispatcher and one skeleton for each class representing a remote object. The dispatcher and the proxy use the same allocation of operationIds to the methods of the remote interface
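The relationship between dispatcher and skeleton can be sketched in Python as follows. The `Counter` remote object, its two methods and the numbering of operation identifiers (0 and 1) are hypothetical; request messages are modelled as `(operation_id, arguments)` tuples rather than marshalled bytes.

```python
class Counter:
    """Hypothetical remote object (the servant that does the real work)."""
    def __init__(self):
        self.value = 0

    def increment(self, amount):
        self.value += amount
        return self.value

    def read(self):
        return self.value

class CounterSkeleton:
    """Skeleton: unpacks the request's arguments and invokes the servant."""
    def __init__(self, servant):
        self.servant = servant

    def increment(self, arguments):
        return self.servant.increment(*arguments)

    def read(self, arguments):
        return self.servant.read()

class Dispatcher:
    """Dispatcher: selects the skeleton method named by the operation id."""
    def __init__(self, skeleton):
        # The proxy must use this same allocation of ids to methods.
        self.methods = {0: skeleton.increment, 1: skeleton.read}

    def dispatch(self, request):
        operation_id, arguments = request
        return self.methods[operation_id](arguments)
```

Keeping the id-to-method table in one place makes the contract with the proxy explicit: both sides must agree that 0 means `increment` and 1 means `read`.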
ubiquitous computing
the harnessing of many small, cheap computational devices that are present in users' physical environments (ex. at home office and even natural settings)
kernel
the kernel is a program that is distinguished by the facts that it remains loaded from system initialization and that its code is executed with complete access privileges for the physical resources on its host computer. In particular, the kernel can control the memory management unit and set the processor registers so that no other code may access the machine's physical resources except in acceptable ways
Unmarshalling
the process of disassembling a message on arrival to produce an equivalent collection of data items at the destination. It consists of the generation of primitive values from their external data representation and the rebuilding of the data structures.
marshalling
the process of taking a collection of data items and assembling them into a form suitable for transmission in a message. consists of the translation of structured data items and primitive values into an external data representation.
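Marshalling and unmarshalling can be illustrated with Python's `struct` module. The external representation used here (a big-endian 4-byte length, the UTF-8 bytes of a name, then a 4-byte signed integer) is an assumption invented for the example; it is not CORBA CDR or any other standard format.

```python
import struct

def marshal(name: str, year: int) -> bytes:
    """Flatten (name, year) into bytes suitable for transmission in a message."""
    encoded = name.encode("utf-8")
    # ">I": big-endian unsigned 32-bit length prefix for the string.
    # ">i": big-endian signed 32-bit integer for the year.
    return struct.pack(">I", len(encoded)) + encoded + struct.pack(">i", year)

def unmarshal(message: bytes):
    """Rebuild (name, year) from the external representation."""
    (length,) = struct.unpack_from(">I", message, 0)
    name = message[4:4 + length].decode("utf-8")
    (year,) = struct.unpack_from(">i", message, 4 + length)
    return name, year
```

Fixing the byte order in the format is what lets machines with different native representations exchange the same message.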
synchronous distributed system
the time to execute each step of a process has known lower and upper bounds. Each message transmitted over a channel is received within a known bounded time. Each process has a local clock whose drift rate from real time has a known bound
no global clock
there is no single notion of the correct time. When programs need to cooperate they coordinate their actions by passing messages
thin clients
thin clients led to the emergence of virtual network computing (VNC). VNC provides remote access to graphical user interfaces. A VNC client (viewer) interacts with a VNC server through the VNC protocol. VNC has superseded network computers: it has proven to be a more flexible solution and now dominates the marketplace
Interprocess Communication
this is low-level communication support (ex. message-passing primitives, direct access to the API offered by Internet protocols through socket programming, and multicast communication)
remote invocation
this is the most common communication paradigm in distributed systems. It relies on a two-way exchange between communicating entities, resulting in the calling of a remote operation, procedure or method
auditing
this is used for tracing client activities
proxy
makes a remote invocation transparent to the client by behaving like a local object to the invoker; instead of executing an invocation, it forwards it in a message to a remote object
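The forwarding behaviour of a proxy can be sketched in Python. The `CounterProxy` class, its methods, the operation identifiers and the `send_request` transport callable are all hypothetical; the transport is injected so that the sketch stays independent of any particular network code.

```python
class CounterProxy:
    """Hypothetical proxy: looks like a local counter, forwards every call."""

    def __init__(self, send_request):
        # send_request stands for the communication module: it takes a
        # request message and returns the reply from the remote object.
        self._send_request = send_request

    def increment(self, amount):
        # Operation id 0 is assumed to mean "increment" on the server side.
        return self._send_request((0, (amount,)))

    def read(self):
        # Operation id 1 is assumed to mean "read" on the server side.
        return self._send_request((1, ()))
```

To the invoker, `proxy.increment(5)` is an ordinary local call; only the proxy knows that the work happens elsewhere.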
encryption
transforms data into something an attacker cannot understand. Implements confidentiality. Provides support for integrity checks
tampering
unauthorized alteration of information
Communication omissions failure
when messages are not transported by the communication channel from the outgoing message buffer to the incoming message buffer, we say that a communication omission failure has occurred. Also known as dropping messages, this is caused by lack of buffer space at the receiver or at an intervening gateway, or by a network transmission error
distributed computing as a utility
with the maturity of distributed systems infrastructure, companies are promoting the view of distributed resources as a commodity or utility. With this model, resources are provided by service providers and rented rather than owned by the end user (cloud computing)