Distributed Systems

Security

- 3 aspects:
  -> Confidentiality (protection against disclosure to unauthorised individuals)
  -> Integrity (protection against alteration and corruption)
  -> Availability (protection against interference with the means of access)
- Security mechanisms:
  -> Encryption
  -> Authentication
  -> Authorisation
- Security challenges:
  -> Denial of service attacks
  -> Security against mobile code

Heterogeneity

- DS use hardware and software resources of varying characteristics:
  -> networks
  -> computer hardware
  -> operating systems
  -> programming languages
  -> implementations by different developers
- Approaches to solve:
  -> using standard protocols
  -> using agreed message formats and data types
  -> adhering to agreed-upon application program interfaces (APIs)
  -> using middleware
  -> portable code (can use VMs to run on any OS)

Distributed System Challenges

- Heterogeneity - Openness - Security - Scalability - Failure Handling - Concurrency - Transparency

Authentication with public keys

- Sara is a trusted key distribution service
- Alice asks Sara for Bob's public key
- Sara returns a public key certificate: Bob's public key, Bob's name and a key name, all signed with Sara's private key
- Alice trusts Sara and knows her public key, so she uses it to verify the signature and obtain Bob's public key
- Alice can then create a shared secret key and send it to Bob, encrypted with his public key
- they can then communicate using the shared secret key
- Issue: how does Alice get Sara's public key in the first place?

Variations of interaction models

- Synchronous system model, assumes bounds on:
  -> the time to execute each step of a process
  -> message transmission delay
  -> local clock drift rate
- Asynchronous system model, assumes no bounds on:
  -> process execution speed
  -> message transmission delays
  -> clock drift rates

Computer networks

- The internet - Intranets - Wireless networks

Invocation performance

- user-space procedure call: quick
- system call: slightly more expensive because of the domain transition
- interprocess call on the same host: domain transition to the kernel, then to the other process, then back to the kernel, then back to the original process
- interprocess call on a remote host (RMI): same overhead as the previous case, plus the overhead of network communication between the two kernels

Why shared memory is the fastest

- data does not need to be copied into kernel space

Persistent asynchronous invocations

- requests are placed in a queue at the client, and attempts are made to complete each request as the client roams from network to network
- useful for flaky connections

Intranets

- a portion of the Internet that is separately administered by an organisation
- features include:
  -> a boundary that can be configured to enforce local security policies
  -> several local area networks (LANs) linked by backbone connections
  -> a connection to the Internet via a router, allowing users within the intranet to access services on the Internet
  -> firewalls that protect the intranet by filtering incoming and outgoing messages, preventing unauthorised messages from leaving or entering

Publish/Subscribe Systems

- system where publishers publish structured events to an event service and subscribers express interest in events - heterogeneous - asynchronous

Distributed OS

- tries to abstract the network from the user, thereby removing the need for the user to specify how networking commands and operations should be undertaken
- referred to as a single system image

Two properties of reliable communication

- validity: any message in outgoing buffer is eventually delivered to incoming message buffer - integrity: the message is identical to the one sent, and no messages are delivered twice

Hypervisor

- virtualising the hardware for the OS that sits above it

Kernel and Server Processes

kernel = the part of the OS which assumes full access to the host's resources, and shares that access among all other processes executing on the host
Kernels and server processes:
- encapsulate resources on the host by providing a useful service interface for clients
- hide internal operations
- protect resources from illegitimate access
- process client requests concurrently, so that all clients receive service

Supervisor mode vs user mode

supervisor mode = instructions that execute while the processor is in supervisor/privileged mode are capable of accessing and controlling every resource on the host
user mode = instructions that execute while the processor is in user/unprivileged mode are restricted, by the processor, to only those accesses defined or granted by the kernel
- the current mode is defined by a register on the processor
- a user process accesses a kernel resource using a system call, which switches the processor to supervisor mode and gives control to the kernel

Transfer policy vs location policy

transfer policy = determines whether the new process is allocated locally or remotely
location policy = determines which host, from a set of given hosts, the new process should be allocated on
- policies may be static or adaptive; static policies do not take into account the current state of the distributed system
- policies can take into account relative load, interprocess communication, host architectures and specialised resources that processes may require

Core OS Components

1) Process Manager - handles the creation of processes
2) Thread Manager - handles the creation, synchronisation and scheduling of threads
3) Communication Manager - handles interprocess communication
4) Memory Manager - handles the allocation of and access to physical and virtual memory
5) Supervisor - handles privileged operations

Addressing security threats

- cryptography and shared secrets: encryption is the process of scrambling messages
- authentication: proving the identities of users
- secure channel: encryption and authentication are used to build secure channels as a service layer on top of an existing communication channel. A secure channel is a communication channel connecting a pair of processes, each acting on behalf of a principal

External Data representation and marshalling

- data structures in programs are flattened into a sequence of bytes before transmission
- they are converted into an agreed external format before transmission
- external data representation: an agreed standard for representing data structures and primitive data
- marshalling: the process of converting data into a form suitable for transmission
- unmarshalling: the process of disassembling the data at the receiver
- Examples of external data representations:
  -> CORBA's Common Data Representation (CDR)
  -> Java serialization
  -> XML
  -> JSON
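
As a rough illustration of marshalling and unmarshalling, the sketch below flattens a record into bytes using JSON as the agreed external format. The 4-byte length prefix and the function names `marshal` and `unmarshal` are assumptions made for this example, not part of any particular standard.

```python
import json
import struct

def marshal(record: dict) -> bytes:
    """Flatten a record into bytes: 4-byte big-endian length, then UTF-8 JSON."""
    body = json.dumps(record, sort_keys=True).encode("utf-8")
    return struct.pack("!I", len(body)) + body

def unmarshal(data: bytes) -> dict:
    """Rebuild the record from the byte sequence produced by marshal()."""
    (length,) = struct.unpack("!I", data[:4])
    return json.loads(data[4:4 + length].decode("utf-8"))

# Hypothetical record used only for illustration.
person = {"name": "Smith", "place": "London", "year": 1984}
wire = marshal(person)       # bytes that could be sent over a socket
restored = unmarshal(wire)   # the receiver rebuilds an equal structure
```

Sender and receiver only need to agree on the external format, not on each other's in-memory layout.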

Certificate definition

- a digital certificate is a document containing a statement that is signed by a principal
- for this to be useful, the principal's public key must be known
- the signing principal is often a certificate authority (CA)

DSM

- distributed shared memory
- an abstraction for sharing memory between computers that do not physically share memory
- processes access DSM via reads and updates
- limited scalability
- time uncoupled
- space uncoupled

Leakage, Tampering and Vandalism

- Leakage: the acquisition of information by unauthorised recipients
- Tampering: the unauthorised alteration of information
- Vandalism: interference with the proper operation of a system without gain to the perpetrator

Address space

- each process will have a virtual address space - virtual address space can be divided into regions that are contiguous and do not overlap - a paged virtual memory scheme divides it into fixed sized blocks that are either located in RAM or located in swap space on the hard disk - a page table is used by the processor and operating system to map virtual addresses to real addresses - page table also contains access control bits for each page that determine the access privileges of the process on a per page basis - OS manages swapping pages in and out
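
The page-table lookup described above can be sketched as follows. This is a simplified model, not how any real MMU or OS is implemented: the page size, the table contents and the names `translate`, `PageFault` and `ProtectionFault` are all assumptions for illustration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

# Hypothetical page table: virtual page number -> (frame number, writable, resident)
page_table = {
    0: (5, False, True),     # read-only page held in RAM frame 5
    1: (9, True, True),      # writable page held in RAM frame 9
    2: (None, True, False),  # page currently in swap space
}

class PageFault(Exception):
    """The page is not in RAM; the OS would swap it in and retry."""

class ProtectionFault(Exception):
    """The access violates the page's access control bits."""

def translate(virtual_address: int, write: bool = False) -> int:
    """Map a virtual address to a real address using the page table."""
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    if page_number not in page_table:
        raise ProtectionFault(f"page {page_number} is not mapped")
    frame, writable, resident = page_table[page_number]
    if write and not writable:
        raise ProtectionFault(f"page {page_number} is read-only")
    if not resident:
        raise PageFault(f"page {page_number} is swapped out")
    return frame * PAGE_SIZE + offset
```

For example, `translate(10)` falls in page 0 and maps to frame 5, while `translate(2 * PAGE_SIZE)` raises `PageFault` because that page is marked as swapped out.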

Remote Procedure Call (RPC)

- enables clients to execute procedures in a server process based on a defined service interface
- Client: the program calls the procedure locally on the client stub procedure, which marshals the call and passes it to the communication module, which handles the request-reply exchange
- synchronous

WWW 3 components

- HyperText Markup Language (HTML) - specifies content and layout - Uniform Resource Locators (URLs) - identify resources stored on the web - HyperText Transfer Protocol (HTTP) - used for transferring resources between web servers and clients (browsers)

Invocation Semantics

- Maybe: the remote procedure may be executed once or not at all. Unless the caller receives a result, it is unknown whether the remote procedure was called
  -> Retransmit request: No
  -> Duplicate filtering: N/A
  -> Re-execute procedure or retransmit reply: N/A
- At-least-once: either the remote procedure was executed at least once and the caller received a response, or the caller received an exception to indicate the remote procedure was not executed at all
  -> Retransmit request: Yes
  -> Duplicate filtering: No
  -> Re-execute procedure or retransmit reply: Re-execute procedure
- At-most-once: the remote procedure was either executed exactly once, in which case the caller received a response, or it was not executed at all and the caller received an exception
  -> Retransmit request: Yes
  -> Duplicate filtering: Yes
  -> Re-execute procedure or retransmit reply: Retransmit reply
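
The difference between at-least-once and at-most-once can be sketched with a small in-process simulation. It assumes a non-idempotent "deposit" operation and simulates lost replies by having the client resend the same request; the `Server` class, `invoke` function and request IDs are hypothetical names used only for this example.

```python
class Server:
    def __init__(self, filter_duplicates: bool):
        self.filter_duplicates = filter_duplicates
        self.history = {}   # request ID -> reply already sent
        self.balance = 0

    def handle(self, request_id: int, amount: int) -> str:
        # At-most-once: a duplicate request gets the stored reply back.
        if self.filter_duplicates and request_id in self.history:
            return self.history[request_id]
        # Otherwise the procedure is (re-)executed.
        self.balance += amount
        reply = f"balance={self.balance}"
        self.history[request_id] = reply
        return reply

def invoke(server: Server, request_id: int, amount: int, replies_lost: int) -> str:
    """Client side: retransmit the request until a reply finally arrives."""
    for attempt in range(replies_lost + 1):
        reply = server.handle(request_id, amount)
        if attempt == replies_lost:   # earlier replies were "lost" in transit
            return reply

at_least_once = Server(filter_duplicates=False)
at_most_once = Server(filter_duplicates=True)
reply_a = invoke(at_least_once, request_id=1, amount=50, replies_lost=2)
reply_b = invoke(at_most_once, request_id=1, amount=50, replies_lost=2)
```

With two lost replies, the server without duplicate filtering applies the deposit three times, while the filtering server applies it once and retransmits the saved reply.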

Difference between computer networks and distributed systems

- a computer network is a collection of spatially separated, interconnected computers that exchange messages based on specific protocols
- a distributed system is multiple computers on the network working together as a system

Group communication

- a multicast operation allows group communication: sending a single message to a number of processes identified as a group
- can happen with or without guarantees of delivery
- uses:
  -> fault tolerance based on replicated services
  -> finding discovery servers
  -> better performance through replicated data
  -> propagation of event notifications

Information leakage

- about learning things about the system - not getting real data, but can infer from messages that a particular activity is taking place - or looking at size of information to work out what sort of information it is

Transparency

- access transparency - location transparency - concurrency transparency - replication transparency - failure transparency - mobility transparency - performance transparency - scaling transparency

Security Model

- achieved by securing processes and communication channels, and by protecting the objects they encapsulate against unauthorised access
- can use access rights
- the enemy is one capable of sending a message to any process or reading messages in the network
- Threats from an enemy:
  -> threats to processes: servers/clients cannot be sure about the source of a message
  -> threats to communication channels: the enemy can copy, alter or inject messages
  -> denial of service attacks: overloading the server or otherwise triggering excessive delays to the service
  -> mobile code: performs operations that corrupt the server or service

Certificate chain

- Alice wants Bob's public key
- Carol has signed a certificate attesting to Bob's public key, and Sara has signed a certificate attesting to Carol's public key
- if Alice trusts Carol's certificate then she can authenticate Bob's identity
- if Alice doesn't trust Carol, she must first authenticate Carol using Sara's certificate
- revoking a certificate is usually done by using predefined expiry dates
- eventually you get to the root certificate, which is signed with its own private key ("self-signed") -> you just have to trust it

Dynamic Pages

- allow users to interact with resources by taking user input, executing programs and returning results
- programs executed can take different forms:
  -> Common Gateway Interface (CGI) program
  -> Servlet
  -> Downloaded code, e.g. JavaScript

IP multicast

- allows a sender to transmit a single packet to a set of computers that form a group - sender not aware of individual recipients - available only for UDP - when a multicast message reaches a computer, copies are forwarded to all processes that have sockets bound to the multicast address and the specified port number - omission failures are possible

Fundamental Models

- allows distributed systems to be analysed in terms of fundamental properties regardless of the architecture Aspects: - Interaction model - Failure model - Security model

Wireless networking

- allows the integration of small portable computing devices Two popular mediums: - mobile computing (nomadic computing) - ubiquitous computing

Remote method Invocation

- an object that can receive remote invocations is called a remote object
- the remote interface defines the methods that can be invoked by external processes
- communication module: responsible for communicating messages between client and server. Uses three fields:
  -> message type
  -> request ID
  -> remote object reference

Digital signature with public private key

- binds an identity to a message
- use a digest function to turn the message into something smaller; different messages are almost always mapped to different digests
- the sender encrypts the digest with their private key and appends this encrypted digest to the message
- Receiver: takes the appended signature, decrypts it with the sender's public key, computes the digest of the message locally, and compares the two results
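
The sign-and-verify steps above can be sketched with textbook RSA on a deliberately tiny key. This is a toy for illustration only: the key values (n = 3233, e = 17, d = 2753, from the primes 61 and 53) are an assumption chosen to keep the arithmetic visible, the digest is reduced modulo n so that it fits, and real systems use large keys and padding schemes from a cryptographic library.

```python
import hashlib

# Toy RSA key pair (assumed values): n = 61 * 53, and e * d = 1 mod 3120.
N = 3233
E = 17      # public exponent
D = 2753    # private exponent

def digest(message: bytes) -> int:
    """Digest function: SHA-256, reduced mod N so it fits the toy key."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Sender: 'encrypt' the digest with the private key."""
    return pow(digest(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Receiver: 'decrypt' with the public key and compare with a local digest."""
    return pow(signature, E, N) == digest(message)

message = b"pay Bob 10"
signature = sign(message)
is_valid = verify(message, signature)
```

A tampered message would almost always fail verification, although with a modulus this small accidental collisions are possible, which is one reason the sketch is not usable in practice.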

Proxy servers

- cache is a store of recently used objects that is closer to the client - new objects are added to the cache replacing existing objects

Challenge Response

- challenge the client's identity by sending them a message that only they should be able to decrypt
- if they cannot make sense of the message then they fail
- if they can respond then they pass

Roles and responsibilities

- client, a process that initiates connections to some other process - server, a process that can receive connections from some other processes - peer, can be seen as taking both the role of client and server, connecting and receiving connections from other peers

Consequences of distributed systems

- concurrency: distributed computers perform their tasks autonomously, and the system is accessed by multiple users simultaneously
- no global clock: clocks on individual computers operate independently; computers can only synchronise by message passing, which limits accuracy
- independent failures: some components of the system may fail while others still run, and failures of some computers may not be known to others immediately

Worker pool architecture

- creating a new thread incurs overhead that can quickly become the bottleneck, so threads are created in advance
- the server creates a fixed number of threads; as requests arrive, they are put into a queue by the I/O thread and from there assigned to the next available worker thread
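
A minimal sketch of the worker pool idea, using Python's standard `queue` and `threading` modules. The "request" here is just a number to be squared, and the function name `run_worker_pool` and the `None` shutdown marker are assumptions made for the example; the main thread plays the role of the I/O thread.

```python
import queue
import threading

def run_worker_pool(jobs, num_workers=4):
    """Process jobs with a fixed pool of pre-created worker threads."""
    requests = queue.Queue()   # shared request queue filled by the I/O thread
    results = queue.Queue()

    def worker():
        while True:
            job = requests.get()       # blocks until a request is available
            if job is None:            # shutdown marker: stop this worker
                break
            results.put((job, job * job))

    # Threads are created once, in advance, not per request.
    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in workers:
        thread.start()

    for job in jobs:                   # the I/O thread queues incoming requests
        requests.put(job)
    for _ in workers:                  # one shutdown marker per worker
        requests.put(None)
    for thread in workers:
        thread.join()

    collected = {}
    while not results.empty():
        job, value = results.get()
        collected[job] = value
    return collected

squares = run_worker_pool(range(10))
```

The shared queue is the point of contention in this design, which is what the thread-per-request alternative avoids at the cost of creating threads on demand.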

Load manager

- gathers information about the current state of the system
- Centralised = a single load manager receives feedback from all other hosts in the system
- Hierarchical = load managers are arranged in a tree where the internal nodes are load managers and the leaf nodes are hosts
- Decentralised = typically a load manager for every host; load managers communicate with all other load managers directly
- Sender-initiated/push = the local host is responsible for determining the remote host on which to allocate the process
- Receiver-initiated/pull = a remote host advertises to other hosts that a new process should be allocated on it

Request reply Design Issues

- handling timeouts - discarding duplicate messages - handling lost reply messages - history Strategies: - strategy to retry request message - mechanism to filter duplicates - strategy for results retransmission

Processes

- has an address space and some amount of allocated memory
- consists of one or more threads that are given processor time, plus thread synchronisation and communication resources
- holds higher-level resources like open files and windows
- resource sharing/interprocess communication is required for threads to access resources in other processes (e.g. shared memory or socket communication)

Lightweight RPC

- if the RPC system detects that the called procedure is actually on the local host, it uses shared memory instead of the network

Worst case assumptions

- interfaces exposed - networks insecure (communication) - limit the lifetime and scope of each secret - algorithms and program code are available to attackers - attackers may have access to large resources - minimise the trusted base

Communication paradigms

- interprocess communication: the underlying primitives, e.g. shared memory, sockets
- remote invocation: based on a two-way exchange between communicating entities in a distributed system, i.e. request-reply, remote procedure calls, remote method invocation
- indirect communication:
  -> space uncoupling: senders do not need to know who they are sending to
  -> time uncoupling: senders and receivers do not need to exist at the same time

RMI software

- a layer that lies between the application and the communication and object reference modules
1) Proxy - plays the role of a local object to the invoking object. There is a proxy for each remote object, which is responsible for marshalling and unmarshalling
2) Dispatcher - there is one dispatcher for each remote object class. It is responsible for mapping a request to the appropriate method in the skeleton, based on the method ID
3) Skeleton - responsible for unmarshalling arguments and forwarding them to the servant, and for marshalling the result from the servant to be returned to the client

Placement

- mapping services to multiple servers - caching -> storing data at places that are typically closer to the client - mobile code -> transferring code to the location that is most efficient - mobile agents -> code and data together

Factors contributing to RMI delay

- marshalling and unmarshalling - data copying (user to kernel space, and different layers of communication subsystem) - packet initialization - protocols headers and checksums - thread scheduling and context switching - waiting for acknowledgement

Interaction model

- models the interaction between processes
- important aspects are the performance of communication channels, and computer clocks and timing events
- Performance of communication channels:
  1) Latency - delay between the start of a message's transmission and the start of its receipt
  2) Bandwidth - total amount of information that can be transmitted over a given time
  3) Jitter - the variation in time taken to deliver a series of messages
- Event timing (timestamps vary due to):
  -> initial time settings being different
  -> differences in clock drift rates

Tuple Spaces

- a more abstract form of shared memory
- operations: take, write, read
- space uncoupled
- time uncoupled
- state-based
- 1-to-1 or 1-to-many
- limited scalability
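
A minimal in-process sketch of the take, write and read operations. It assumes tuples are matched against a pattern in which `None` acts as a wildcard; the `TupleSpace` class and its method signatures are hypothetical and not taken from any specific tuple space implementation.

```python
import threading

class TupleSpace:
    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def write(self, tup):
        """Add a tuple to the space and wake any waiting readers."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _find(self, pattern):
        for tup in self._tuples:
            if len(tup) == len(pattern) and all(
                p is None or p == v for p, v in zip(pattern, tup)
            ):
                return tup
        return None

    def read(self, pattern, timeout=None):
        """Return a matching tuple without removing it (None on timeout)."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._find(pattern) is not None, timeout):
                return None
            return self._find(pattern)

    def take(self, pattern, timeout=None):
        """Return a matching tuple and remove it from the space."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._find(pattern) is not None, timeout):
                return None
            tup = self._find(pattern)
            self._tuples.remove(tup)
            return tup

space = TupleSpace()
space.write(("temperature", "kitchen", 21))
seen = space.read(("temperature", None, None))        # tuple stays in the space
taken = space.take(("temperature", "kitchen", None))  # tuple is removed
missing = space.read(("temperature", None, None), timeout=0.01)
```

The writer never names a recipient (space uncoupling), and the tuple waits in the space until someone reads or takes it (time uncoupling).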

Concurrency

- multiple clients can access same resources at same time - one approach is to make access sequential but this slows down the system - semaphores can be used

Issues with shared secret key

- need an agreed upon value such as a checksum to ensure it has not been changed on the way - sharing the shared key is difficult - no way to know if it is a copy/replay -> can use a logical counter

TCP Stream Communication

- no limit on message size
- recovers lost messages through acknowledgements
- uses flow control
- can reorder messages using IDs
Client:
- creates a socket specifying the server address and port
- reads and writes data using the stream associated with the socket
Server:
- creates a listening socket bound to a server port
- waits for clients to request a connection
- accepts a connection and creates a new stream socket for the server to communicate with the client
Failure model:
- uses checksums to detect corrupt packets
- timeouts and retransmission are used to deal with lost packets
- under severe congestion TCP streams declare the connection to be broken
- when communication is broken the process cannot distinguish between network failure and process crash
- a communicating process cannot definitely say whether the messages it sent recently were received

Digital signature with shared key

- no need to encrypt - hash/digest message PLUS shared key - append this hash/digest to the message - very fast
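
As a sketch of the shared-key approach, the example below uses HMAC from Python's standard `hmac` module. HMAC is a standard keyed-hash construction swapped in here for the plain "hash of message plus key" described above; the key and the function names `make_tag` and `check_tag` are assumptions for illustration.

```python
import hashlib
import hmac

def make_tag(message: bytes, key: bytes) -> bytes:
    """Sender: compute a keyed digest to append to the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def check_tag(message: bytes, tag: bytes, key: bytes) -> bool:
    """Receiver: recompute the keyed digest and compare in constant time."""
    return hmac.compare_digest(make_tag(message, key), tag)

shared_key = b"hypothetical shared secret"
message = b"transfer 10 to Bob"
tag = make_tag(message, shared_key)

accepted = check_tag(message, tag, shared_key)
tampered = check_tag(b"transfer 99 to Bob", tag, shared_key)
```

Only hashing is involved, with no encryption step, which is why this approach is fast. Anyone holding the shared key can produce a valid tag, so it authenticates the message only between the parties that share the key.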

Indirect Group communication

- offers a space uncoupled service whereby a message is sent to a group then the message is delivered to every member of the group - More than primitive IP multicast -> manages group membership -> detects failures and provides reliability and ordering guarantees

Threats from mobile code

- once you run mobile code, it has full access to the OS
- trusted network computing: only run the code if it carries a certificate

Message Queues

- one-to-one processing
- information dissemination
- scalability is possible

Types of publish/subscribe systems

1) Channel-based - publishers publish to named channels and subscribers subscribe to all events on a channel
2) Type-based - subscribers register interest in types of events and notifications occur when events of those types occur
3) Topic-based - subscribers register interest in particular topics and notifications occur when any information related to the topic arrives
4) Content-based - subscribers can specify interest in specific values/ranges for multiple attributes
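
The difference between topic-based and content-based subscription can be sketched with a tiny in-process event service. The `EventService` class, its method names and the example events are all hypothetical; a real system would deliver notifications asynchronously over a network.

```python
from collections import defaultdict

class EventService:
    def __init__(self):
        self.topic_subscribers = defaultdict(list)  # topic -> callbacks
        self.content_subscribers = []               # (predicate, callback) pairs

    def subscribe_topic(self, topic, callback):
        """Topic-based: interested in every event published under a topic."""
        self.topic_subscribers[topic].append(callback)

    def subscribe_content(self, predicate, callback):
        """Content-based: interested in events whose attributes match."""
        self.content_subscribers.append((predicate, callback))

    def publish(self, topic, event):
        for callback in self.topic_subscribers[topic]:
            callback(event)
        for predicate, callback in self.content_subscribers:
            if predicate(event):
                callback(event)

service = EventService()
all_stock_events = []
expensive_only = []
service.subscribe_topic("stocks", all_stock_events.append)
service.subscribe_content(lambda e: e.get("price", 0) > 100, expensive_only.append)

service.publish("stocks", {"symbol": "AAA", "price": 50})
service.publish("stocks", {"symbol": "BBB", "price": 150})
service.publish("weather", {"temperature": 3})
```

The topic subscriber receives both stock events, while the content-based subscriber receives only the event whose price attribute exceeds 100.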

Failure Model

1) Omission failures 2) Arbitrary failures 3) Timing failures

Omission failures

- a process or communication channel fails to perform what it is expected to do
- Process omission failures:
  -> normally caused by a process crash
  -> repeated failures during invocation are an indication
  -> timeouts can be used to detect this type of crash
  -> a crash is referred to as fail-stop if other processes can detect with certainty that the process has crashed
- Communication omission failures:
  -> send omission failure: a message not being transported from the sending process to its outgoing message buffer
  -> receive omission failure: a message not being transported from the receiving process's incoming message buffer to the receiving process
  -> channel omission failure: a message not being transported from the sender's outgoing message buffer to the receiver's incoming message buffer

Process migration

- processes can be migrated from one host to another by copying their address space - processes that are CPU dependent -> process migration can be effective - can be complicated - called checkpointing

Threads versus multiple processes

- processes provide stronger encapsulation
- threads are cheaper to allocate/deallocate, and it is easy for them to share resources via shared memory
- processes require a new address space -> new page table
- if the kernel doesn't schedule threads across multiple processors then it can be useful to create a new process
- context switching between threads is also cheaper because of cache behaviour with address spaces
- the danger with threads is shared memory: one thread can easily corrupt data used by another

Interfaces

- programmers are only concerned with the abstraction offered by the interface, they are not aware of the implementation details - programmers also don't need to know underlying programming language or platform used to implement the service

UDP

- provides a message passing abstraction - best effort - simplest form of Interprocess communication - transmits a single message called a datagram to the receiving process

TCP

- provides an abstraction for a two-way stream
- streams do not have message boundaries
- streams provide the basis for producer-consumer communication
- data sent by the producer are queued until the consumer is ready to receive them
- the consumer must wait when no data is available

Socket

- provides an end point for communication between processes - same socket can be used for both sending and receiving messages - any number of processes can send messages to the same port - for a process to receive messages, its socket must be bound to a local port on one of the internet addresses of the computer on which it runs

Networked OS

- provides support for networking operations
- each host remains autonomous in the sense that it can still operate when it is disconnected

Arbitrary Failures (Byzantine failure)

- refers to any type of failure that can occur -> intended steps omitted in processing -> message contents corrupted -> non-existent messages delivered -> real messages delivered more than once

Openness

- refers to the ability to extend the system in different ways by adding hardware or software resources
- Approaches:
  -> publishing key interfaces
  -> allowing a uniform communication mechanism to communicate over published interfaces
  -> ensuring all implementations adhere to the published standards

Implementation Issues for Group Communication

- reliability and ordering:
  -> FIFO ordering: messages from the same sender are delivered in the order they were sent
  -> causal ordering: if a message happens before another message then that order is preserved
  -> total ordering: if a message is delivered before another message at one process then that order is preserved at all other processes
- group membership management:
  -> group members leave and join
  -> failed members
  -> notifying members of group membership changes
  -> changes to the group address
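
FIFO ordering is commonly enforced with per-sender sequence numbers and a hold-back queue. The sketch below assumes each sender numbers its messages from 0; the `FifoReceiver` class and its fields are hypothetical names for this example.

```python
class FifoReceiver:
    def __init__(self):
        self.next_expected = {}   # sender -> next sequence number to deliver
        self.held_back = {}       # (sender, sequence number) -> message
        self.delivered = []       # messages handed to the application, in order

    def receive(self, sender, seq, message):
        expected = self.next_expected.get(sender, 0)
        if seq < expected:
            return                # duplicate of an already delivered message
        self.held_back[(sender, seq)] = message
        # Deliver every message that is now in sequence for this sender.
        while (sender, expected) in self.held_back:
            self.delivered.append(self.held_back.pop((sender, expected)))
            expected += 1
        self.next_expected[sender] = expected

receiver = FifoReceiver()
receiver.receive("p1", 1, "second")   # arrives early, so it is held back
receiver.receive("p1", 2, "third")    # also held back
held_before = list(receiver.delivered)
receiver.receive("p1", 0, "first")    # fills the gap and releases all three
```

This only orders messages from a single sender relative to each other; causal and total ordering need extra machinery such as vector clocks or a sequencer.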

Transparency for remote invocation

- remote invocation is more prone to fail due to network and remote machines - latency of remote invocations is significantly higher than that of local invocations

Exchange Protocols

- request R protocol - request reply RR protocol - request reply acknowledge RRA protocol

Benefits of distributed systems

- resource sharing (one server many clients)
- economical
- reliability
- availability
- scalability

3 Aspects that encryption solves

- secrecy/confidentiality and integrity
- authentication: encrypt a message with something that the recipient should know; if they are who they say they are, they should be able to read the message
- digital signature

UDP datagram communication

- the server (receiver) binds its socket to a server port, which is made known to the client
- a client (sender) binds its socket to any free port on the client machine
- the receive method returns the internet address and port of the sender in addition to the message, allowing replies
Blocking:
- non-blocking sends and blocking receives are used
- the send operation returns when the message is copied to the buffer
- messages are discarded if no socket is bound to the port
Timeouts:
- receive will wait indefinitely until a message is received
- timeouts can be set on sockets to exit from infinite waits and check the condition of the sender
Possible failures:
- data corruption (detected with a checksum)
- omission failures (e.g. buffers full)
- ordering (messages might be delivered out of order)
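
A minimal sketch of this exchange using Python's standard `socket` module, with both ends in one script on the loopback interface. For the sake of a self-contained example the server binds to port 0, which asks the OS for any free port, rather than a well-known port; the 2-second timeout and the message contents are also assumptions.

```python
import socket

# Server (receiver): bind a datagram socket to a port on this machine.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))          # port 0: the OS chooses a free port
server.settimeout(2.0)                 # avoid waiting indefinitely in receive
server_address = server.getsockname()  # the address made known to the client

# Client (sender): the first sendto() binds the socket to any free local port.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(2.0)
client.sendto(b"ping", server_address)

# The receive call returns the message plus the sender's address and port,
# which is what lets the server reply.
data, sender_address = server.recvfrom(1024)
server.sendto(data.upper(), sender_address)

reply, _ = client.recvfrom(1024)

client.close()
server.close()
```

On a real network either datagram could be lost, in which case `recvfrom` would raise `socket.timeout` after 2 seconds and the application would have to decide whether to retry.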

HTTP as RR protocol

- specifies: -> the messages involved in the protocol -> the methods, arguments and results -> the rules for marshalling messages - allows authentication

Shared Memory

- two separate address spaces can share parts of real memory. This is useful for:
1) Libraries - the binary code for a library can often be quite large and is the same for all processes that use it
2) Kernel - the kernel maintains code and data that is often identical across all processes
3) Data sharing and communication - when two processes want to access the same data or want to communicate, shared memory can be used
- processes can arrange, by calling appropriate system functions, to share a region of memory for this purpose. The kernel and a process can also share data/communicate using this approach

Protection - visible vs invisible

- a visible resource can be discovered by listing a directory's contents or searching for it
- an invisible resource must be known to the client in advance, although it can still be guessed

Timing failures

- occur when time limits are set on process execution time, message delivery time and clock drift rate:
  -> a clock exceeds the bound on its rate of drift
  -> a process exceeds the bound on the interval between two steps
  -> a message's transmission takes longer than the stated bound

Alternative Threading

1) Thread-per-request
  - avoids accessing a shared queue
  - potential parallelism can be maximised
  - thread allocation and deallocation incur overhead
  - as requests increase, the advantages of parallelism decrease
  - overhead in context switching between threads
2) Thread-per-connection
  - a single client makes several requests, which reduces the overall number of threads
3) Thread-per-object
  - a worker thread is associated with each remote object/resource that is being accessed
  - the I/O thread receives requests and queues them for each worker thread

Middleware

= a software layer between the distributed application and the operating system that provides a programming abstraction and masks the heterogeneity of the underlying networks, hardware, OS and programming languages
Services:
- Communication service: provides access transparency
- Naming service: allows remote resources to be looked up by name, similar to directories
- Facilities for storage: facilities for data persistence
- Distributed transactions: if an operation fails, all referenced data remains unchanged
- Security: protection against various kinds of security threats

Copy on write

= technique that makes a copy of a memory region only when the new process actually writes to it. this saves time when allocating the new process and saves memory space

Distributed System definition

A system in which hardware or software components located at networked computers communicate and coordinate their actions only by passing messages

Failure Handling

Approaches: - detecting - masking - tolerating - recovery - redundancy

Block ciphers, Cipher block chaining

Block ciphers = operate on fixed-size blocks of data
- a message is subdivided into blocks and the last block is padded to the standard length
- each block is encrypted independently
- a block is transmitted as soon as it is encrypted
Cipher block chaining
- avoids the problem of identical plaintext blocks being encrypted to identical ciphertext blocks
- before a block is encrypted, it is XORed with the previous block's encrypted version, then encrypted and sent
- if the same message is sent to two different recipients then it will still look the same, and this poses an information leakage weakness
- to guard against this, a block called an initialisation vector is used to start each message in a different way
Stream ciphers = used when data cannot be easily divided into blocks (e.g. voice)
- in this case, an agreed-upon key stream is encrypted and the output is XORed with the data stream
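
The chaining step can be sketched in a few lines. The block cipher below is a toy 4-round Feistel network built from SHA-256, invented for this example so that it runs with only the standard library; it is NOT secure and stands in for a real cipher such as AES. The block size, padding scheme and all function names are assumptions.

```python
import hashlib

BLOCK = 8  # assumed block size in bytes

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _round(half: bytes, key: bytes, i: int) -> bytes:
    return hashlib.sha256(key + bytes([i]) + half).digest()[:BLOCK // 2]

def encrypt_block(block: bytes, key: bytes) -> bytes:
    """Toy Feistel block cipher (illustration only, not secure)."""
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for i in range(4):
        left, right = right, xor(left, _round(right, key, i))
    return left + right

def decrypt_block(block: bytes, key: bytes) -> bytes:
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for i in reversed(range(4)):
        left, right = xor(right, _round(left, key, i)), left
    return left + right

def pad(data: bytes) -> bytes:
    n = BLOCK - len(data) % BLOCK          # always adds 1..BLOCK bytes
    return data + bytes([n]) * n

def cbc_encrypt(plaintext: bytes, key: bytes, iv: bytes) -> bytes:
    previous, out = iv, []
    padded = pad(plaintext)
    for i in range(0, len(padded), BLOCK):
        # XOR with the previous ciphertext block, then encrypt.
        previous = encrypt_block(xor(padded[i:i + BLOCK], previous), key)
        out.append(previous)
    return b"".join(out)

def cbc_decrypt(ciphertext: bytes, key: bytes, iv: bytes) -> bytes:
    previous, out = iv, []
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out.append(xor(decrypt_block(block, key), previous))
        previous = block
    data = b"".join(out)
    return data[:-data[-1]]                # strip the padding

key = b"hypothetical key"
message = b"SAMEBLOK" * 2                  # two identical plaintext blocks
ct_a = cbc_encrypt(message, key, iv=bytes(BLOCK))
ct_b = cbc_encrypt(message, key, iv=b"\x01" * BLOCK)
```

Because of chaining, the two identical plaintext blocks produce different ciphertext blocks, and changing the initialisation vector changes the ciphertext of the whole message even though the key and plaintext are the same.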

Scalability

Challenges:
- cost of physical resources: doubling resources should double capability
- controlling the performance loss: the complexity of algorithms should be scalable
- resources should not run out (e.g. IP addresses)
- avoiding performance bottlenecks: decentralised algorithms should be used

Architectural Patterns

- Client-server: clients invoke services in servers and results are returned
- Peer-to-peer: each process in the system plays a similar role, interacting cooperatively as peers

Choice of protocol

Connection-oriented (TCP)
- often used when client and server need to exchange information in a single session for a reasonable length of time
- this can suffer if end-points are changing their identity
Connection-less protocol
- used for request-reply, things that do not need a session for any length of time
- less overhead, used for applications that require low latency

Monolithic and Micro kernel design

Monolithic kernel
- one piece of code that has built into it all of the services it provides
- one large entity
- shared memory, speed advantages
- a crashing service can bring down the entire kernel
Micro kernel
- the kernel contains only the minimal amount of code needed to make a kernel
- all the services that make up an OS are placed above it
- can be debugged
- interprocess communication between services -> more message passing
- a service dying would not affect other services, and the microkernel stays safe (fault isolation)

Types of attacks on communication

- Eavesdropping: obtaining copies of messages without authority
- Masquerading: sending or receiving messages using the identity of another principal without their authority
- Message tampering: intercepting messages and altering their contents before passing them on (man-in-the-middle attack)
- Replaying: storing intercepted messages and sending them at a later date
- Denial of service: flooding a channel or other resource with messages in order to deny access for others

Secure electronic transactions

Emails, purchase of goods and services, banking transactions, micro transactions

Emulation and virtualization

Emulation = it does all the things that you would expect it to do
Virtualisation = it is the thing you expect it to be, for all intents and purposes (virtualise the hardware)

Authentication with shared keys (Kerberos)

- Sara is a trusted authentication server and knows kA and kB
- Alice wants to contact Bob, so she asks Sara for a ticket for Bob
- Sara responds with a ticket encrypted with kB and a shared secret kAB, all encrypted with kA
- Alice decrypts the response, and sends the kB-encrypted ticket and a request (carrying her identity) to Bob
- Bob decrypts the ticket with kB; it contains the shared key kAB and Alice's identity, which he can then match with 'Alice' on the request
- the secret key kAB has now been shared (and is called a session key)
- useful where all users are part of a single organisation

Models of remote invocation

Types of middleware:
1) Remote procedure call model (RPC) - an extension of the conventional procedure call model
2) Remote method invocation model (RMI) - an extension of the object-oriented programming model
3) Request-reply protocol

Request reply operations

- doOperation: sends a request message to the remote object and returns the reply
- getRequest: acquires a client request via the server port
- sendReply: sends the reply message to the client at its internet address and port

