Distributed Systems
Briefly explain the protocol for atomic multicast in pseudocode.
Sender:
  reliably multicast the message to the rest of the group
All participants, on receiving a message:
  if this is the first time I'm seeing this message {
    reliably multicast it again;
    deliver it to the application;
  } else {
    discard the copy;
  }
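The pseudocode above can be tried out with a small in-memory simulation. This is only a sketch under stated assumptions: the `Process` class, the `make_group` helper and the message ids are hypothetical names, "reliable multicast" is modelled as a direct method call to every other member, and the sender also delivers the message to itself.

```python
class Process:
    """One group member in an in-memory simulation (hypothetical names)."""

    def __init__(self, name):
        self.name = name
        self.group = []        # all members, filled in by make_group()
        self.seen = set()      # ids of messages already handled
        self.delivered = []    # messages handed to the application

    def multicast(self, msg_id, payload):
        # Sender: treat the message like a first-time receipt, which
        # relays it to the rest of the group and delivers it locally.
        self.receive(msg_id, payload)

    def receive(self, msg_id, payload):
        if msg_id in self.seen:
            return                      # duplicate copy: discard
        self.seen.add(msg_id)
        # First time seen: relay to everyone else, so the message keeps
        # spreading even if the original sender crashed part-way through.
        for peer in self.group:
            if peer is not self:
                peer.receive(msg_id, payload)
        self.delivered.append(payload)


def make_group(names):
    members = [Process(n) for n in names]
    for m in members:
        m.group = members
    return members


group = make_group(["p1", "p2", "p3"])
group[0].multicast("m1", "hello")
delivered = [p.delivered for p in group]
```

Every member ends up delivering the message exactly once, because later copies are recognised by their id and discarded.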
What is a class loader?
A class loader is an object that is responsible for loading classes. All classes come from a class loader. Local class loaders load local classes. Non local classes must be loaded by specialised loaders. Class loaders are essential for Java's security. They provide a birth certificate to classes.
How can we ensure access control?
Authentication
What is DES? Name the two functions.
Data Encryption Standard. DES_encrypt and DES_decrypt
What is group communication?
Enables the multicasting of a message to a group of processes as a single action.
What are the ways to implement content-based approaches?
Flooding, filtering, advertisements, rendezvous
What is distributed shared memory?
* Something which provides an abstraction of shared memory in a distributed system.
* If data is not available locally, it must be fetched (similar to a page fault in traditional virtual memory systems).
* Hides distribution entirely from the programmer.
* No new programming abstractions to learn.
* Can be costly to implement, though, in terms of maintaining consistency of shared data.
What is a tuple space?
* Yet another paradigm for indirect communication, offering an abstraction of a semi-structured shared space consisting of a number of tuples.
* Processes can write tuples to the tuple space.
* Processes can then read a tuple from the tuple space, which leaves a copy in the tuple space.
* Alternatively, they can take the tuple, which removes it from the space.
* Both the read and take operations are based on pattern matching (associative access).
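The write/read/take operations can be illustrated with a toy in-memory tuple space. This is a sketch, not a real implementation: the `TupleSpace` class is a hypothetical name, `None` is assumed to act as a wildcard in patterns, and `read`/`take` return `None` when nothing matches, whereas real tuple spaces typically block until a match appears.

```python
class TupleSpace:
    """Toy in-memory tuple space; None in a pattern acts as a wildcard."""

    def __init__(self):
        self._tuples = []

    def write(self, tup):
        self._tuples.append(tuple(tup))

    @staticmethod
    def _matches(pattern, tup):
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup)
        )

    def read(self, pattern):
        """Return the first matching tuple and leave it in the space."""
        for tup in self._tuples:
            if self._matches(pattern, tup):
                return tup
        return None

    def take(self, pattern):
        """Return the first matching tuple and remove it from the space."""
        for i, tup in enumerate(self._tuples):
            if self._matches(pattern, tup):
                return self._tuples.pop(i)
        return None


space = TupleSpace()
space.write(("temperature", "room1", 21))
space.write(("temperature", "room2", 19))

first = space.read(("temperature", None, None))     # copy stays in the space
taken = space.take(("temperature", "room2", None))  # removed from the space
again = space.take(("temperature", "room2", None))  # nothing left to match
```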
Describe how a reliable multicast will be implemented.
1) The sender sends the message to each member of the group and waits for the acknowledgements (ACKs).
2) If some acknowledgements are not received within a given period of time, it re-sends the message, repeating this up to N times if necessary.
3) If all acknowledgements are received, it reports success to the caller.
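The three steps above can be sketched as a retry loop. The function name, the `send` callback and the simulated lossy transport are assumptions made for illustration; a real implementation would use network sockets and timers.

```python
def reliable_multicast(message, members, send, max_attempts=3):
    """Send `message` to every member, re-sending until each one ACKs.

    `send(member, message)` is a hypothetical transport callback that
    returns True if an ACK came back within the timeout, False otherwise.
    Returns True on success, False if some member never acknowledged.
    """
    pending = set(members)
    for _ in range(max_attempts):
        # Only re-send to members whose ACK is still missing.
        pending = {m for m in pending if not send(m, message)}
        if not pending:
            return True   # all ACKs received: report success to the caller
    return False


# Simulated transport: the first transmission to "b" is lost.
attempts = {}

def flaky_send(member, message):
    attempts[member] = attempts.get(member, 0) + 1
    return not (member == "b" and attempts[member] == 1)

ok = reliable_multicast("hello", ["a", "b", "c"], flaky_send)
```

Here the message to "b" is sent twice and the call still reports success, which matches the case of a transient network problem.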
What is a distributed system?
A distributed system is a collection of independent computers that appears to its users as a single coherent system
What is publish subscribe? Explain the process.
A key example of a distributed event-based system, in which:
* Publishers publish an event E: publish(E)
* Subscribers express interest in a set of events specified by a filter F: subscribe(F)
* Events are delivered asynchronously: notify(E)
* Publishers can optionally advertise what they will produce: advertise(F)
The system acts as a broker to deliver events to the right subscribers.
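A minimal sketch of the publish/subscribe/notify cycle with a single centralised broker follows. The `Broker` class and the event dictionaries are hypothetical, filters are assumed to be plain Python predicates, and for brevity notification happens synchronously inside `publish`, whereas a real system delivers events asynchronously.

```python
class Broker:
    """Toy centralised broker: matches published events against filters."""

    def __init__(self):
        self._subscriptions = []   # (filter function, notify callback) pairs

    def subscribe(self, event_filter, notify):
        self._subscriptions.append((event_filter, notify))

    def publish(self, event):
        # Deliver the event to every subscriber whose filter accepts it.
        for event_filter, notify in self._subscriptions:
            if event_filter(event):
                notify(event)


broker = Broker()
sports, everything = [], []
broker.subscribe(lambda e: e["topic"] == "sport", sports.append)
broker.subscribe(lambda e: True, everything.append)

broker.publish({"topic": "sport", "text": "final score 2-1"})
broker.publish({"topic": "weather", "text": "rain"})
```

Note that the publisher never refers to a subscriber directly; only the broker knows who is interested.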
What is sandboxing?
A sandbox is a security mechanism for separating running programs. It is often used to execute untested code, or untrusted programs from unverified third parties, suppliers, untrusted users and untrusted websites. In general, a sandbox is an isolated computing environment in which a program or file can be executed without affecting the application in which it runs. Sandboxes are used by software developers to test new programming code.
What is the Java Security Manager?
A security manager is an object that defines a security policy for an application. This policy specifies actions that are unsafe or sensitive. Any action not allowed by the security policy causes a SecurityException to be thrown. An application can also query its security manager to discover which actions are allowed. There is only one security manager per JVM. The security manager protects potentially dangerous operations inside the JVM (file input/output, network access, creation of new class loaders, etc.).
What is a session object in JMS?
A session object is a single-threaded context for producing and consuming messages. It is created using Connection.createSession(..)
What is a message queue?
An alternative paradigm for indirect communication based on distributed queues. Messages are sent to a queue. Processes can then access messages in the queue either by receiving a message (blocking), polling for messages (non-blocking), or being notified when messages arrive. Messages are persistent and message delivery is reliable. Fundamentally a point-to-point service (not multi-party).
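The difference between a blocking receive and non-blocking polling can be shown with Python's standard `queue` module. This is only an in-process stand-in: unlike a real message queue service, it is neither distributed nor persistent, and the message names are made up.

```python
import queue

q = queue.Queue()          # stands in for a queue held by the messaging system
q.put("order-1")
q.put("order-2")

# Blocking receive: waits until a message is available (here, at most 1 second).
first = q.get(timeout=1)

# Polling (non-blocking): returns immediately, raising queue.Empty if nothing is there.
second = q.get_nowait()
try:
    third = q.get_nowait()
except queue.Empty:
    third = None
```

Each message is handed to exactly one consumer, which is the point-to-point behaviour described above.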
What is group communication?
Based on the concept of a group abstraction, with operations provided to join and leave the group (like group memberships). Messages are sent to a group rather than to any individual process, and are then delivered to each member of the group. Often enhanced by guarantees in terms of message ordering and reliability.
What are the motivations for distributed systems?
Because the world is distributed. Because joining forces increases performance and availability. Because problems rarely hit two different places at exactly the same time. As a company, having only one database server is a bad idea.
What are the two main modes of message consumption in JMS?
Blocking, aka synchronous pull mode, with MessageConsumer.receive(). Non-blocking, aka asynchronous push mode, with MessageListener.onMessage(..): the receiver needs to implement this method, which will be called by JMS when a message arrives.
Describe the channel based subscription model.
Channel based: Publishers publish events to named channels and subscribers then subscribe to one of these named channels and therefore receive all events sent to that channel.
Sandboxing is based on what three components?
Class loaders, byte-code verifier, security manager.
When replicating servers, what is needed?
Collective coordination for either fault tolerance or scalability.
Explain rendezvous in the context of content-based subscription model approaches.
Consider the set of all possible events as an event space. Partition this event space into pieces and allocate responsibility for each piece to a given broker (known as the rendezvous node for that event). Implementation requires two functions to be defined: SN(s), which takes a given subscription, s, and returns one or more rendezvous nodes which take responsibility for that subscription; and EN(e), which takes a given event, e, and returns one or more rendezvous nodes responsible for matching e against subscriptions in the system. Can lead to highly scalable implementations.
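One simple way to picture SN(s) and EN(e) is to hash a single attribute onto a fixed list of brokers. This sketch assumes that subscriptions and events both carry a "topic" attribute and that partitioning on it alone is acceptable; the broker names and the `_node_for` helper are hypothetical. The property that matters is that an event and any subscription it matches share at least one rendezvous node.

```python
import hashlib

BROKERS = ["broker-0", "broker-1", "broker-2"]   # hypothetical broker names


def _node_for(topic):
    # A stable hash partitions the event space across the brokers.
    digest = hashlib.sha256(topic.encode("utf-8")).digest()
    return BROKERS[int.from_bytes(digest[:4], "big") % len(BROKERS)]


def SN(subscription):
    """Rendezvous nodes responsible for storing this subscription."""
    return [_node_for(subscription["topic"])]


def EN(event):
    """Rendezvous nodes responsible for matching this event."""
    return [_node_for(event["topic"])]


sub = {"topic": "stocks/ACME", "min_price": 100}
evt = {"topic": "stocks/ACME", "price": 120}
same_node = SN(sub) == EN(evt)
```

Because both functions hash the same attribute, the event is routed to the broker that already holds the matching subscription.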
Describe the content based subscription model.
Content based: a generalisation of topic based, allowing the expression of subscriptions over a range of fields in an event notification. The filter is a query defined in terms of compositions of constraints over the values of event attributes. Significantly more expressive, but with significant new challenges introduced in terms of implementation.
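A content-based filter can be sketched as a conjunction of constraints over event attributes. The (attribute, operator, value) triple format and the `matches` function are assumptions made for illustration, not a standard filter language.

```python
import operator

OPS = {"==": operator.eq, "<": operator.lt, ">": operator.gt,
       "<=": operator.le, ">=": operator.ge}


def matches(filter_, event):
    """True if the event satisfies every (attribute, op, value) constraint."""
    return all(
        attr in event and OPS[op](event[attr], value)
        for attr, op, value in filter_
    )


# Subscription: ACME stock events with a price above 100.
subscription = [("symbol", "==", "ACME"), ("price", ">", 100)]

hit = matches(subscription, {"symbol": "ACME", "price": 120})
miss = matches(subscription, {"symbol": "ACME", "price": 80})
missing_field = matches(subscription, {"symbol": "ACME"})
```

A topic-based subscription is the special case of a single equality constraint on a topic field.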
What is a policy file?
Describes what can be enforced; it's a bit like a code of law. Organised into a series of grant statements.
How can we ensure secure communication?
Encryption
Explain filtering in the context of content-based subscription model approaches.
Filters are propagated back through the broker network, with the intuition that notifications are only forwarded through the broker network if there is a path to a valid subscriber.
How are queue and topic objects retrieved?
From a naming service. What is retrieved is a REFERENCE to a topic or a queue.
When does reliable multicast work fine?
If network problems are transient (so messages eventually get through). If there's no crash (or no spurious behaviour).
Under what conditions does Java allow mobile code?
If sandboxing is activated, otherwise, no chance.
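Explain advertisement in the context of content-based subscription model approaches.
In the filtering scheme, filters need to be sent to all possible publishers. This overhead can be reduced by having publishers advertise the events they will produce, so that filters are only propagated towards relevant publishers.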
What is indirect communication?
Communication between entities in a distributed system through an intermediary, with no direct coupling between the sender and the receiver(s). In direct communication (e.g. RPC / RMI, a direct link between client and server), sender and receivers exist at the same time and know of each other. Indirect communication removes this coupling:
* Space uncoupling: participants are not known to each other.
* Time uncoupling: participants have independent lifecycles.
What are the advantages of group communication?
In terms of server load, the server only needs to send one copy of the message. In terms of network load, it means the network has fewer packets to transfer. In terms of bandwidth, it uses less bandwidth to reach more clients, unlike multiple unicasts.
Is atomic multicast reliable?
It protects against faulty participants. All members receive the message, OR NO ONE DOES.
What's the point in group communication?
It supports the efficient dissemination of data (service discovery, publish/subscribe, etc.). It also supports replication, for fault tolerance and scalability.
What is JMS?
Java Message Service. An API that is part of the J2EE standard. It is a set of interfaces and semantics that define how a JMS client accesses the facilities of an enterprise messaging product. Programs written with JMS APIs can run on any JMS implementation. JMS allows Java programs to access the facilities provided by Message-Oriented Middleware (MOM). Clients communicate via message channels.
Why is mobile code a big security threat?
Malicious mobile code can compromise the server or client.
What are the two main modes of communication in JMS?
Message queueing (one-to-one communication). Publish subscribe (one-to-many communication).
Explain publish subscribe (1-many)
Messages are "published" relative to a topic object on the server. They are received by message consumers (clients) that have subscribed to the topic. One message is received by all subscribers. If clients subscribe/unsubscribe while messages are being sent, results are undefined (they may or may not get the messages).
Explain 1 to 1 communication
Messages are sent to a queue object on the server. Message consumers pull the messages from the queue. One message can only be received by one client. But several clients can be writing/reading from the same queue concurrently.
Messages vs RPC
Messages provide loose coupling, and asynchronous interaction is possible. RPC uses synchronous messages. In large-scale systems, RPCs are tightly coupled and the failure of one program has a direct impact on another. Messages are highly flexible.
What is Middleware?
Middleware is the software that connects software components or enterprise applications. Middleware is the software layer that lies between the operating system and the applications on each side of a distributed computer network. Typically, it supports complex, distributed business software applications. Provides abstraction and hides complexity.
What is mobile code?
Mobile code refers to code that can be sent from one destination to another and run at that destination.
What is the problem with reliable multicast?
Not very scalable. If there are N receivers, then the sender gets N acknowledgements for every message. If we send M messages, there will be M x N acknowledgements.
In terms of communication, what is broadcast?
One to ALL.
In terms of communication, what is multicast?
One to many or many to many
In terms of communication, what is unicast?
One to one.
What are the types of group communication?
* Open group: anyone can join.
* Closed group: closed membership.
* Peer: all members are equal and all members send messages to the group.
* Client-server: replicated servers; the client may or may not care which server answers.
* Diffusion group: the server sends to other servers and clients.
* Hierarchical: one or more members are different from the rest, e.g. president and employees of a company, distance learning.
What are the advanced aspects of JMS?
Reliability:
* By default, messages are sent in persistent mode.
* The JMS server takes extra care to prevent message loss; in particular, messages sent in this mode are logged to stable storage when sent.
* It is possible to switch this off to gain performance.
Durability:
* By default, consumers only receive messages sent while they are active.
* It is possible to create a "durable subscription".
Message expiration:
* By default, messages never expire.
* It is possible to set an expiration time; messages not received after this time are destroyed.
Transactions:
* Grouping of a sequence of client operations (sending, receiving) into one atomic unit of work.
* If anything goes wrong, the work done is rolled back and the transaction can be started all over again.
How can we ensure protection against malicious mobile code?
Sandboxing
What are the three different ways to protect against security threats?
Sandboxing, encryption, authentication
Explain flooding in the context of content-based subscription model approaches.
Send all published events to all possible recipients, or alternatively all subscriptions back to all possible senders. Can be supported by underlying multicast service as available. Simple but significant message overload.
What is space uncoupling?
Space uncoupling: a sender can send a message but does not know to whom it is sending nor if more than one, if anyone, will receive the message. Because of this, the system developer has many degrees of freedom in dealing with change: participants (senders or receivers) can be replaced, updated, replicated or migrated
How many times is the X used in challenge response?
Once. The same X is never reused.
What do message producers and consumers do?
They provide methods to send and receive messages either to/from a queue or about a topic. Message production by a producer: send(Message message), plus variants with fine tuning. Message reception by a consumer: synchronous with receive(), receive(long timeout) or receiveNoWait(); asynchronous with the listener mechanism, setMessageListener(..).
What is time uncoupling?
Time uncoupling: a sender can send a message even if the receiver is not yet available. The message is stored and picked up at a later moment. The sender and receiver(s) can therefore have independent lifetimes (in other words, they do not need to exist at the same time to communicate). This has important benefits, particularly in volatile environments where senders and receivers may come and go.
Why use Mobile Code?
To pass objects from one machine to the other that has no local implementation of the objects.
Describe the topic based subscription model.
Topic based: each notification is expressed in terms of a number of fields, with one field denoting the topic, and subscriptions are defined in terms of a *topic of interest*. Similar to channel-based approaches, except that the topic is declared explicitly as a field rather than being implicit in the channel. Can be enhanced by introducing hierarchies of topics.
Name the goals of middleware. [TO REDS]
Transparency
- The ability to view a distributed system as if it were a single computer.
- Varying dimensions of transparency incl. location, access, migration, etc.
- The degree of transparency is a key decision in any systems architecture.
Openness
- The offering of services according to standard rules (syntax and semantics).
- Openness provides support for the key properties of portability and interoperability.
- Again, the degree of openness is a key factor in systems design.
Resource sharing
- The ability to access and share resources in a distributed environment.
- The bread and butter of distributed systems.
Extensibility
- The ability to introduce new or modified functionality.
Dependability / Quality of Service
- Security: providing secure and authenticated channels, access control, key management, etc.
- Fault tolerance: providing highly available and resilient distributed applications and services.
Scalability
- Scalable with respect to size, e.g. supporting massive growth in the number of users of a service.
- Scalable with respect to geography, e.g. supporting distributed systems that span continents (dealing with latencies, etc.).
- Scalable with respect to administration, e.g. supporting systems which span many different administrative organisations.
Describe the type based subscription model.
Type based: intrinsically linked with (typed) object-based approaches. Subscriptions are defined in terms of event types (signature/methods/attributes), with matching defined in terms of types or subtypes of the given filter. Can be integrated elegantly into programming languages and can also check the type correctness of subscriptions.
How can an ACK explosion be avoided?
Use negative ACKs (NACKs). If everything is fine, the receiver does not say anything. If a message is lost, the receiver complains to the sender. NACK explosions can still occur but they are less likely. However, there is a garbage collection problem: the sender gets no confirmation of delivery, so past messages can only be kept for a certain period. More advanced schemes limit the number of NACKs, for example NACK suppression or hierarchical feedback control.
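The receiver side of a NACK scheme can be sketched as gap detection. This assumes that the sender numbers its messages consecutively from 0, which the notes above do not state; the `NackReceiver` class is a hypothetical name, and the returned list stands in for NACKs that would be sent back over the network.

```python
class NackReceiver:
    """Receiver that stays silent on success and NACKs only missing messages."""

    def __init__(self):
        self.next_expected = 0
        self.delivered = []     # in-order messages handed to the application
        self.buffered = {}      # out-of-order messages waiting for a gap to fill

    def on_message(self, seq, payload):
        """Handle one message; return the sequence numbers to NACK (often none)."""
        if seq < self.next_expected or seq in self.buffered:
            return []                       # duplicate: ignore
        self.buffered[seq] = payload
        # Deliver everything that is now contiguous.
        while self.next_expected in self.buffered:
            self.delivered.append(self.buffered.pop(self.next_expected))
            self.next_expected += 1
        if not self.buffered:
            return []                       # nothing missing: stay silent
        # Anything between next_expected and the highest buffered seq is missing.
        return [s for s in range(self.next_expected, max(self.buffered))
                if s not in self.buffered]


r = NackReceiver()
# Message 1 is lost at first, so message 2 arrives early and triggers a NACK.
nacks = [r.on_message(0, "a"), r.on_message(2, "c"), r.on_message(1, "b")]
```

Only the arrival of message 2 produces feedback; in the loss-free case the receiver sends nothing at all.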
What is the byte-code verifier?
When a class loader presents the bytecodes of a newly loaded Java platform class to the virtual machine, these bytecodes are first inspected by a verifier. The verifier checks that the instructions cannot perform actions that are obviously damaging. All classes except for system classes are verified. Byte-Code-Verifier checks the byte code obtained by class loaders before it is allowed to execute.
What is symmetric key encryption?
When the same key is used to encrypt and decrypt data.
What is challenge response protocol?
A protocol in which we can authenticate a user without exchanging the secret key. It requires the server to know all secret keys. Real applications use asymmetric encryption to distribute keys.
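The exchange can be sketched as follows. Note the substitution: instead of encrypting the challenge X with a cipher such as DES, this sketch computes an HMAC-SHA256 over X with the shared key, using Python's standard `hmac` and `secrets` modules. The user name, key and function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# The server knows every user's secret key (hypothetical user and key).
SERVER_KEYS = {"alice": b"alice-shared-secret"}


def make_challenge():
    """Server: generate a fresh random challenge X, used only once."""
    return secrets.token_bytes(16)


def respond(secret_key, challenge):
    """Client: prove knowledge of the key without sending it."""
    return hmac.new(secret_key, challenge, hashlib.sha256).digest()


def verify(user, challenge, response):
    """Server: recompute the expected response and compare in constant time."""
    expected = hmac.new(SERVER_KEYS[user], challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


x = make_challenge()
good = verify("alice", x, respond(b"alice-shared-secret", x))
bad = verify("alice", x, respond(b"wrong-key", x))
```

Because a new X is generated for every login attempt, an eavesdropper who records one response cannot replay it later.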
What key decision needs to be made regarding implementing publish subscribe?
Whether to go for a centralised, distributed or fully peer-to-peer architecture. Most systems have a distributed architecture consisting of a network of brokers.
Is best effort guaranteed?
No. Messages sent to all members may or may not arrive.
Is reliable multicast reliable?
Reasonable efforts are made to ensure delivery in spite of message losses. It can be based on positive or negative acknowledgements. There are no guarantees if the sender crashes during the multicast.