Ch 1 Distributed systems
Three Scalability Dimensions
- Size (more users or resources can be added without a noticeable loss of performance)
- Geographic (users and resources lie far apart)
- Administrative (spans many orgs)
How can you bring down the service time, and thus the response-to-service time ratio, if your system is grinding to a halt?
- decreasing the arrival rate of requests or increasing the processing capacity of the service can help reduce the utilization of a service
What does middleware for distributed systems offer?
- it is a manager of resources that extends over multiple machines, offering each application the same interface
- it is a software layer placed between the OS and the distributed applications
- it also offers:
  - facilities for interapplication communication
  - security services
  - accounting services
  - masking of and recovery from failures
- the main difference with the OS is that middleware is offered in a networked env
Cluster computing
- a collection of similar PCs closely connected with a high-speed local area network
- used for parallel programming in which one program runs on multiple machines
Grid computing
- a federation of computers where the systems fall under different administrative domains with different hardware, software or network topology
- generally the computers are largely the same but there are hybrid architectures
Open distributed system
- a system that offers components that can easily be used by or integrated into other systems
- this often defines services through interfaces using an Interface Definition Language
- it should be interoperable, portable, and extensible
Uniform Resource Locator (URL)
- an address used for locating a document on the Web
- it gives no clue as to the location of the website's main web server
What are some things that make coordination between nodes challenging?
- computing elements are autonomous and need to coordinate
- no global clock
- authenticating a node to identify group membership can create scalability bottlenecks
Cloud computing
- dynamically constructs the infrastructure needed for services
- it is an easily usable and accessible pool of virtualized resources that can be configured dynamically
How would you hide communication latencies?
- this is a technique for geographic scalability
- you can avoid waiting for responses to remote service requests by using asynchronous communication; when a reply comes in, the application is interrupted and a special handler is called to complete the request
- for interactive apps it is better to move part of the computation to the client, like checking a form for errors on the client side before sending it to the server
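A minimal sketch of the asynchronous style, using Python's asyncio. The `remote_request` coroutine and its delay are made-up stand-ins for a slow remote service, not a real API. The caller registers a handler for the reply and keeps working instead of blocking.

```python
import asyncio

async def remote_request(payload: str, delay: float = 0.05) -> str:
    """Hypothetical remote call; the sleep simulates wide-area latency."""
    await asyncio.sleep(delay)
    return payload.upper()

async def main() -> list[str]:
    events: list[str] = []

    def on_reply(task: asyncio.Task) -> None:
        # Special handler that completes the request once the reply is in.
        events.append(f"reply:{task.result()}")

    task = asyncio.create_task(remote_request("hello"))
    task.add_done_callback(on_reply)

    # The caller does local work here instead of waiting for the reply.
    events.append("local work done")

    await task
    await asyncio.sleep(0)  # one more loop turn so the handler has certainly run
    return events

events = asyncio.run(main())
print(events)
```

The local work is recorded before the reply handler fires, which is the point: the latency of the remote call is overlapped with useful work.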
To hide replication it is necessary that all replicas _____________
- have the same name
- so the system also needs to support location transparency, otherwise it is impossible to refer to replicas in diff locations
Response time
- how long it takes before the service processes a request, including time in the queue
- the average number of requests in the system divided by the average throughput
- if the utilization is very small then the response-to-service time ratio is close to 1, so a request is processed almost immediately
- once the utilization comes close to 1 the response-to-service time ratio increases to very high values, meaning the system is coming to a halt
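A small sketch of how the ratio behaves, assuming the simple single-server queuing model in which response time R and service time S are related by R/S = 1/(1 - U). The function name is made up for illustration.

```python
def response_to_service_ratio(utilization: float) -> float:
    """R/S = 1 / (1 - U) for a single-server queue (assumed model)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 / (1.0 - utilization)

# The ratio stays near 1 for a lightly loaded service and explodes near U = 1.
for u in (0.1, 0.5, 0.9, 0.99):
    print(f"U = {u:<4}  R/S = {response_to_service_ratio(u):.1f}")
```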
Extensible
- it should be easy to add or replace parts of a system without affecting those components that stay in place
4 Design goals of Distributed Systems
- make resources easily accessible and shareable
- hide that resources are distributed
- open
- scalable
Why is it difficult to geographically scale distributed systems?
- many are based on synchronous communication where the client blocks until a reply is received from the server
- in wide area networks where the communication can be slow this means you are waiting for a response and thus can't do other things
What challenge occurs when trying to create a single coherent system for a distributed environment and distribution transparency?
- partial failures are inevitable, and if the user is not aware of which node or which process on some set of unknown nodes is failing, it will be hard to debug
- with distribution transparency there is a performance price (say an application repeatedly tries to contact a server before giving up and trying another one; masking the server failure this way slows down the system)
- there is also a trade-off with geographic scalability, since hiding latencies and bandwidth restrictions is difficult
- in some situations hiding distribution is not useful, like location-based services on mobile phones where you want to find the nearest store
What are some of the least and most problematic scalability problems?
- size scalability is the least problematic since you can increase the capacity of the machine or add more machines
- geographic scalability is tougher since network latencies are naturally bounded from below and you are forced to copy data closer to the client, which leads to consistency issues
- administrative scalability is the most challenging since it deals with politics across orgs
Three root causes for bottleneck considered with size scalability
- the computational capacity limited by the CPU
- storage capacity, including I/O transfer rate
- network between the user and the centralized service
Interoperability
- the extent to which two implementations of components from different developers can work together by relying on a common standard
Portability
- the extent to which an application developed for distributed system A can be executed without modification on a distributed system B that implements the same interfaces
Pervasive systems
- the introduction of mobile embedded computing devices that blurs the line between users and system components
- many sensors pick up the user behavior and many actuators steer the behavior
- this calls for unique solutions to make the system transparent and unobtrusive
False assumptions when designing a distributed system
- the network is reliable
- the network is secure
- the network is homogeneous
- the topology doesn't change
- latency is zero
- bandwidth is infinite
- transport cost is zero
- there is one admin
Partitioning and distribution
- used in scaling
- splits a component into smaller parts and spreads them across the system
- an example is DNS: the name space is divided hierarchically and the path of each name leads to a host in the Internet, with different name servers handling different parts of the path, so a single server does not have to deal with all requests for name resolution
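A toy sketch of the DNS idea, with made-up zone data (the server names, the host name and the 192.0.2.10 documentation address are all hypothetical). Each server knows only its direct children, so resolving a name touches one small server per label instead of one server that knows everything.

```python
# Hypothetical zone data: each "server" only knows the next label down.
ZONES = {
    "root-server": {"org": "org-server"},
    "org-server": {"example": "example-server"},
    "example-server": {"cs": "cs-server"},
    "cs-server": {"host": "192.0.2.10"},
}

def resolve(name: str) -> tuple[str, list[str]]:
    """Resolve a name label by label; return the address and the servers asked."""
    labels = name.rstrip(".").split(".")[::-1]  # most significant label first
    server = "root-server"
    visited = []
    for label in labels:
        visited.append(server)
        server = ZONES[server][label]  # referral to the next server, or the address
    return server, visited

address, visited = resolve("host.cs.example.org")
print(address, visited)
```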
Overlay network
- used to organize a collection of nodes
- in this case a node is a software process with a list of other processes it can send messages to
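A minimal sketch of an overlay, assuming each node is just an object holding a neighbor list (the `Node` class and `reachable` helper are made up for illustration). A message can only travel along neighbor links, so the overlay decides who can reach whom.

```python
from collections import deque

class Node:
    def __init__(self, name: str):
        self.name = name
        self.neighbors: list["Node"] = []  # processes this node can send messages to

def reachable(start: Node) -> set[str]:
    """Names of all nodes a message from `start` can reach by forwarding."""
    seen = {start.name}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for peer in node.neighbors:
            if peer.name not in seen:
                seen.add(peer.name)
                queue.append(peer)
    return seen

# Ring a -> b -> c -> a, plus d, which knows a but is known by nobody.
a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
a.neighbors, b.neighbors, c.neighbors, d.neighbors = [b], [c], [a], [a]

print(sorted(reachable(a)), sorted(reachable(d)))
```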
What are the two ways to organize the collection of nodes for identifying which nodes can communicate with one another?
1. Overlay network, where the node has a list of other processes it can send messages to
2. A node may need to first look up a neighbor
3 Types of pervasive systems
1. ubiquitous computing systems
2. mobile systems
3. sensor networks
ACID Properties
Atomic - indivisible, either all operations occur or none occur
Consistent - the system should be in a consistent state when the transaction begins and when it ends
Isolated - transactions do not interfere with one another
Durable - changes are permanent once committed
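A small sketch of atomicity using Python's built-in sqlite3 module. The accounts table and the `transfer` helper are made up for illustration. The second transfer fails halfway through, and the rollback undoes the half that already happened.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money in one transaction: both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected a negative balance

def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))

ok_small = transfer(conn, "alice", "bob", 30)   # succeeds
ok_large = transfer(conn, "alice", "bob", 500)  # fails; bob's credit is rolled back
final = balances(conn)
print(ok_small, ok_large, final)
```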
What is a common communication service offered by middleware?
Remote Procedure Calls, which allow an application to invoke a function that is implemented and executed on a remote computer as if it were locally available
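A toy sketch of the RPC idea, not a real RPC library. The client stub marshals the call into bytes, a transport carries it, and the server side unmarshals it, runs the function and marshals the result back. Here the transport is a plain function call standing in for the network, and all names (`ClientStub`, `server_dispatch`, `PROCEDURES`) are made up.

```python
import json

# --- server side ---
def add(a, b):
    return a + b

PROCEDURES = {"add": add}  # hypothetical registry of remotely callable functions

def server_dispatch(request_bytes: bytes) -> bytes:
    """Unmarshal the request, run the procedure, marshal the result."""
    request = json.loads(request_bytes)
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode()

# --- client side ---
class ClientStub:
    def __init__(self, transport):
        self._transport = transport  # callable taking and returning bytes

    def __getattr__(self, proc):
        # Any unknown attribute becomes a "remote" procedure with that name.
        def call(*args):
            request = json.dumps({"proc": proc, "args": args}).encode()
            reply = self._transport(request)
            return json.loads(reply)["result"]
        return call

remote = ClientStub(transport=server_dispatch)
total = remote.add(2, 3)  # looks like a local call to the application
print(total)
```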
Caching and replication can lead to ____________ problems
consistency - therefore it often requires global synchronization mechanisms
What is a distributed system?
it is a collection of autonomous computing elements (either hardware devices or software processes) that appear to the user as a single coherent system
Mobile phone users can continue a conversation while they move; this is an example of ________________________
migration transparency
In distributed systems, transactions are often constructed as a ____________
nested transaction, or a number of sub transactions where a top level transaction forks off children that run in parallel
The fraction of time pₙ that there are n requests in the system
pₙ = (1 - λ/µ)(λ/µ)ⁿ
λ = arrival rate of requests per sec
µ = capacity to process requests per sec
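A quick numeric check of the formula, with made-up rates (8 requests/sec arriving, capacity of 10 requests/sec). The fractions should add up to 1, and the fraction of time the service is busy, 1 - p₀, should come out as λ/µ.

```python
def p(n: int, lam: float, mu: float) -> float:
    """Fraction of time there are n requests in the system (needs lam < mu)."""
    rho = lam / mu
    if not 0 <= rho < 1:
        raise ValueError("arrival rate must be below processing capacity")
    return (1 - rho) * rho ** n

lam, mu = 8.0, 10.0                 # assumed example rates
p0 = p(0, lam, mu)                  # fraction of time the system is empty
busy = 1 - p0                       # fraction of time the service is busy
total = sum(p(n, lam, mu) for n in range(2000))  # truncated infinite sum

print(f"p0 = {p0:.2f}, busy = {busy:.2f}, sum of p_n = {total:.6f}")
```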
Remote method invocations RMI
similar to RPC except it operates on objects instead of functions
- the disadvantage of both is that the caller and the callee both need to be up and running at the time of communication and need to know how to refer to each other, which is tight coupling
- an alternative is having a messaging system carry requests (message-oriented middleware, such as publish-subscribe systems)
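A toy in-memory sketch of the messaging alternative (the `Broker` class is made up, not a real middleware API). The sender only knows a topic name, and the receiver does not have to be running when the message is sent, which is the loose coupling that RPC and RMI lack.

```python
from collections import defaultdict, deque

class Broker:
    """Holds messages per topic until some receiver fetches them."""

    def __init__(self):
        self._queues = defaultdict(deque)  # topic -> pending messages

    def publish(self, topic: str, message: str) -> None:
        self._queues[topic].append(message)

    def fetch(self, topic: str) -> list[str]:
        queue = self._queues[topic]
        items = list(queue)
        queue.clear()
        return items

broker = Broker()

# The sender runs first; no receiver exists yet.
broker.publish("orders", "order-1")
broker.publish("orders", "order-2")

# The receiver comes up later and still gets everything, in order.
received = broker.fetch("orders")
leftover = broker.fetch("orders")
print(received, leftover)
```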
Utilization of a service
the fraction of time that it is busy
U = ∑ pₙ (summed over n > 0) = 1 - p₀ = λ/µ
distribution transparency
the internal details of the distribution are hidden from the user, this includes where the data is stored, on which computer a process is executing, or how the data is replicated.