quiz 1


(policy) what level of consistency do we require for client-cached data?

(mechanism) allow dynamic setting of caching policies

(policy) what level of secrecy do we require for communication?

(mechanism) offer different encryption algorithms

(policy) which QoS requirements do we adjust in the face of varying bandwidth?

(mechanism) provide adjustable QoS parameters per data stream

(policy) which operations do we allow downloaded code to perform?

(mechanism) support different levels of trust for mobile code

traditional three layered view

application interface layer (top), processing layer, data layer (bottom)

characteristic features of a distributed system

autonomous computing elements (nodes), which can be either hardware devices or software processes (a very broad term)
single coherent system: users or applications perceive a single system (transparency)
nodes need to collaborate

openness

being able to interact with services from other open systems, regardless of the underlying environment. hugely important in distributed systems because different applications have to work together: if one of them only works on POSIX systems, Windows users are out of luck because the applications won't interoperate. RMI/RPC/etc. factor into this because they let applications work together at a language-to-language level

relocation transparency

hide that an object may have moved to another location between accesses (something moved between my actions, and i don't need to know). a big deal in cloud computing

concurrency transparency

hide that an object may be shared by several independent users. this works especially well if the object is read-only; it gets tricky with writable objects. in that case, you may not WANT transparency (google docs)

failure transparency

hide the failure and recovery (via replication) of an object

Location transparency

hide where an object is located. not always desired; sometimes that context is important to the users

distribution transparency

hiding the fact that the system is distributed. one difficulty: long delays on some actions tip the user off that the action spans a long distance, i.e. that the system must be distributed

multicomputer

high performance distributed computing. a cluster: homogeneous (though not necessarily), with a single managing node (master). easier to scale and less expensive than multiprocessors, but more difficult to program and not as fast. one solution: a shared memory model on top of the multicomputer, e.g. mapping all main-memory pages from the different processors into one single virtual address space. that shared-memory approach has largely been abandoned in favor of multiprocessors, but clusters themselves are still around (we have one on campus)

multiprocessor & multicore

high performance distributed computing. extremely expensive (at least in the beginning). faster than a multicomputer, but can't be scaled up without buying a new computer. easier to program than a multicomputer

importance of specifications

how are you supposed to know how to make your system open if you don't know the requirements? it's better to specify policy, not mechanism

how do autonomous nodes handle group membership in a distributed system?

how do you know who is a member and who isn't? group membership isn't an issue with clustering, for obvious reasons: the computers are literally linked together. in general, though, this is a huge issue

list and describe typical middleware services

human-time services (such as web request servicing) & machine-time services "here's a list of services and descriptions, match them up"

Enterprise application integration

Distributed information systems. example: MOM allows direct application-to-application communication. nested transactions (distributed transaction processing is a Whole Thing): in many cases, the data involved in a transaction is distributed across several servers, and a TP monitor is responsible for coordinating the execution of the transaction

the network is reliable

FALSE- we only think it is because we have protocols that can detect and recover when a packet is lost

cloud computing

High performance distributed computing. an environment in which providers sell services: you sell me a service, and i don't have to worry about where that service runs, and i don't have to maintain or manage it. allows organizations to outsource their IT infrastructure (hardware and software); only valuable if this is cheaper than implementing & maintaining their own system

grid computing

High performance distributed computing. the next step up from clusters: lots of nodes from everywhere, heterogeneous, dispersed across several organizations, can easily span a wide-area network. people actually want to share resources to allow collaboration; grids generally use virtual organizations. not very popular due to scope (application handling across the whole internet), and not many companies actually want to share their resources

basic server operations

PUT - create a new resource
GET - retrieve the state of a resource
DELETE - delete a resource
POST - modify a resource by transferring a new state
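these four operations can be sketched with a tiny in-memory store; `ResourceStore` and its method names are invented for illustration (this is not a real HTTP server), and the semantics follow the card above rather than any particular web framework.

```python
class ResourceStore:
    """In-memory sketch of the four basic server operations."""

    def __init__(self):
        self._resources = {}

    def put(self, name, state):
        # PUT: create a new resource
        if name in self._resources:
            raise KeyError(f"{name} already exists")
        self._resources[name] = state

    def get(self, name):
        # GET: retrieve the state of a resource
        return self._resources[name]

    def delete(self, name):
        # DELETE: delete a resource
        del self._resources[name]

    def post(self, name, new_state):
        # POST: modify a resource by transferring a new state
        if name not in self._resources:
            raise KeyError(f"{name} does not exist")
        self._resources[name] = new_state

store = ResourceStore()
store.put("note", "draft")
store.post("note", "final")
print(store.get("note"))  # final
```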

ubiquitous computing systems

Pervasive systems. continuously present, i.e. there is continuous interaction between system and user:
devices are networked, distributed, and accessible in a transparent manner
interaction between users and devices is highly unobtrusive
the system is aware of a user's context in order to optimize interaction
devices operate autonomously without human intervention and are highly self-managed
the system can handle a wide range of dynamic actions and interactions
see: IoT

mobile computing systems

Pervasive systems. emphasis is on the fact that devices are inherently mobile. as a device's location changes, the network has to adjust for the change of local services, reachability, etc. (discovery). communication may become more difficult: connectivity (a stable route) can't be guaranteed, so disruption-tolerant networking must be allowed. see: mobility patterns

sensor ( & actuator) nets

Pervasive systems. emphasis on the actual (collaborative) sensing and actuation of the environment. sensors are attached to nodes that are numerous, simple (small memory/compute/communication capacity), and often battery-powered. can potentially work as a distributed database. duty cycles are introduced because many sensor networks need to operate on a strict energy budget

define what a distributed system is

a collection of autonomous computing elements that appears to its users as a single coherent system

virtual organization

a grouping of users (or their IDs) that allows for authorization on resource allocation

duty cycle

a node is active during its active time and suspended during its suspended time; the duty cycle is active time over total time. the cycle is usually between 10-30%, and can even be < 1%. if your cycle is too low, sensor nodes may not wake up at the same time anymore (time to wake up may vary) and become permanently disconnected (active during non-overlapping time slots). if your cycle is too high, all the sensors drawing power at once could drain your battery too quickly
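the ratio above is simple enough to compute directly; this helper function is a made-up name, just illustrating the active-time-over-total-time definition.

```python
def duty_cycle(active_time, suspended_time):
    """Duty cycle = active time / (active time + suspended time)."""
    total = active_time + suspended_time
    return active_time / total

# a node awake 2 s out of every 10 s period sits at 20%,
# inside the typical 10-30% range
print(duty_cycle(2, 8))  # 0.2
```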

partial failures

a snag in distribution transparency: it's inevitable that at any time only a Part of the distributed system fails. hiding partial failures and their recovery is often very difficult, and in general impossible: either the system will fail to recover in time to complete the user's request, or it will slow down because of recovery time. if one or two nodes die, maybe that's fine. if a whole subnet goes down, you're out of luck

groupware

a specific kind of distributed system that is specifically concerned with resource sharing. example: whiteboard (ancient) was a way to build up a multicast tree in the commodity internet by piping messages from the source through a shortest-path tree (re: dijkstra)

IDL

a specification language used to describe a software component's application programming interface (API). describes an interface in a language-independent way, enabling communication between software components that do not share one language. commonly used in remote procedure call software, where the machines at either end of the link may be using different operating systems and computer languages. helps with openness

application layer (cloud computing)

actual applications, like office suites. comparable to the suite of apps shipped with an OS

integrating applications via messaging

allows decoupling in time & space requires lots of detailed protocol handling by the application (low level)

temporal coupling

both nodes have to be up and running in order to send the message

problems w geographical scalability

you can't just go from LAN to WAN. multicast issues: how do you know when to duplicate a message on its journey when you're sending it to multiple recipients?

connectivity layer (grid computing)

collective -> connectivity -> fabric communication/transaction protocols for moving data between resources also various authentication protocols

resource layer (grid computing)

collective -> resource -> fabric manages a single resource, such as creating processes or reading data

compare and contrast distributed system middleware to an operating system.

compare: both are organized in layers (middleware is effectively a layer between the operating system and the distributed applications). contrast: middleware sits over the OS, not over the hardware

object-based architectural style

components are objects connected to each other through procedure calls. objects may be placed on different machines, so calls can execute across a network. objects encapsulate data and offer methods on that data without revealing the internal implementation (abstraction)

problems with administrative scalability

conflicting policies concerning usage (and thus payment), management, and security examples: computational grids and shared equipment

an open system should:

conform to well-defined interfaces
easily interoperate
support portability of applications
be easily extensible

application layer (grid computing)

contains the actual grid applications that operate within a virtual organization

processing layer

contains the functions of an application (i.e. without specific data). processes the data, puts it in a useful format (data structure), and sends it to the data layer for storage (or straight back to the application interface layer for a response). example components: query generator, html generator, ranking algorithm

infrastructure layer (cloud computing)

deploys virtualization techniques; involves allocating and managing virtual storage devices (file-/block-level) and virtual servers. examples: amazon s3, amazon ec2

describe how the distributed system should appear as a single coherent system

distribution transparency. people shouldn't be able to see when nodes fail; when a node fails, requests can be routed to another node. when you save a file, you don't pick a node to save it to (location transparency). the collection of nodes as a whole operates the same, no matter when, where, or how the interaction between a user and the system takes place

structured overlay networks

each node has a Well Defined set of neighbors with whom it can communicate (tree, ring, or a similar graph-like structure) example: whiteboard multicasting

overlay network

each node in the collection communicates only with other nodes in the system (its neighbors). a distributed system has certain communication patterns, so you build a virtual network over the hard-wired network: a set of rules about which nodes can talk to which nodes in what ways. the set of neighbors may be dynamic (the neighbors may change over time), or may only be known implicitly (via lookup). example: p2p systems
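the "set of rules about which nodes can talk to which nodes" can be sketched as a plain neighbor map; the node names and topology here are invented for illustration.

```python
# An overlay as a neighbor map sitting on top of the physical network.
overlay = {
    "A": ["B", "D"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "A"],
}

def can_send(src, dst):
    # in an overlay, a node may only talk directly to its declared neighbors
    return dst in overlay[src]

print(can_send("A", "B"))  # True
print(can_send("A", "C"))  # False -- must be relayed via a neighbor
```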

how do autonomous nodes handle the notion of time in a distributed system?

each node is autonomous and will thus have its own notion of time (no global clock). this leads to fundamental synchronization and coordination problems in many distributed systems. nodes communicate with each other (they send messages), and the order of these messages is important, so there needs to be some global abstract concept of time so that nodes can agree on the order of the messages they receive. ways to mitigate this will be covered later

integrating applications via remote procedure call

effective when execution of a series of actions is needed. requires caller and callee to be up & running at the same time

Pervasive systems

emerging next generation of distributed systems nodes are small, mobile, and often embedded in a larger system characterized by the fact that the system naturally blends into the user's environment

extensibility

extend an app/system by adding new features to it again, only possible if there is a well defined open spec

important design goals of Supporting resource sharing

fast sharing, not only of pieces of the data but of the resources of the underlying network, using pipelines between you and everyone at once. "the network is the computer" (a phrase you will hear endlessly). examples: cloud-based shared storage and files, p2p-assisted multimedia streaming, shared mail services (outsourced mail systems), shared web hosting (content distribution networks)

layered architecture

from the perspective of a layer, you're talking to another layer through a protocol. Layer N talks to Layer N-1 through Layer N-1's interface. corresponding layers in different parties literally communicate via protocol (there are services connecting layers & their interfaces). this is good for distributed systems because, from each layer's perspective, every message is handled the same way (via a protocol/interface that mimics the protocol)

collective layer (grid computing)

handles access to multiple resources (discovery, scheduling, replication)

integrating applications via file transfer

has been used as a way to make applications communicate. simple but not flexible: you have to figure out file format & layout, file management, and update propagation and notifications

Access transparency

hide differences in data representation and how an object is accessed. "here's the api if you want a thing; when you call it, you get back an object shaped like this. you don't care how i store it internally. you have this thing that does the thing, and that's it."

Replication transparency

hide that an object is replicated. if a node goes down, another node can still do the work because it has a copy of the object being worked on. even though you're hiding a failure, it takes time to replicate the object across nodes & the user will still see that. replication also helps you hide distance via caching: push a copy to a node that's local to the people using it a lot. keeping replicas exactly up to date with the master takes time (e.g. immediately flushing write operations to disk for fault tolerance); if you're modifying a replica, you then have to distribute those modifications to every replica and the master

caching as a form of replication

make copies of data available at different machines. examples: replicated file servers and databases, mirrored web sites, web caches (in browsers and proxies), file caching (at server and client). consistency is an issue, for obvious reasons

hiding communication latencies (technique for scaling)

make use of asynchronous communication; have a separate handler for incoming responses. problem: not every application fits this model
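the "separate handler for incoming responses" idea can be sketched with a thread pool and a completion callback; `remote_call` is a stand-in for a slow network request, and the sleep simulates latency.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def remote_call(x):
    time.sleep(0.1)  # stand-in for network latency
    return x * 2

results = []

def on_response(future):
    # separate handler, invoked when the response arrives
    results.append(future.result())

with ThreadPoolExecutor() as pool:
    fut = pool.submit(remote_call, 21)
    fut.add_done_callback(on_response)
    # the caller is free to do other work here instead of blocking

print(results)  # [42]
```

note that the caller never blocks on the result explicitly; the executor's shutdown (at the end of the `with` block) is what guarantees the response handler has run before the print.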

dimension of scalability: geographical

maximum distance between nodes

connector

mechanism that mediates communication, coordination, or cooperation among components example: facilities for (remote) procedure call, messaging, or streaming

sequential consistency

a method of keeping replicas up to date; very expensive. if i write to an object, that object is locked, all the updates are sent out to all the replicas, and only when the machines holding the replicas respond that their replicas have been updated is the object unlocked
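the lock, update-everyone, wait-for-acks, unlock sequence can be sketched in-process; `ReplicatedValue` is an invented name, and the replicas are plain dicts standing in for remote copies (no real network, and the acks are simulated by counting).

```python
import threading

class ReplicatedValue:
    def __init__(self, replicas):
        self._lock = threading.Lock()
        self._replicas = replicas  # dicts standing in for remote copies

    def write(self, key, value):
        with self._lock:                # object is locked
            acks = 0
            for replica in self._replicas:
                replica[key] = value    # send the update...
                acks += 1               # ...and count the ack
            assert acks == len(self._replicas)
        # lock is released only after every replica acknowledged

replicas = [{}, {}, {}]
rv = ReplicatedValue(replicas)
rv.write("x", 7)
print(all(r["x"] == 7 for r in replicas))  # True
```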

many distributed systems are needlessly complex because of...

mistakes that have to be patched; false assumptions

integrating applications via a shared database

more flexible than using file transfer, but still requires a common data scheme. bottleneck risk

composability

multiple services can combine to become SUPER services only possible if the systems are highly open & interoperable

important design goals of Being open

not only that we have an open specification, but that the specification is Good: it can be duplicated and will work with other apps following the spec

dimension of scalability: size

number of users and/or processes. most systems only account for it to a certain extent. common solution: multiple powerful servers operating independently in parallel

one-way call

one layer may call the layer below it, or may bypass that layer to call an even lower layer. example: an application calls a database server, which calls a file system; but the application could also bypass the database to call the file system directly

how DNS uses partitioning and distribution to scale

partition data and computations across multiple machines. not every server has all the names; the authoritative name server has them for its part of the namespace. you find your data by jumping from server to server
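that server-to-server jumping can be sketched as iterative resolution over a partitioned namespace; the server names and records below are invented for illustration, and the fixed suffix chain is a simplification of real DNS label-by-label delegation.

```python
# Each "server" only knows its own slice of the namespace.
servers = {
    "root": {"edu.": "edu-server"},
    "edu-server": {"example.edu.": "example-server"},
    "example-server": {"www.example.edu.": "192.0.2.10"},
}

def resolve(name):
    server = "root"
    # jump from server to server; no single server holds every name
    for suffix in ("edu.", "example.edu.", name):
        answer = servers[server].get(suffix)
        if answer in servers:
            server = answer      # referral: ask the next server
        else:
            return answer        # authoritative answer (or None)
    return None

print(resolve("www.example.edu."))  # 192.0.2.10
```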

difference between a policy & mechanism??

policy: what service/function you want to provide in your system. mechanism: how to implement that policy. the stricter the separation between policy and mechanism, the more we need to ensure proper mechanisms, potentially leading to many configuration parameters and complex management. hardcoding policies often simplifies management and reduces complexity, at the price of less flexibility
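the split can be made concrete with a cache whose eviction policy is a parameter: the `Cache` class is the mechanism, and the policy function plugged into it is the policy. names are invented for illustration.

```python
class Cache:
    """Mechanism: a bounded cache. Which entry to evict is the policy."""

    def __init__(self, capacity, eviction_policy):
        self.capacity = capacity
        self.eviction_policy = eviction_policy  # policy passed in
        self.items = {}
        self.order = []  # insertion order, oldest first

    def put(self, key, value):
        if key not in self.items and len(self.items) >= self.capacity:
            victim = self.eviction_policy(self.order)  # policy decides
            self.order.remove(victim)
            del self.items[victim]
        self.items[key] = value
        if key in self.order:
            self.order.remove(key)
        self.order.append(key)

fifo = lambda order: order[0]   # one possible policy: evict the oldest
cache = Cache(2, fifo)
cache.put("a", 1); cache.put("b", 2); cache.put("c", 3)
print(sorted(cache.items))  # ['b', 'c']
```

swapping in a different policy (say, evict the newest) requires no change to the mechanism, which is exactly the flexibility the card describes.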

hardware layer (cloud computing)

processors, routers, power & cooling systems; literally all hardware, including the datacenters. customers normally never get to see these (duh)

platform layer (cloud computing)

provides higher-level abstractions for storage (databases) and such. examples: the amazon s3 storage system offers an API for (locally created) files to be organized and stored in so-called buckets; azure; google app engine

fabric layer (grid computing)

provides interfaces to local resources (for querying state and capabilities, locking, etc)

(scaling technique) facilitating solution by moving computations to client (or somewhere other than the server)

reduces server load. example: have the browser generate the html layout from data (and an xslt stylesheet) pulled from the server, rather than the server generating and sending the html

an architectural style is formulated in terms of...

replaceable components with well-defined interfaces
the way that components are connected to each other
the data exchanged between components
how these components and connectors are jointly configured into a system

dimension of scalability: administrative

number of administrative domains (each with its own privileges and policies)

examples where knowledge of location is important

scheduling appointments, picking a geographically close mirror, literally any location-based app

data layer

stores the data it gets from the processing layer contains the data that the client wants to manipulate through the application

coordination in a distributed system

temporal and referential coupling

event-based coordination

temporally coupled, referentially decoupled. if you're not live when an event happens, you don't handle it

shared data space coordination

temporally decoupled, referentially decoupled. no direct communication: a component publishes to a shared data space, and subscribed components go get that data whenever they're live
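both decouplings can be sketched with a minimal tuple-space-style data store; `DataSpace` and its method names are invented for illustration.

```python
class DataSpace:
    """Publishers and readers never reference each other directly,
    and need not be live at the same time."""

    def __init__(self):
        self._tuples = []

    def publish(self, item):
        # referentially decoupled: no receiver reference needed
        self._tuples.append(item)

    def read(self, predicate):
        # temporally decoupled: a later-arriving component still finds the data
        return [t for t in self._tuples if predicate(t)]

space = DataSpace()
space.publish(("temperature", 21))
space.publish(("humidity", 40))
# a component that comes online afterwards:
print(space.read(lambda t: t[0] == "temperature"))  # [('temperature', 21)]
```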

middleware

the OS of distributed systems. contains commonly used components and functions that need not be implemented by each application separately. offers communication facilities for integration. examples:
Remote procedure call (RPC): a request is issued as a local procedure call, packaged as a message, processed, responded to via a message, and the result returned from the call
Message-oriented middleware (MOM): messages are sent to a logical contact point and forwarded to subscribed applications

root causes for size scalability problems with centralized solutions

the computational capacity, limited by the CPUs
the storage capacity, including the transfer rate between the CPUs and disks
the network between the user and the centralized service

describe and list the false assumptions when developing a distributed applications

the network is reliable
the network is secure
the network is homogeneous (all of the nodes in the network are the same hardware, OS, etc.)
the topology doesn't change (the overlay network will not change)
latency is zero (there will not be any misc. delays in the network)
bandwidth is infinite
transport cost is zero
there's only one administrator

referential coupling

the sending node has to have a reference to the receiving node to send the message

important design goals of Making distribution transparent

the user doesn't want to know this stuff, and we mostly don't want the user to know it either; it would just confuse them

keeping duty-cycled networks in sync

there's a whole synchronization algorithm for this that's probably pretty interesting, but it wasn't actually explained in lecture

unstructured overlay networks

think flooding. each node has references to Randomly selected other nodes from the system: "take this and forward it on, it goes to someone over there." we haven't really built a structured overlay on top of the underlying network, so you have to use something LIKE flooding to get things from one node to the next. examples: random walk (randomly choose a neighbor), p2p (in which peers can come and go)
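the random-walk variant can be sketched directly: each hop forwards the query to one randomly chosen neighbor until the target is reached or a hop budget runs out. the topology is invented, and the seeded RNG just makes the run repeatable.

```python
import random

neighbors = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}

def random_walk(start, target, max_hops=100, rng=random.Random(0)):
    node = start
    for hop in range(max_hops):
        if node == target:
            return hop                       # found the node with the data
        node = rng.choice(neighbors[node])   # forward to a random neighbor
    return None                              # gave up: hop budget exhausted

print(random_walk("A", "A"))  # 0
print(random_walk("A", "D"))
```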

interoperability

to be open, systems need to be able to interoperate easily this means they have to have well defined specifications

application interface layer

top level. contains units for interfacing with users or external applications. sends user input to the processing layer

trade-offs of providing a single coherent view of the distributed system

transparency makes it easy for people to build clients for your system (via abstraction), but it's hard to make your system transparent: you can add a lot of overhead by replicating to hide failures (distributing updates & copies to all nodes), and cache coherency protocols, locking, etc. increase performance overhead. full distribution transparency may not be viable; there are communication latencies that can't be hidden, and completely hiding failures of networks & nodes is impossible. from the user's perspective: is this computer slow, or failing? did the server do the thing before it crashed? no way to tell!

Request / response downcall

the typical layered interaction, you know the drill: a downcall is initiated by an upper layer calling a lower layer

layered architecture with an upcall (passing a handle to the lower layer)

upcalls are implemented by a lower layer calling a higher layer: the lower layer finds out something happened and invokes an upper-layer method (a previously registered handle) to handle the event
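the registered-handle pattern is just a callback; `LowerLayer` and its method names are invented for illustration.

```python
class LowerLayer:
    def __init__(self):
        self._handler = None

    def register(self, handler):
        # the upper layer passes a handle down
        self._handler = handler

    def event_arrives(self, data):
        # the lower layer notices an event and calls *up*
        self._handler(data)

received = []
lower = LowerLayer()
lower.register(received.append)   # the "upper layer" registers its handler
lower.event_arrives("packet")
print(received)  # ['packet']
```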

RESTful architectures

view a distributed system as a collection of resources, individually managed by components. resources may be added, removed, retrieved, and modified by remote applications. key principles: resources are identified through a single naming scheme; all services offer the same interface; messages sent to or from a service are fully self-described; and after executing an operation at a service, that component forgets everything about the caller

why do we want an overlay network

we want a virtual network over the system where we know who we are and who our connected nodes are, so we know where to send messages (think p2p nets): who knows about whom, and who is responsible for getting things from one place to the next

the difficult issue when determining how much distribution transparency a distributed system should attempt to provide

what information is actually useful to the user (e.g. location-based services). sometimes you just can't hide something, and sometimes the user wants to know that something failed. full transparency may cost too much performance (via overhead), which would tip the user off anyway

mobility patterns

what's the relationship between information dissemination and human mobility? pocket-switched networks relay messages using known neighbors (friends) as steps

migration transparency

hide that an object may move to another location mid-access: while an object is being moved, i should still be able to access it. t0: you access; t1, t2, t3: the object is being moved, but you can still access it; t4: still accessible. the object could even be updated while it's being moved. most virtual machines on distributed systems have this (e.g. a shell session that survives migration)

neutral specification

you can write your service in any language and it will still be able to interface with the system (through the IDL)

describe Figure 1.1

you have these computers in a distributed system, each with its own local OS. on top of that sits a Distributed System Layer: the middleware. thanks to that middleware, the applications running on each separate computer interact with the same interface on every computer, even though all the computers have different local OSes

