Distributed Systems

What is Optimistic Concurrency Control? (Assumption?) Explain the phases: Working, Validation, and Update What is time stamp ordering?

*Assumes transactions are more likely to complete than not, so conflicts are rare. *A fully optimistic system uses no locks. Three Phases: 1) Working: uses a private workspace to keep non-committed results isolated. 2) Validation: when a transaction is ready to commit, check whether any conflicts took place. 3) Update: tentative changes are made permanent. Time Stamp Ordering keeps track of two timestamps per object: the timestamp of the last read and the timestamp of the last write. If a transaction wants to write, its timestamp must be newer than both the last read and the last write; if it wants to read, its timestamp must be newer than the last write. Otherwise the transaction aborts.
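
The timestamp ordering rule can be sketched in a few lines of Python. This is a minimal illustration, not a full concurrency control implementation: the names `TimestampedObject`, `try_read`, and `try_write` are hypothetical, and the sketch assumes the usual basic rule that a write is rejected when it is older than either the last read or the last write.

```python
from dataclasses import dataclass


@dataclass
class TimestampedObject:
    value: object = None
    read_ts: int = 0   # timestamp of the newest transaction that read the object
    write_ts: int = 0  # timestamp of the newest transaction that wrote the object


def try_read(obj, txn_ts):
    """Return (ok, value). The read is rejected if a newer transaction already wrote."""
    if txn_ts < obj.write_ts:
        return False, None
    obj.read_ts = max(obj.read_ts, txn_ts)
    return True, obj.value


def try_write(obj, txn_ts, value):
    """Return True if the write is applied, False if the transaction must abort."""
    if txn_ts < obj.write_ts or txn_ts < obj.read_ts:
        return False
    obj.value = value
    obj.write_ts = txn_ts
    return True
```

A transaction whose call returns `False` would abort and restart with a fresh, larger timestamp.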

Coda *goals *client modification logs *disconnection and reintegration *AVSG *resolution

*built on AFS to support replicated servers and disconnected operation *Accessible Volume Storage Group (AVSG): subset of the VSG the client can currently access *if no servers can be accessed, the client goes into disconnected operation mode; any file updates are logged to the client modification log *Reconnection (reintegration): the client plays back the log to send updates to the server; if conflicts arise, the user must handle them (resolution) *Hoarding: user-directed caching

Centralized Deadlock Detection How can False Deadlock occur? What is the Global Wait For graph?

*a central coordinator manages a resource graph of processes and the resources they are using *each time a process wants or releases a resource, a message is sent to the coordinator, which builds a "wait-for graph". Cycles are detected from this graph. False Deadlock can occur if "release" and "waiting-for" messages arrive at the coordinator out of order. The Global Wait-For Graph is composed of all the local wait-for graphs.
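
As a rough sketch of what the coordinator does, the following Python merges local wait-for graphs into a global one and searches it for a cycle with a depth-first search. The function names `merge_graphs` and `find_cycle` and the dictionary representation (process name mapped to the set of processes it waits on) are assumptions made for illustration.

```python
def merge_graphs(local_graphs):
    """Combine per-machine wait-for graphs into one global wait-for graph."""
    merged = {}
    for graph in local_graphs:
        for process, waits_on in graph.items():
            merged.setdefault(process, set()).update(waits_on)
    return merged


def find_cycle(wait_for):
    """Return a list of processes forming a cycle, or None if there is none."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(process):
        color[process] = GRAY
        path.append(process)
        for other in wait_for.get(process, ()):
            state = color.get(other, WHITE)
            if state == GRAY:
                # Reached a process already on the path: that is a cycle.
                return path[path.index(other):] + [other]
            if state == WHITE:
                cycle = visit(other)
                if cycle:
                    return cycle
        path.pop()
        color[process] = BLACK
        return None

    for process in list(wait_for):
        if color.get(process, WHITE) == WHITE:
            cycle = visit(process)
            if cycle:
                return cycle
    return None
```

For example, merging `{"P1": {"P2"}}`, `{"P2": {"P3"}}`, and `{"P3": {"P1"}}` yields a graph in which `find_cycle` reports a cycle through all three processes, even though no single local graph contains one.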

EXPLAIN: Two Phase Commit Protocol, Disadvantages

*coordinator asks every member "Can you commit?" *every member must respond *coordinator sends a commit or abort message *every member sends an acknowledgement Disadvantage: *failed processes MUST recover or the transaction blocks
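
The message flow above can be sketched as a single-process simulation. This is only an illustration under simplifying assumptions: there is no network, no timeouts, and no write-ahead log, and the `Participant` class with its `can_commit`, `commit`, and `abort` methods is a hypothetical interface.

```python
class Participant:
    def __init__(self, vote):
        self.vote = vote          # True if this member is able to commit
        self.state = "working"

    def can_commit(self):
        # Phase 1 reply: a "yes" vote means the member is prepared to commit.
        self.state = "prepared" if self.vote else "aborted"
        return self.vote

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Run both phases and return the coordinator's decision."""
    # Phase 1: ask every member whether it can commit.
    votes = [p.can_commit() for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Phase 2: tell every member the outcome.
    for p in participants:
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision
```

A single "no" vote forces an abort for everyone. The blocking problem shows up in a real system when a member never answers: the list comprehension in phase 1 would simply never finish.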

EXPLAIN: Three Phase Commit Protocol, Disadvantages

*coordinator sends "can you commit?" to all members *participants respond *coordinator sends "prepare to commit" to all members *participants acknowledge *coordinator sends "commit" *participants acknowledge Disadvantages *partitioned network may cause a subset to elect another coordinator and vote on a different outcome *if a coordinator dies, then recovers it may read its write ahead log and resume in an obsolete state

Chubby

*highly available and persistent distributed lock service and file system *one master, +4 replicas *clients can subscribe to events for an open file *files can only be read and written in their entirety *all Chubby data is cached in memory *uses Paxos for master election and also to propagate data to replicas

NFSv4 Features

*protocol now stateful *supports better caching (similar to oplocks) *compound RPC support added

EXPLAIN: Paxos Algorithm What happens with a lot of concurrent requests?

1) The client tells the proposer what command it wants to add to the system. 2) The proposer picks the next highest sequence number and asks all acceptors to reserve this number. 3) The acceptors agree, and the proposer sends the message from the client with the associated number. 4) This message is propagated by learners. With a lot of concurrent requests: each proposer picks a sequence number, and acceptors promise not to accept any proposals with lower sequence numbers.
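
The promise and accept steps can be sketched as follows. This is a simplified single-value sketch that runs in one process with no message loss; the `Acceptor` class and `propose` function are hypothetical names, and learners are left out.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1    # highest sequence number promised so far
        self.accepted = None  # (sequence number, value) last accepted, if any

    def prepare(self, n):
        """Promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n, value):
        """Accept the proposal unless a higher number has been promised."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def propose(acceptors, n, value):
    """Return the value chosen with number n, or None if no majority agreed."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(n) for a in acceptors]
    granted = [previous for ok, previous in replies if ok]
    if len(granted) < majority:
        return None
    # If any acceptor already accepted a value, the proposer must reuse the
    # value with the highest sequence number instead of its own.
    prior = [previous for previous in granted if previous is not None]
    if prior:
        value = max(prior, key=lambda pair: pair[0])[1]
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None
```

A proposer that shows up with a stale sequence number gets no promises and has to retry with a higher one, which is how competing concurrent requests are ordered.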

4 Conditions for Deadlock 4 Solution to Deadlock

Conditions: 1) Mutual exclusion 2) Hold and wait 3) No preemption 4) Circular wait Solutions: 1) Ignore 2) Detect 3) Prevent: stop one of the four conditions from holding 4) Avoid: almost impossible

4 Things a DNS server does for a query. What do Root Name Servers do? Zone?

1. Answer if it knows the answer 2. Contact another name server to search for it 3. Return an error if the domain does not exist 4. Return a referral. Root name servers keep track of the servers for top-level domains. *Zone: subtree of the Internet domain space *Each DNS server is responsible for a zone *iterative: name resolution process where the client itself goes through the name hierarchy, following referrals *recursive: name resolution process where the server itself invokes other name servers

Caching Considerations *4 places *write through *delayed-writes (write behind) *read-ahead (prefetch) *write on close *centralized control

4 Places: server's disk, server's buffer cache, client's buffer cache, client's disk. write-through: all accesses require checking with the server. delayed-writes: semantics ambiguous. read-ahead: request chunks of data before they are needed. write on close: session semantics. centralized control: stateful system with signaling traffic.

Why does Paxos use a Leader? What is a quorum?

A leader is used to ensure that there are unique, increasing sequence numbers (progress is only guaranteed with a leader). A quorum is a majority of the acceptors.

Lease vs. a Lock What is Hierarchical Leasing?

A lease is a lock with a timeout. Leases are used because if a process locked a resource and then died, that resource would stay locked indefinitely. Hierarchical Leasing: a coordinator holds a top-level lease (coarse-grained, i.e. big and long) and hands out small, short sub-leases (fine-grained) to processes that need resources.
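
A lease can be sketched as a lock that records an expiry time. In this minimal example the `Lease` class is a hypothetical name, and the current time is passed in explicitly as `now` (in seconds) so the behavior is easy to follow; a real service would read a clock instead.

```python
class Lease:
    def __init__(self, duration):
        self.duration = duration  # how long a grant stays valid
        self.holder = None
        self.expires_at = 0.0

    def acquire(self, client, now):
        """Grant or renew the lease if it is free, expired, or already ours."""
        if self.holder is None or now >= self.expires_at or self.holder == client:
            self.holder = client
            self.expires_at = now + self.duration
            return True
        return False
```

If client A takes a 10-second lease at time 0 and then dies, client B is refused at time 5 but succeeds at time 10, without anyone having to clean up after A.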

How does Chandy-Misra-Haas deadlock detection work?

A process sends a probe message to the process that is holding a resource (prior to waiting for the resource). The receiving process forwards the probe to every process that holds a resource it is waiting for (edge chasing). If the original process receives its own probe message, it knows there is a dependency cycle and will not wait for that resource, since deadlock would occur.
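
Edge chasing can be simulated in one process by treating a queue as the network. This is a sketch under that assumption; `would_deadlock` and the `wait_for` dictionary (blocked process mapped to the set of processes it waits on) are hypothetical names, and each probe is a tuple of (initiator, sender, receiver).

```python
from collections import deque


def would_deadlock(initiator, holder, wait_for):
    """Return True if the initiator's probe comes back to it."""
    probes = deque([(initiator, initiator, holder)])
    forwarded = set()  # processes that already forwarded this probe
    while probes:
        origin, sender, receiver = probes.popleft()
        if receiver == origin:
            return True  # the probe returned: waiting would close a cycle
        if receiver in forwarded:
            continue
        forwarded.add(receiver)
        # The receiver forwards the probe to everyone it is waiting on.
        for next_process in wait_for.get(receiver, ()):
            probes.append((origin, receiver, next_process))
    return False
```

With `wait_for = {"P2": {"P3"}, "P3": {"P1"}}`, a request by P1 for a resource held by P2 sends the probe P1 to P2 to P3 and back to P1, so P1 should not wait.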

Distributed Transactions model "ACID"

Atomic: *the transaction happens as a single indivisible action *the entire transaction succeeds or else it is rolled back Consistent: *a transaction cannot leave data in an inconsistent state *invariants must hold Isolated (Serializable): *transactions cannot interfere with each other *if two are executed at the same time, the result must be the same as if they had run in some serial order Durable: *once a transaction commits, the results are permanent

BigTable

Bigtable is a table indexed by rowkey, column name, and time stamp *Column Family: has unique name and contains a list of columns Tablet: subtable containing all columns but a subset of rows *Table Splitting: as a tablet grows it can split into multiple tablets *Chubby is used to ensure only 1 master, store bootstrap location of Bigtable data, discover tablet servers, store Bigtable schema info, store access control lists *Eventual Consistency *Master Server assigns tablets to tablet servers, balances tablet server load, garbage collection, schema changes

Brewer's CAP Theorem What is BASE

Consistency, Availability, Partition Tolerance: choose two. BASE: Basically Available, Soft state, Eventually consistent. (Doesn't have ACID consistency.)

AFS *Goals *caching *service model, semantics *cache coherence (callbacks)

Goal: Support information sharing on a large scale (10K+ clients) *most files small, reads more common, files usually accessed by 1 user at a time *whole-file caching and long-term caching (entire file is cached, client writes file to server if modified) *upload/download model *session semantics *when client downloads file, server makes callback promise. when file changes, server notifies and invalidates all clients in the call back list
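
The callback mechanism can be sketched with two small classes. This is an in-memory illustration only: `AfsServer`, `AfsClient`, and their method names are hypothetical, and the sketch assumes the writing client keeps a valid callback after it stores a file.

```python
class AfsServer:
    def __init__(self):
        self.files = {}
        self.callbacks = {}  # file name -> clients holding a callback promise

    def fetch(self, client, name):
        # Downloading a file registers a callback promise for that client.
        self.callbacks.setdefault(name, set()).add(client)
        return self.files.get(name)

    def store(self, client, name, data):
        self.files[name] = data
        # Break the promise for every other client caching this file.
        for other in self.callbacks.pop(name, set()) - {client}:
            other.invalidate(name)
        self.callbacks[name] = {client}


class AfsClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}  # whole files cached by name

    def open(self, name):
        if name not in self.cache:
            self.cache[name] = self.server.fetch(self, name)
        return self.cache[name]

    def close(self, name, data):
        # Session semantics: the modified file is uploaded on close.
        self.cache[name] = data
        self.server.store(self, name, data)

    def invalidate(self, name):
        self.cache.pop(name, None)
```

After one client closes a modified file, any other client that had it cached loses its copy and fetches the new version on its next `open`.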

DFS (AFS v3) *goals *token mechanism

Goal: avoid the unpredictable lost-data problems of session semantics when multiple clients are modifying the same file. Tokens: permission given by the server to the client to perform certain operations on a file and to cache file data. Tokens give the server control of who is doing what to a file.

GFS

Goal: store huge files across thousands of machines *chunk-servers hold 64MB chunks of data *chunk-servers are replicated *master is a fast and reliable machine that manages metadata. maps requests to chunk-servers *chunk handle: 64 bit unique number *operation log: name-to-chunk map stored on disk *Large chunks: reduces need for frequent communication with master. dataflow: primary replica -> secondary replica -> secondary -> ... control flow: primary to all 3 replicas at same time

Dropbox *goal

Goal: synchronize part of a user's file system to remote servers and propagate changes back to any devices linked to the user's account. Server load: about a 1:1 read-to-write ratio. Metadata Server: has a database of metadata. A Notification Server was added to send clients updates instead of having clients poll for updates.

SMB / CIFS *goals SMB2

Goals: *connection-oriented, stateful file system. *Priority on file consistency and file locking rather than client caching and performance RPC-like and used oplocks (tells client how to cache data) Dfs: adds consistent naming SMB2 pipelining, credit-based flow control, compounding

NFS goals of automounter UDP vs. TCP Directory and file access protocol inconsistencies with caching / validation user level lock manager NFSv4 features

NFS *stateless, RPC-based, remote access file server *no need for open/close procedures *the client reads 8KB at a time and performs read-ahead *has ambiguous semantics because the server has no idea what the client has cached *uses validation: the client compares modification times from server requests to the data that is cached (only if there are file operations done to the server); otherwise it invalidates blocks after a few seconds *a Lock Manager was added to allow file locking *the Automounter mounts remote directories only when they are first accessed (mount/unmount according to client demand) *uses UDP. Any machine can be client / server

Remote Access vs. Upload/Download

Remote Access *the file service provides a functional interface (create, delete, read bytes, write bytes, etc.) *Advantages: the client only gets what's needed; the server manages a coherent view of the file system *Disadvantage: possible server / network congestion Upload/Download *Read: copy the file from server to client *Write: copy the file from client to server *Advantage: simple *Disadvantages: wasteful, problematic, bad consistency

EXPLAIN: Strict Two Phase Locking What does it solve from two phase locking?

Same as two phase locking, except phase 2 releases all locks at the same time, when the transaction commits or aborts. Cascading aborts do not occur.

What is a schedule? What does a Lock Manager do?

Schedule *valid order of interleaving transactions Lock Manager: allows a transaction to get exclusive locks

Sequential vs. Session vs. Ambiguous Semantics

Sequential *reads return the result of last write *easily achieved if one server and no cache Session *changes to an open file are only visible to process that modified it *lock file under modification of other clients *last process to modify wins

Stateful vs. Stateless

Stateful *server maintains state Advantages: *shorter requests *better performance processing requests *cache coherence is possible *file locking is possible Stateless *server maintains no info on client accesses Advantages: *server can crash and recover *client can crash and recover *no open/close needed *no server space used for storing state Disadvantages: *problems if a file is deleted on the server *no file locking

EXPLAIN Two Phase Locking Why is it used? What are cascading aborts?

Two Phase Locking *Phase 1: Growing phase: acquire locks *Phase 2: Shrinking phase: release locks *used to maintain serializability *Cascading aborts: occur when a transaction has released some locks and then aborts, because other transactions may already have used data it modified but never committed. Any transaction using this data (and any that depend on them) must abort.
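
The two-phase rule itself is easy to enforce in code: once a transaction has released any lock, it may not acquire another. The class below is a minimal sketch with a hypothetical name, `TwoPhaseTransaction`; it only tracks the rule and does not implement blocking or a real lock manager.

```python
class TwoPhaseTransaction:
    def __init__(self):
        self.held = set()
        self.shrinking = False  # becomes True at the first release

    def acquire(self, lock):
        if self.shrinking:
            raise RuntimeError("cannot acquire a lock after releasing one")
        self.held.add(lock)

    def release(self, lock):
        self.shrinking = True
        self.held.discard(lock)

    def release_all(self):
        # Strict variant: drop every lock in one step at commit or abort.
        self.shrinking = True
        self.held.clear()
```

Calling only `release_all` at the end of the transaction gives the strict behavior, which is what prevents other transactions from seeing uncommitted data.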

4 Consensus Algorithm Goals

Validity: the outcome must be one of the proposed values. Uniform Agreement: no two processes may agree on different values. Integrity: a process must ultimately agree on a single value. Progress: the algorithm must eventually terminate such that every process decides on the same value.

*Virtual Synchrony *State Machine replication *Active-Active, Active-Passive *Process groups *fail-silent *byzantine

Virtual Synchrony: allows multicasting a message to a process group. Replication: make replicas for higher availability and scalability. Active-Active: all components work and accept requests. Active-Passive: 1 master, with replicas standing by. fail-silent: a failed process stops communicating. byzantine: a failed process sends erroneous messages.

*Group View *View Change *GMS *State transfer *Stable message *flush

group view: set of processes in a given process group. view change: change of membership. GMS: keeps track of which processes are in the group. state transfer: happens when a process joins the group. stable message: a message becomes stable when every process confirms it was received. flush: sends all unstable messages to get them acknowledged before a view change occurs.

Map Reduce

The master assigns map tasks to the workers, and each worker applies the map function to its shard. The map function emits (key, value) pairs. After the master sees that all map workers are complete, the pairs are sorted by key and reduce is called once per key. shard: each chunk of the input
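
The flow can be sketched in a single process with word counting as the job. This is only an illustration: the names `map_reduce`, `word_count_map`, and `word_count_reduce` are hypothetical, and the loop over shards stands in for the master handing shards to separate workers.

```python
from collections import defaultdict


def map_reduce(shards, map_fn, reduce_fn):
    # Map phase: each shard would be handled by a different worker.
    intermediate = []
    for shard in shards:
        intermediate.extend(map_fn(shard))
    # Group the emitted values by key once all map tasks are done.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase: one reduce call per key, in sorted key order.
    return {key: reduce_fn(key, groups[key]) for key in sorted(groups)}


def word_count_map(shard):
    return [(word, 1) for word in shard.split()]


def word_count_reduce(key, values):
    return sum(values)
```

Calling `map_reduce(["a b a", "b c"], word_count_map, word_count_reduce)` counts each word across both shards.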

EXPLAIN Separating reads and writes into write locks and read locks EXPLAIN Two-Version Locking

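Read/write locks: *several read locks can be held on an object at once; a write lock request waits until they are released (read lock -> read lock -> object <- write lock waits...) *while a write lock is held, read lock requests wait (read lock -> write lock -> object <- read locks wait...) Two-Version Locking: *a write locks a tentative copy of the file, so reads of the real copy can continue *at commit, a commit lock is taken to replace the real copy; reads are allowed until the commit lock is held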

What is wait-die prevention? wound-wait prevention?

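wait-die: if an older process wants a resource that a younger process holds, it waits; if a younger process wants a resource that an older process holds, it kills itself (dies). wound-wait: if an older process wants a resource that a younger process holds, it kills (wounds) the younger process; if a younger process wants a resource that an older process holds, it waits.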
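
Both schemes reduce to a comparison of timestamps. The sketch below assumes that a smaller timestamp means an older process; the function names and the returned strings are chosen for illustration.

```python
def wait_die(requester_ts, holder_ts):
    """Older requesters wait; younger requesters kill themselves."""
    return "wait" if requester_ts < holder_ts else "die"


def wound_wait(requester_ts, holder_ts):
    """Older requesters kill the holder; younger requesters wait."""
    return "wound holder" if requester_ts < holder_ts else "wait"
```

In both schemes waiting only ever goes in one direction with respect to age (older waits for younger in wait-die, younger waits for older in wound-wait), so a cycle of waiting processes cannot form.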

write-ahead log transaction manager

write-ahead log: used during transactions to *enable rollback if the transaction aborts *maintain the state of the transaction in a stable place so a computer can recover if it dies. transaction manager: *a process responsible for running a consensus algorithm to decide whether to commit or abort its subtransaction *runs on each system in the group; one is elected coordinator

