Distributed Systems Exam 3


What is the golden rule of recoverability?

"Never modify the only copy": a transaction makes changes to a local copy of a resource until it commits or rolls back.
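The rule can be illustrated with a minimal sketch. It assumes a resource held as a small dictionary; `TentativeResource` and its methods are hypothetical names, not a library API. Updates go to a private working copy, so a rollback only has to throw that copy away.

```python
import copy


class TentativeResource:
    """Hypothetical resource that never modifies its only committed copy."""

    def __init__(self, data):
        self.committed = dict(data)  # the copy other transactions can see
        self.working = None          # private copy used during a transaction

    def begin(self):
        # Take a private copy; all updates go here, never to `committed`.
        self.working = copy.deepcopy(self.committed)

    def update(self, key, value):
        if self.working is None:
            raise RuntimeError("no transaction in progress")
        self.working[key] = value

    def commit(self):
        # Installing the working copy makes the changes visible.
        self.committed = self.working
        self.working = None

    def rollback(self):
        # Nothing to undo: just discard the working copy.
        self.working = None
```

Because the committed copy is never touched before commit, a crash or abort in the middle of a transaction leaves the original data intact.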

Describe the AFS architecture

(Clients cache whole files on their local system.)
- A client requests access to a specific file
- The local cache is checked first; a request is made to the server only if necessary
- The server sends a copy of the file to the client for modification
- Changes are applied ONLY to the client's copy until the file is closed
- On close, the client's copy (with the new updates) is copied back to the server
- The client also keeps its copy in the cache for future use
- Design principle (the lesson of Amdahl's Law): MAKE THE COMMON CASE FAST
- Implements session semantics
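The open/write/close cycle above can be sketched in Python. This is a minimal sketch under stated assumptions: `FileServer` and `CachingClient` are hypothetical stand-ins for a Vice server and Venus, file contents are plain strings, and the callback mechanism real AFS uses to invalidate stale cached copies is left out.

```python
class FileServer:
    """Hypothetical stand-in for a Vice server: holds the master copies."""

    def __init__(self):
        self.files = {}
        self.fetches = 0  # counts whole-file transfers to clients

    def fetch(self, name):
        self.fetches += 1
        return self.files.get(name, "")

    def store(self, name, contents):
        self.files[name] = contents


class CachingClient:
    """Hypothetical stand-in for Venus: whole-file caching, write-back on close."""

    def __init__(self, server):
        self.server = server
        self.cache = {}       # name -> contents kept for future opens
        self.open_files = {}  # name -> working copy while the file is open
        self.dirty = set()

    def open(self, name):
        if name not in self.cache:  # check the local cache first
            self.cache[name] = self.server.fetch(name)
        self.open_files[name] = self.cache[name]

    def read(self, name):
        return self.open_files[name]

    def write(self, name, contents):
        self.open_files[name] = contents  # change stays local until close
        self.dirty.add(name)

    def close(self, name):
        contents = self.open_files.pop(name)
        self.cache[name] = contents  # keep the copy for later opens
        if name in self.dirty:
            self.server.store(name, contents)  # copy back to the server
            self.dirty.discard(name)
```

The server sees the new contents only after `close`, which is exactly the session-semantics behavior; a second `open` is served from the local cache without contacting the server.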

How is concurrency control implemented in NFS?

(Clients acquire locks on files.)
- The client informs the server of its intent to lock
- If the lock is available, the server grants it to the client with a lease
- If the client dies while holding the lock, the lease expires
- The client may renew the lease before the old lease expires
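A lease-based lock table like the one described above might look like the following minimal sketch. `LeaseLockServer`, its method names, and the 30-second default lease are hypothetical, not the NFS protocol's actual operations; the current time is passed in explicitly so lease expiry is easy to follow.

```python
class LeaseLockServer:
    """Hypothetical lock server that grants time-limited leases on files."""

    def __init__(self, lease_seconds=30.0):
        self.lease_seconds = lease_seconds
        self.locks = {}  # file name -> (client, lease expiry time)

    def _holder(self, name, now):
        entry = self.locks.get(name)
        if entry is None or entry[1] <= now:  # an expired lease counts as free
            return None
        return entry[0]

    def acquire(self, client, name, now):
        holder = self._holder(name, now)
        if holder is not None and holder != client:
            return False  # another client holds a live lease
        self.locks[name] = (client, now + self.lease_seconds)
        return True

    def renew(self, client, name, now):
        if self._holder(name, now) != client:
            return False  # too late: the lease expired or belongs to someone else
        self.locks[name] = (client, now + self.lease_seconds)
        return True

    def release(self, client, name, now):
        if self._holder(name, now) == client:
            del self.locks[name]
```

If client A crashes, nobody has to notice the crash explicitly: once A's lease time passes, `acquire` from another client simply succeeds.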

Name the messaging modes we discussed in class

- Publish/subscribe (pub/sub): 1 --> many
- Point-to-point: 1 --> 1
Both are messaging models provided by MOM (message-oriented middleware).
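The difference between the two modes can be shown with a toy in-memory broker. This is a sketch, not a MOM product or JMS: the `Broker` class and its method names are hypothetical, and topics here deliver only to subscribers that are registered at publish time.

```python
from collections import deque


class Broker:
    """Hypothetical in-memory broker with queues and topics."""

    def __init__(self):
        self.queues = {}  # queue name -> deque of pending messages
        self.topics = {}  # topic name -> list of subscriber callbacks

    # Point-to-point: each message is consumed by exactly one receiver.
    def send(self, queue, message):
        self.queues.setdefault(queue, deque()).append(message)

    def receive(self, queue):
        pending = self.queues.get(queue)
        return pending.popleft() if pending else None

    # Publish/subscribe: every subscriber receives its own copy.
    def subscribe(self, topic, callback):
        self.topics.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        for callback in self.topics.get(topic, []):
            callback(message)
```

The queue also shows decoupling in time: a message sent with `send` waits in the broker until some receiver calls `receive`.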

What is an RDD?

- Resilient Distributed Dataset
- Fundamental data structure in Spark
- Immutable --> we do not modify RDDs, we create new ones

Explain Network Time Protocol

- Synchronizes computer clocks with UTC (Coordinated Universal Time) over the internet
- Reference clocks (e.g., atomic clocks) are at the top of the graph
- Time servers (like the CMU clock) synchronize with the reference clocks
- Our devices synchronize with those time servers
- Think of this as a hierarchy (levels are called strata)
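An NTP client estimates its clock offset from four timestamps taken during one request/reply exchange. The standard NTP on-wire formulas are sketched below; the function name and the millisecond timestamps in the example are illustrative, and the offset estimate assumes the request and the reply take roughly equal time on the network.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Estimate clock offset and round-trip delay from one NTP exchange.

    t1: client sends the request   (client clock)
    t2: server receives it         (server clock)
    t3: server sends the reply     (server clock)
    t4: client receives the reply  (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2  # estimated server time minus client time
    delay = (t4 - t1) - (t3 - t2)         # round trip minus server processing time
    return offset, delay


# Example in milliseconds: the client is 5000 ms behind the server,
# each network leg takes 100 ms, and the server spends 50 ms processing.
offset, delay = ntp_offset_and_delay(1000, 6100, 6150, 1250)
```

Here the client would learn it is 5000 ms behind, over a path with a 200 ms round trip. Asymmetric network paths make the offset estimate wrong by up to half the asymmetry, one reason NTP is too coarse for ordering events across machines.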

How does a client read a file sequentially in GFS?

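- The client computes the chunk index from the byte offset where it wants to start reading (offset / chunk size)
- The client sends the master the file name and chunk index
- The master returns the chunk handle (ID) and the locations of the chunk's replicas
- The client caches this metadata (keyed by file name and chunk index), then requests the chunk directly from a chunkserver holding a replica (usually the closest one)
- File data itself is not cached by the client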
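The client-side bookkeeping in these steps can be sketched as follows. This assumes GFS's 64 MB chunk size; `GfsClientSketch` and the `master_lookup` function it is given are hypothetical stand-ins for the RPC to the master, not a real GFS or HDFS API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # GFS used 64 MB chunks


def chunk_index(byte_offset, chunk_size=CHUNK_SIZE):
    # Which chunk of the file contains this byte?
    return byte_offset // chunk_size


def chunks_for_range(byte_offset, length, chunk_size=CHUNK_SIZE):
    # All chunk indexes touched by a read of `length` bytes starting at `byte_offset`.
    first = byte_offset // chunk_size
    last = (byte_offset + length - 1) // chunk_size
    return list(range(first, last + 1))


class GfsClientSketch:
    """Hypothetical client that caches (file, chunk index) -> metadata."""

    def __init__(self, master_lookup):
        # master_lookup(file_name, index) -> (chunk_handle, replica_locations)
        self.master_lookup = master_lookup
        self.cache = {}
        self.master_calls = 0

    def locate(self, file_name, byte_offset):
        key = (file_name, chunk_index(byte_offset))
        if key not in self.cache:  # only contact the master on a cache miss
            self.master_calls += 1
            self.cache[key] = self.master_lookup(*key)
        return self.cache[key]
```

Only metadata is cached; the data itself would then be fetched from one of the returned replica locations.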

Explain the 'reduce' stage of MapReduce

- Combines the intermediate values for each key into one or more final values
- A pipeline can have multiple reduce stages

Name some issues with microservice architectures

- Data replication becomes more difficult
- The system becomes more complex --> each new service requires connections to other services
- Debugging can become more difficult
- Performance hit when we switch from in-memory (in-process) communication to network communication

What are two types of decoupling that make messaging systems more resilient?

- Decoupled in space: the sender does not know the identity of the receivers, and vice versa
- Decoupled in time: the system can store messages until they can be delivered successfully

Explain how GFS splits and organizes files

- Each file is mapped to a set of fixed-size chunks (64 MB each in GFS; HDFS uses 128 MB blocks by default)
- Each cluster has one master and hundreds of chunkservers
- Each chunk is replicated on multiple chunkservers
- The master holds the metadata and knows the locations of chunk replicas
- Chunkservers know which replicas they hold

What are the benefits of microservices compared to monoliths?

- Easier to replicate individual parts and scale them
- Adding new features does not impact the existing system
- Better equipped to handle failures, since we can replicate more effectively

Apache Spark

- Faster and more flexible version of MapReduce
- Uses RDDs

What parts does every S3 object have?

- Key
- Data
- User metadata (tags, etc.)
- System metadata (creation time, etc.)
- Storage class --> pay more for faster access
(Mnemonic: DUSKS)

How are changes allowed and propagated in GFS?

- The master grants a lease on the chunk to one replica, which becomes the primary
- The client asks the master for the locations of the primary and the secondary replicas
- The client pushes the data to all replicas directly
- The primary (always the modifying chunkserver) assigns the change a serial order, applies it, and propagates it to the servers holding secondary copies
- The write is reported as successful only after all replicas acknowledge; otherwise the client retries

Explain the 'shuffle and sort' phase of MapReduce

- Responsible for redistributing the map output across the reducers
- Ensures that all values associated with a particular key end up on the same reducer
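The three stages (map, shuffle and sort, reduce) can be sketched as an in-memory word count. This is a single-process illustration, not Hadoop's API: the function names are hypothetical, and routing each key to reducer `crc32(key) % num_reducers` is one common partitioning choice, assumed here.

```python
import zlib
from collections import defaultdict


def map_phase(records):
    # Each record is a (document id, text) pair; emit (word, 1) for every word.
    for _doc_id, text in records:
        for word in text.split():
            yield word.lower(), 1


def shuffle_and_sort(pairs, num_reducers):
    # Route each pair by a hash of its key so that all values for one key
    # land in the same reducer's partition, then sort keys within each partition.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        reducer = zlib.crc32(key.encode("utf-8")) % num_reducers
        partitions[reducer][key].append(value)
    return [sorted(p.items()) for p in partitions]


def reduce_phase(grouped):
    # Combine all intermediate values for a key into one final value.
    for key, values in grouped:
        yield key, sum(values)


def word_count(records, num_reducers=2):
    results = {}
    for partition in shuffle_and_sort(map_phase(records), num_reducers):
        results.update(reduce_phase(partition))
    return results
```

Because the partition is a deterministic function of the key, no two reducers ever see the same key, so each reducer's output can be written independently.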

What are some requirements of GFS?

- Run reliably despite daily component failures
- Google's issue is not a massive number of files, but huge file sizes
- Write once, then append/read many times
- Long sequential reads and appends dominate access --> client caching of file data is not needed
- High sustained throughput is more important than low latency

Explain the 'map' stage of MapReduce

- Takes records from the source as key/value pairs
- The mapper produces one or more intermediate values, each with an output key
- A pipeline can have multiple map stages

Name some issues with monolithic architectures

- To scale, we have to replicate the entire architecture
- Adding new features/updates impacts the entire service
- Tied to one tightly coupled installation

What is Java Messaging Service (JMS)?

- Widely used Java API that abstracts interaction with different MOM systems (JMS IS NOT MOM)
- Client-facing API

What is a system in the context of service architectures?

A collection of operating microservices

Explain the concept of a consistency model. Provide some examples.

A contract between processes and a data store: if the processes agree to obey certain rules, the store promises to work correctly.
Examples: strict, sequential, eventual

Which parts of the CAP Theorem does S3 uphold?

AP: S3 is available and tolerates partitions, but it is only eventually consistent.

What is NTP accurate enough for? What is it not accurate enough for?

Accurate enough to timestamp logs on a single (personal) machine. Not accurate enough to order logged events across multiple machines in a distributed system.

How do we add instrumentation to microservice architectures?

Add timers to microservices to log their performance. Our primary concerns are latency and throughput.

Atomicity

All or nothing, with no 'intermediate' states: we either commit the whole transaction or abort the entire process.

In a mobile app, why are long-running tasks run in a background thread?
a) It makes that task faster
b) It keeps the GUI responsive while the task is still running
c) It makes the GUI faster
d) It's just a programming convenience and isn't really needed

B

List some common use cases of S3

- Backup and archival storage
- Replicating objects across regions for performance and fault tolerance
- Data for static websites
- Source and destination of data for applications running on EC2

In a mobile app, which of the following is not a way for GUI components to be defined?
a) Programmatically -- usually in the main method
b) In a setup file using XML
c) By an asynchronous thread
d) Interactively, using a GUI editor

C

Session semantics consistency model

Changes are initially visible only to the process that modifies the file. They become visible to other processes when the file is closed.

How does a generic distributed file system work?

Consists of client and server computers.
- Client module: the interface used by applications --> makes calls to the server
- Server: contains a flat file service and a directory service; both provide an RPC interface for clients to use

Which flat file operation does not pass a UFID?

Create()

What storage systems have weak consistency?

DNS

Define continuous delivery and continuous deployment

Delivery: code changes are automatically built and tested, so the software is always in a 'deployable' state.
Deployment: changes that pass the tests are automatically deployed (one step further than delivery).

Consistency

Different meaning than consistency in replicated data stores (e.g., the C in CAP).
Data is in a consistent state when a transaction starts and when it ends.
Ex.: the total $$$ in two accounts is the same after money is moved between them.

What is a transaction coordinator?

A framework (class, methods, etc.) that we build to carry out transactions and maintain isolation. Example (transaction T):
    tid = openTransaction();
    a.withdraw(tid, 100);
    b.deposit(tid, 100);
    c.withdraw(tid, 200);
    d.deposit(tid, 200);
    closeTransaction(tid);

What were the goals of the Andrew File System?

Goals:
1) SCALABILITY
2) Reduce client-server interactions using client caches

What were the goals of Sun NFS (Network File System)?

Goals:
1) Look like a UNIX file system
2) Implement a POSIX API (standards for UNIX-like systems)
3) Make files available from any machine

What is two-phase locking?

Growing phase: acquire all locks needed by the transaction.
Shrinking phase: release locks.
Once any lock is released, no new locks may be acquired.
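The two-phase rule can be enforced with a small wrapper around ordinary locks. This is a minimal sketch: `TwoPhaseLockingTxn` is a hypothetical helper built on `threading.Lock`, and it only checks the growing/shrinking rule; it does nothing about deadlock.

```python
import threading


class TwoPhaseLockingTxn:
    """Hypothetical transaction wrapper that enforces two-phase locking."""

    def __init__(self, name):
        self.name = name
        self.held = []
        self.shrinking = False  # becomes True at the first release

    def lock(self, lock):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot acquire a lock after releasing one")
        lock.acquire()  # growing phase
        self.held.append(lock)

    def unlock(self, lock):
        self.shrinking = True  # the first release starts the shrinking phase
        self.held.remove(lock)
        lock.release()

    def unlock_all(self):
        # Typically called at commit or abort; releasing everything at the end
        # is the "strict" variant of two-phase locking.
        while self.held:
            self.unlock(self.held[-1])
```

Trying to take a new lock after any release raises an error, which is exactly the condition the card describes.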

How are Google File System and Hadoop related?

Hadoop is an open-source implementation of the GFS design (its file system, HDFS, is modeled on GFS).

Explain how serial transactions can produce different answers

If both transactions operate on the same data, the order in which they run matters. Both answers must be accepted as correct.

What is the central idea behind Hadoop and GFS?

Instead of moving data to the code, we move the code to the data.
With VMs, one machine can act as many; with Hadoop/GFS it is the opposite --> many machines act as one.

How is concurrency control implemented in AFS?

It isn't... AFS provides no support for large shared databases or for updating files that have multiple replicas.

What type of evaluation do RDDs use?

LAZY EVALUATION: transformations (e.g., map, filter, reduceByKey) are only recorded; execution does not start until an action (e.g., collect, count, reduce) is triggered.
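The difference between transformations and actions can be illustrated with a toy, pure-Python stand-in for an RDD. `LazyDataset` is hypothetical and not part of Spark; in PySpark the analogous calls are `rdd.map` and `rdd.filter` (transformations) and `rdd.collect` and `rdd.count` (actions). Each transformation returns a new object, mirroring RDD immutability.

```python
class LazyDataset:
    """Toy stand-in for an RDD: immutable, with lazily evaluated transformations."""

    def __init__(self, source, ops=()):
        self._source = source
        self._ops = tuple(ops)  # recorded lineage of transformations

    # Transformations: return a NEW dataset and do no work yet.
    def map(self, fn):
        return LazyDataset(self._source, self._ops + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._ops + (("filter", pred),))

    # Actions: run the recorded pipeline now.
    def collect(self):
        items = list(self._source)
        for kind, fn in self._ops:
            if kind == "map":
                items = [fn(x) for x in items]
            else:
                items = [x for x in items if fn(x)]
        return items

    def count(self):
        return len(self.collect())
```

Building `LazyDataset([1, 2, 3, 4]).map(...).filter(...)` calls none of the supplied functions; they run only when `collect` or `count` is called, and the original dataset is left unchanged.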

What timekeeping approach have we adopted for dist. systems?

Lamport Clocks (Logical Clocks)
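A Lamport clock is just a counter with two update rules. The sketch below is a minimal single-process version; the class and method names are hypothetical.

```python
class LamportClock:
    """Minimal logical clock following Lamport's update rules."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Rule 1: increment before each local event.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the new value is attached to the message.
        return self.tick()

    def receive(self, msg_time):
        # Rule 2: jump past both our own clock and the message's timestamp.
        self.time = max(self.time, msg_time) + 1
        return self.time
```

If event e happened before event f, then e's timestamp is smaller than f's. The converse does not hold, so Lamport clocks give an ordering consistent with causality without any physical clock synchronization.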

What is the most popular way to achieve transaction isolation?

Locking

List directory service operations

- Lookup(Dir, Name) --> FileId
- AddName(Dir, Name, FileId)
- UnName(Dir, Name)
- GetNames(Dir, Pattern) --> NameSeq

What is the primary aim of a server that supports transactions?

Maximize concurrency, while still ensuring the result is serially equivalent. Data consistency is more important than transaction speed.

What are the main components needed to use JMS?

- Messaging clients: produce and consume messages
- Message destinations: queues/topics used to send and receive messages
- A JMS-compatible MOM

What does "smart endpoints and dumb pipes" mean?

Microservices (endpoints) are smart because they contain all the business logic. Communication channels (pipes) are dumb because message contents are kept simple; their concern is transferring the data, not trying to understand it.

What should we never do when correcting clocks?

NEVER SET THE CLOCK BACKWARDS: we would end up with duplicate (and out-of-order) timestamps for events.

Describe the NFS architecture

NFS uses RPC over TCP or UDP.
NFS is built on a virtual file system (VFS) layer: the NFS client turns operations on remote files into RPC calls, and the NFS server receives those requests and converts them into operations on its local UNIX file system.
Directories are distributed.
Remote mount: a file system on one system can also appear in the file system hierarchy of another system.

What is the CAP Theorem argument?

Nodes A and B are partitioned. We write x to B and then attempt to read x from A.
A is either unavailable (refuses the read) or not consistent (returns the old value): we must choose.
We know that partitions happen (P), so we must choose AP or CP.

What is serial transaction execution?

One transaction runs to completion before the other transaction begins. Then the second transaction runs to completion

Two-phase commit

Phase 1 (voting): the coordinator asks all participating nodes whether each can commit the transaction; each node replies 'yes' or 'no'.
Phase 2 (commit): the coordinator sends 'commit' to all nodes if every vote was 'yes', and 'abort' otherwise.
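The two phases can be sketched in a few lines. This minimal version assumes in-process participants that always answer; `Participant` and `two_phase_commit` are hypothetical names, and real 2PC also needs timeouts, persistent logging, and recovery when the coordinator or a node crashes.

```python
class Participant:
    """Hypothetical node taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "working"

    def vote(self):
        # Phase 1: prepare and answer 'yes' (True) or 'no' (False).
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (voting): ask every participant whether it can commit.
    votes = [p.vote() for p in participants]
    decision = all(votes)  # a single 'no' forces an abort
    # Phase 2 (commit): send the same decision to everyone.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return "commit" if decision else "abort"
```

The key property is that every participant ends in the same state, which is what makes the distributed transaction atomic.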

In JMS, are message production and consumption synch or asynch?

Production: always synchronous (in JMS 1.1; JMS 2.0 added an asynchronous send).
Consumption:
- Asynchronous --> register as a listener on a queue or topic
- Synchronous --> read and block until a message is available (or a timeout expires)

Explain how a cache is utilized with strict consistency

Programs cannot observe any difference between cached copies and the stored data after an update. Every process behaves as if it works with the same cache.

Who decides the number of mappers and reducers?

The program

Sequential consistency

The result of any execution is the same as if the (read and write) operations of all processes were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program

What is a serial equivalent interleaving?

The result of the interleaved transactions is the same as if the transactions had run serially. Not all interleavings are serially equivalent.

What is the primary purpose of the directory service?

Translate text names to UFIDs. The directory service is a client of the flat file service.

Isolation

Two transactions do not interfere with each other

How are non-distributed file systems typically organized? Why?

Typically layered: each layer depends only on the layer below it. SEPARATION OF CONCERNS!

What file system(s) guarantee(s) strict consistency?

UNIX. NFS and AFS only approximate this. S3 makes no real attempt to approximate this.

Define Venus and Vice

Venus: client-side software used to implement AFS; it caches files on the client system and makes calls to the AFS servers (Vice) when required.
Vice: the name for the AFS server software; it manages the file system and access control.

Eventual consistency

A form of weak consistency: if no new updates are made, eventually all copies in a data store return the same value.

What are the four requirements of deadlock?

1) Mutual exclusion: resources need mutually exclusive access (they are not thread safe)
2) Reservation (hold and wait): a process may keep resources reserved while it waits for more
3) No preemption: you cannot force a process to give up a resource
4) Circular wait is possible

What are three solutions for deadlocking?

1) Prevention: disallow one of the four requirements
2) Avoidance: study what is required before beginning
3) Detection: use timeouts or wait-for graphs
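Detection with a wait-for graph comes down to finding a cycle. The sketch below assumes the graph is given as a dictionary mapping each transaction to the transactions it is waiting on; `find_deadlock` is a hypothetical helper using a standard depth-first search.

```python
def find_deadlock(wait_for):
    """Return a list of transactions forming a wait-for cycle, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in wait_for.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:  # back edge: nxt is already on the path, so a cycle
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(wait_for):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

Once a cycle is found, the system breaks the deadlock by aborting one transaction in it, which releases its locks.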

What are four properties of a microservice architecture?

1) Small (as measured in complexity)
2) Focused: low coupling, separation of concerns
3) Autonomous: a separate process or service
4) Well-defined API: easy for other services to interact with it

How are flat file service operations performed in a general DFS?

1) The client module makes a call on one of the operations
2) The directory service receives the call
3) The directory service returns the unique file ID (UFID)
4) The client sends a request to the flat file service using the file ID
5) The flat file service returns data or a status
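The split between the two services can be sketched as two small classes. Everything here is hypothetical: the names loosely mirror the operations on these cards (`Create`, `Read`, `Write`, `Lookup`, `AddName`) in Python style, file positions are 0-based, and directories live in a plain dictionary rather than being stored as files in the flat file service.

```python
import itertools


class FlatFileService:
    """Stores file contents by UFID; knows nothing about text names."""

    def __init__(self):
        self._files = {}
        self._ids = itertools.count(1)

    def create(self):
        # The one operation that takes no UFID: it hands out a new one.
        ufid = next(self._ids)
        self._files[ufid] = b""
        return ufid

    def write(self, ufid, i, data):
        # Overwrite (or extend) the file starting at position i.
        contents = self._files[ufid]
        self._files[ufid] = contents[:i] + data + contents[i + len(data):]

    def read(self, ufid, i, n):
        return self._files[ufid][i:i + n]


class DirectoryService:
    """Translates text names into UFIDs."""

    def __init__(self):
        self._dirs = {}

    def add_name(self, directory, name, ufid):
        self._dirs.setdefault(directory, {})[name] = ufid

    def lookup(self, directory, name):
        return self._dirs[directory][name]
```

A client first resolves a name with `lookup`, then uses only the returned UFID when talking to the flat file service, matching steps 1 to 5.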

Name and describe two common distributed file system models

1) Remote access model: the client sends requests to access a file on the server, but the file itself never leaves the server
2) Upload/download model: the client sends a request to access a file, and the file is downloaded to the client for editing --> it is uploaded back to the server upon completion

What are the main tasks that Hadoop/GFS handle?

1) Storing files
2) Running applications on top of those files

Can a module be both a queue listener and a queue writer?
a) No, they're the same
b) Yes, because every queue needs both
c) No, the JMS protocol only allows one or the other
d) Yes, that's how the data is transferred from one queue to another
e) No, that's only possible with topics

d

What is the difference between JMS and Message Oriented Middleware (MOM)?
a) None, they're the same
b) MOM provides an interface to JMS
c) JMS provides Point-to-Point queues and topics, but MOMs only provide topics
d) JMS only interacts with Servlets
e) JMS provides an interface to MOM

e

What is the highest level of computation in MapReduce?

job

What are interleaved transactions?

The individual actions of the transactions are mixed, but each transaction's program order is preserved.
Transactions: a b c d and w x y z
Interleaving: a b w x y c d z
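The possible interleavings of two transactions can be enumerated directly. The sketch below is illustrative (`interleavings` is a hypothetical helper): it generates every ordering that keeps each transaction's own order, which is the space a scheduler chooses from when deciding which interleavings are serially equivalent.

```python
def interleavings(t1, t2):
    """All orderings of t1's and t2's actions that keep each one's own order."""
    if not t1:
        return [list(t2)]
    if not t2:
        return [list(t1)]
    # Either the next action comes from t1 or it comes from t2.
    first_from_t1 = [[t1[0]] + rest for rest in interleavings(t1[1:], t2)]
    first_from_t2 = [[t2[0]] + rest for rest in interleavings(t1, t2[1:])]
    return first_from_t1 + first_from_t2


orders = interleavings("abcd", "wxyz")
```

For two four-action transactions there are C(8, 4) = 70 interleavings, including the example on this card; only two of them are actually serial (abcdwxyz and wxyzabcd).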

What is an inconsistency window?

The period between a data update and the moment that all replicas have the updated value

Strict one copy

A read after a write always gets the value that was just written

Name some valid JMS message types

Text, Object, Map, Bytes, Stream (TOMBS)

What happens when the server is stateless in a distributed file system?

Each client request must carry all the information needed to perform the job. The server may have to authenticate and authorize every request.

What kind of consistency does S3 maintain?

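Eventual consistency:
- If you PUT a new object, subsequent reads will return the object
- If you overwrite an object with a PUT, the change will be reflected eventually
- If you DELETE an object, it will eventually be removed
(Since December 2020, AWS has provided strong read-after-write consistency for S3 PUT and DELETE operations.)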

Strict consistency

Every read of data item x returns a value corresponding to the result of the most recent write to x. This is NOT POSSIBLE in a distributed system due to message latency.

Durable

The commit causes a permanent change to stable storage. We can recover from crashes, typically using some sort of log-based recovery algorithm.

What kind of storage is S3? How is data stored and accessed?

Remote object storage.
- Data is stored as objects, which have no defined format
- Data is accessed using REST: PUT, GET, DELETE

What consistency model do transactions typically use?

Sequential consistency

Why can't a distributed system have a global clock?

- Skew: two clocks show different times at the same instant
- Drift: clocks run at different speeds, so the skew changes over time

What is a challenge of end-to-end instrumentation w/ microservices?

System clocks are not exactly synched

What is the highest level of computation in Spark?

application

