Designing Data-Intensive Apps (Part 2: Replication, Partitions, Transactions, Issues of Distributed Systems)


Distributed System Models for Failures

1) Crash-Stop Faults --> Nodes can only fail in 1 way (crashing); Nodes can suddenly stop at any moment & be gone forever 2) Crash-Recovery Faults --> Nodes can crash, but then start responding again at some later time. Nodes use stable storage that is preserved across crashes & in-memory state is assumed to be lost 3) Byzantine (Arbitrary) Faults --> Nodes can & will do literally anything, including sending corrupted or deliberately misleading messages; expect high levels of randomness & unexpected behavior

Byzantine Fault Tolerance Vulnerabilities

1) Doesn't protect against software bugs --> The best approach for bugs is to fix them / prevent them from occurring in code 2) Doesn't protect against security issues & malicious attacks --> In most systems, if an attacker can compromise one node, they can probably compromise all of them, because they are probably running the same software. For security, it's best to use traditional tools / mechanisms (e.g. authentication, access control, encryption, firewalls, and so on)

Pitfalls of Leader-Based Systems using Async Replication

(1) Discarding Writes due to Crashes --> If async replication is used, some of the old leader's writes may not have been replicated to the followers before it crashed. That data may already be outdated or cause merge conflicts, so it's usually best to just discard it. However, this may violate clients' durability expectations. (2) Split Brain (see the dedicated entry below)

Typical Real World Distributed System Model is...

1) Partially Synchronous Network AND 2) Crash-Recovery

Ways of Optimizing Garbage Collection

1) Scheduling GC --> schedule it ahead of time for a node like a planned node outage, so other nodes know to take up the added load while the node does GC 2) Only use short-lived objects --> short-lived objects are faster to clean up 3) Restart Processes before Objects live too long --> Just restart the process & dump everything b/c restarting is faster than doing GC sometimes. If we do this as a rolling restart, then no outage / pause will occur

Models for Update Handling in Distributed Systems

3 main methods for handling updates across all nodes for pretty much all distributed database systems: (1) Single-leader (2) Multi-leader (3) Leaderless

Happens-Before Relationship

3 possible relationships between 2 Operations: (1) They're concurrent (2) A happens before B (3) B happens before A. Algorithm for capturing happens-before relationships (see the sketch below): (1) The server keeps a version # for every key --> the version # is incremented with each write to a key & is stored along with the new data (2) When a client reads a key, the server returns all values that haven't been overwritten + the current version #. Clients must always read a key before writing. (3) When a client writes, it must include the version # from its prior read & merge together all values it received in that read. (4) When the server receives a write with a certain version #, it can overwrite all values with that version # or below (since the client has already seen & merged those values into the new write), but must keep any values with a higher version # as concurrent siblings. If we make a write w/o a version #, it's concurrent w/ all other writes so it won't overwrite anything, and it'll just be returned as 1 of the values on subsequent reads
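
A minimal in-memory sketch of the algorithm above (class & variable names are my own, not from the book): the server bumps a per-key version # on each write and keeps as siblings any values the writer hasn't yet merged.

    # Sketch of server-side per-key versioning for happens-before tracking.
    class VersionedStore:
        def __init__(self):
            self.version = {}    # key -> latest version number
            self.siblings = {}   # key -> {version: value} not yet overwritten

        def read(self, key):
            # Return all values not yet overwritten, plus the version # the
            # client must echo back on its next write to this key.
            return list(self.siblings.get(key, {}).items()), self.version.get(key, 0)

        def write(self, key, value, based_on_version):
            # based_on_version = version the client saw when it read (0 if none).
            # Everything at or below it was merged by the client, so it can be
            # overwritten; anything newer is a concurrent sibling and is kept.
            new_version = self.version.get(key, 0) + 1
            self.version[key] = new_version
            live = {v: val for v, val in self.siblings.get(key, {}).items()
                    if v > based_on_version}
            live[new_version] = value
            self.siblings[key] = live
            return new_version

    store = VersionedStore()
    store.write("cart", ["milk"], 0)             # first write, no prior read
    values, v = store.read("cart")
    store.write("cart", ["milk", "eggs"], v)     # overwrites what it read
    store.write("cart", ["flour"], 0)            # no version #: kept as a sibling
    print(store.read("cart"))                    # two siblings + current version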

Leases

= A way for one node per DB or per partition to make itself the leader & ensure it's the only one writing to a resource at a time. Similar to a lock with a timeout. Leases are passed around different nodes to set new leaders as needed after the TIMEOUT is reached. A leader can request a refresh to reset the TIMEOUT. If there are any time sync issues or delays (e.g. a process pause), it's possible the node fails to renew in time & loses the lease
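
A minimal sketch of a lease with a timeout (the names & the 10-second duration are assumed for illustration): a node holds the lease until the expiry time passes, and must renew before then to stay leader.

    import time

    LEASE_DURATION = 10.0  # seconds (assumed)

    class Lease:
        def __init__(self):
            self.holder = None
            self.expires_at = 0.0

        def try_acquire(self, node_id, now=None):
            now = time.monotonic() if now is None else now
            if self.holder is None or now >= self.expires_at:
                self.holder = node_id                  # previous lease expired
                self.expires_at = now + LEASE_DURATION
                return True
            return False

        def renew(self, node_id, now=None):
            now = time.monotonic() if now is None else now
            # Renewal only works while this node still holds an unexpired lease;
            # if a pause or delay let it expire, the node has lost leadership.
            if self.holder == node_id and now < self.expires_at:
                self.expires_at = now + LEASE_DURATION
                return True
            return False

    lease = Lease()
    print(lease.try_acquire("node-a"))   # True: node-a becomes leader
    print(lease.try_acquire("node-b"))   # False: lease still held & unexpired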

Best-effort Basis

= It'll do what it can, but if there are any failures, then it's up to the APP to recover from errors. Leaves the onus on the APP to handle errors in these instances & clean up the DB data as needed to resolve partial writes

Tripwire Locks - Detecting Writes that Affect Prior Read (& how they prevent Invalid Premises)

= A lock that simply notifies transactions that the data they read may no longer be up to date. When a transaction writes to the database, it must look in the indexes for any other transactions that have recently read the affected data & notify them, since the data they read has now changed since they originally read it. This allows transactions to warn each other when one of them may be stale, so the stale transaction can be aborted while the up-to-date one(s) commit successfully

Network Congestion & Queueing

= Queues that make it hard to know exactly how long it'll take for a network request to get processed & ACK'd. Some queues: TCP Queues --> queue of network packets waiting to be sent; OS Process Queues --> queue of processes waiting for the OS to run them; VM Queues --> queue of VMs waiting to take control of a shared CPU & run their processes. The best way to determine network delay timeouts is to experimentally measure how long tasks usually take in the real world across multiple machines

Coordination Services

= Services that store metadata to track cluster assignments for use in coordination / access of resources, e.g. ZooKeeper

Monotonic Reads

= TLDR: a guarantee that you won't see time go backwards = If a process reads the value of a data item x, any successive read operation on x by that process will always return that same value or a more recent value. 1 way of guaranteeing this is by having the user always read from the same replica. Different users can read from different replicas. The replica can be chosen based on a hash of the User ID instead of randomly.
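
A minimal sketch of the routing rule described above (replica names & the choice of hash are assumed): hashing the user ID picks a replica deterministically, so a given user always reads from the same one.

    import hashlib

    REPLICAS = ["replica-0", "replica-1", "replica-2"]  # assumed replica names

    def replica_for_user(user_id: str) -> str:
        # Same user ID -> same hash -> same replica on every read.
        digest = hashlib.md5(user_id.encode()).hexdigest()
        return REPLICAS[int(digest, 16) % len(REPLICAS)]

    print(replica_for_user("alice"))  # always the same replica for "alice"

Note this simple modulo scheme reassigns users if the replica list changes; a production router would handle replica failures & membership changes more carefully.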

Network Partitions / Net Splits

= When 1 part of a network gets cut off from another part & they each become their own networks. This is usually a failure of the network & not something that's deliberately done. E.g. cables connecting U.S. networks with Europe are cut, resulting in European networks & U.S. networks being their own isolated systems. You must know how your software behaves in response to various network faults

Deadlocks in 2PL

= In 2PL, a transaction that needs a lock which isn't available simply waits until it is. Deadlock = when two or more transactions each hold a lock the other needs, so they'd wait on each other forever. The DB automatically detects deadlocks & aborts one of the transactions so the others can make progress; the aborted transaction must then be retried by the APP

Shared Nothing Architectures

= When all machines share nothing in terms of hardware. Each machine / VM running the DB software is called a node. All coordination between nodes is done via network calls. Pros: (1) Decoupling of machines (2) Upgrades can be machine specific to optimize the cost-performance ratio per use case (3) Cloud deployments make horizontal scaling easier Cons: (1) Tons of additional complexity: b/c data is distributed across multiple nodes, there are way more constraints & trade-offs to beware of within these distributed systems

Partial System Failures

= When only a part of a distributed system fails. This is usually non-deterministic & extremely hard to troubleshoot. Distributed systems must be able to react accordingly to remedy the failure or restart appropriately, OR we must at least try our best to cope with partial failures when they appear. On a single node/machine, a fault typically takes out the entire node (total failure), but a distributed system will continue to run even if a partial failure occurs within part of the system

Synchronous replication

= When the leader waits until a follower ACKs the update before reporting success & moving on, so replication is guaranteed to happen in a specific order. Pros: Guarantees data gets to every follower (eventually); b/c we wait for an ACK per follower before moving on, we know for sure all followers get the data. However, this isn't necessarily good: if a few followers in the line have major delays, the write blocks & the remaining followers don't have up-to-date data for that duration. Cons: Performance bottleneck potential varies greatly b/c it depends on the turnaround time of every follower

Hinted Handoff

= When the designated nodes within the set of n come back up, the nodes that temporarily accepted writes on their behalf hand those writes off to the designated n nodes (like crashing at a neighbor's house til I can get back into mine)

System Models

= When we make assumptions about the system's behavior & then design the system so it meets those assumptions. Think of it as defining a desired state & then doing anything we can to realize that system state. By using this approach, algorithms can be made reliable in the overall system even if the actual underlying components aren't very reliable

Multi-Leader Replication Topology

= a description of the different communication paths between nodes / the layout of how nodes talk to each other. 3 Main Topologies: Star → a central node receives all writes & propagates them to all other nodes; Pro: Very simple to implement well; the single central node gives us a single source of truth for ordering; Con: Single node failure can break the system due to the low # of node-to-node routes. Circular → writes are passed along from node to node in a hot-potato style of message passing; Pro: Very simple to implement well; Con: Single node failure can break the system due to the low # of node-to-node routes. All-to-all → mesh topology; everyone talks to everyone; Pro: Fault tolerant due to the high # of node-to-node routes; Con: Complex due to the high # of node-to-node routes & suffers from lack of causality (messages arriving in non-deterministic order from multiple nodes can cause data ordering to be a mess)

Quorums for Reading & Writing (w + r > n)

= a formula for determining the min. # of nodes we need to hear back from on reads & writes in order to know we'll get up-to-date data from our system when reading. Formula is: w + r > n, where w = # of nodes that must confirm a write, r = # of nodes that must respond to a read, n = total # of replicas in the system. Usually reads & writes are sent to all n replicas in parallel; w & r just tell us how many responses to wait for before we can proceed with our lives, confident that the data is up to date. If fewer than the required w or r nodes are available, reads & writes return an error
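
A minimal sketch of the quorum condition & the wait-for-responses logic (function & variable names are assumed): requests go to all n replicas, but we only wait for w or r acknowledgements.

    def quorum_ok(w: int, r: int, n: int) -> bool:
        # With w + r > n, every read set overlaps every write set in at least
        # one replica, so at least one read response carries the latest value.
        return w + r > n

    def wait_for_acks(responses, required: int) -> bool:
        # responses: iterable of (node, ok) results arriving from the n replicas.
        acks = 0
        for node, ok in responses:
            if ok:
                acks += 1
            if acks >= required:
                return True       # quorum reached; don't wait for stragglers
        return False              # too few nodes responded: return an error

    print(quorum_ok(w=2, r=2, n=3))   # True: reads are guaranteed to overlap writes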

Transactions

= a group of reads/writes that must all execute successfully together (commit) or else the entire group of actions is aborted & reverted Doesn't allow for partial execution of commands / partial writes Data Concept used to help make it easier to handle errors & reason around data manipulation

Consistent Prefix Reads

= a guarantee that if a sequence of writes happens in a certain order, then anyone reading those writes will see them in the same order they were written. This is a very important issue for partitioned DBs: different partitions apply writes independently, so there's no global ordering of writes → it suddenly becomes possible to see "into the future" (parts of the DB in an older state & some parts in a newer state). One way of ensuring consistent prefix reads in partitioned DBs is to keep all related reads / writes on the same partition. The hard part is determining what "related" looks like quantifiably.

Log Position Association

= a method where a DB snapshot is associated to an exact position in the leader's replication log, so we know where to start checking for deltas from; the position has various names: In Postgres → the log sequence # In MySQL → the binlog coordinates

Controller Node

= a node elected ahead of time to be in charge of leader appointment; during failover it chooses the new leader. Crucial for executing failover in leader-based systems

Read Your Own Writes / Read-After-Write Consistency

= a guarantee that after you submit a write, you will always at least see your own write when reading from the DB system. This is just a really neat guarantee to ensure that if you reload a page after submitting something, you'll always be able to at least see the changes you submitted, even if the rest of the replicas may still be lagging behind. Implementation depends on the scenario (read about each scenario in the notes if needed): (1) When there are many writes & very few reads (2) Frequent reads --> possible to implement time thresholding to help with this

Fencing Tokens

= a token that's granted along with a lock / lease; it is a # that increases each time the lock is granted. Requires the server to check that it's valid; the client probably won't do this for you. It's always a good idea for a server to check critical assumptions; we should never rely on the client to properly send valid operations / data. CON --> Doesn't protect against bad actors that deliberately send fake fencing tokens
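
A minimal sketch of the server-side check (names assumed): the resource remembers the highest token it has seen and rejects anything older, which fences off a client whose lease has expired.

    class FencedResource:
        def __init__(self):
            self.highest_token = 0
            self.data = None

        def write(self, token: int, value) -> bool:
            if token < self.highest_token:
                return False           # stale leaseholder: reject the write
            self.highest_token = token
            self.data = value
            return True

    res = FencedResource()
    print(res.write(33, "from old leaseholder"))   # True
    print(res.write(34, "from new leaseholder"))   # True
    print(res.write(33, "delayed old write"))      # False: fenced out

As the CON above notes, this only protects against accidentally stale clients, not against a bad actor that forges a higher token.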

Partition along the primary key AND some additional index

= adding additional data axes to partition along so that we can more effectively spread out & load balance partitions in Key Range Partitioning Systems i.e. both time & location partitioning --> Partition A holds sensor data for Node A at Time X, Partition B holds data for Node B at Time X, etc...

Premise

= an assumption on the data that may or may not be correct due to subsequent updates We need to beware of invalid premises in snapshot isolation b/c data may be outdated by the time we try to write to the DB based on our snapshot 2 major cases where premises may be invalidated: (1) Writes Already Happening → Detecting reads of a stale MVCC (multi-version concurrency control) object version (uncommitted write occurred before the read) (2) Detecting writes that affect prior reads (the write occurs after the read)

Version Vectors & Algorithmic Implementation

= collection of version #s from all replicas that is sent from the DB replica to clients when values are read. The version vector must then be sent back to the DB by the client when it writes --> allows the DB to distinguish between overwrites & concurrent writes. Why do we need these? In the case of multiple replicas in leaderless systems, we can't just have a version # per key; we need a version # per replica as well as per key. Algorithm: (1) Each replica increments its own version # when processing a write (2) Each replica also tracks the version #s it has seen from each of the other replicas (3) This composite version # indicates which values to overwrite & which values to keep as siblings. Aka vector clocks (not entirely the same thing though; when comparing the state of replicas, version vectors are the right data structure to use — read more about this in notes)
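
A minimal sketch of comparing two version vectors (represented here as plain dicts of replica → counter; names assumed) to decide whether one value overwrites the other or both must be kept as siblings.

    def dominates(a: dict, b: dict) -> bool:
        # a dominates b if a has seen at least everything b has seen.
        return all(a.get(replica, 0) >= count for replica, count in b.items())

    def compare(a: dict, b: dict) -> str:
        if a == b:
            return "equal"
        if dominates(a, b):
            return "a overwrites b"
        if dominates(b, a):
            return "b overwrites a"
        return "concurrent: keep both as siblings"

    print(compare({"r1": 2, "r2": 1}, {"r1": 1, "r2": 1}))  # a overwrites b
    print(compare({"r1": 2, "r2": 0}, {"r1": 1, "r2": 3}))  # concurrent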

Clock Smearing

= common way for NTP Servers to perform a leap second gradually over the course of a day rather than all at once

Replication

= copying data to multiple machines connected via a network. Reasons to replicate: (1) Reduce latency by keeping data geographically closer to users (2) Fault tolerance / availability → 1 node can fail, but others can kick in and take over (3) Scale → increase the # of machines that can service read queries. Biggest challenges of replication: (1) Memory limitations → how much you can actually hold on a machine (2) Update handling → making changes to data & ensuring consistency across all nodes

Stored procedures

= the entire body of transaction code is submitted to the DB ahead of time by the APP, before the transaction is executed; literally all code needed to run the transaction is submitted as a stored procedure. Cons: (1) Out-of-date implementations --> Most implementations don't use clean code practices & kinda suck compared to today's standards (2) Harder maintenance --> DB-run code tends to not be very usable / friendly to developers vs APP code (which can also be more easily version controlled, tested, deployed, measured etc...) (3) Failures are more costly --> DBs are more performance sensitive than APPs and a single DB server outage can potentially take down multiple APP servers. Pros: (1) No network round-trips mid-transaction --> no delays there (2) Replication --> stored procedures can potentially be run across multiple nodes! They must be deterministic in order to replicate effectively. This is really interesting & very reminiscent of functional programming determinism

Leader-Follower Partitioning

= extension of the classic leader-follower topology for replication at the node level that goes deeper down to the partition level! we can assign leaders & followers on the partition level now, not just the node level

Election Process in Failover

= getting consensus among all nodes regarding which follower should become the new leader Crucial for executing failover in leader-based systems

Durability

= the promise that once data is committed, it persists in the DB & won't be forgotten. The job of a DB is to store data so it can be read later, so it needs to do a good job of ensuring inputs/changes aren't lost due to hardware failures or DB crashes

Isolation

= idea that DB actions don't clash & cause race conditions due to concurrency issues; idea that transactions are independent of one another & will not cause race conditions to occur Typically, DB's guarantee levels of weak isolation rather than true isolation (serializability) for concurrent operations b/c serializability's performance cost is brutal & not worth it.

Snapshot Isolation & Repeatable Reads (Definition, Pros, Cons, Implementation)

= idea where each transaction reads from a consistent snapshot of a DB → A means of creating even greater isolation between Transactions by providing each transaction their own DB Snapshot to reference. Basically an expanded version of Read Committed Isolation. (1) Each transaction gets its own snapshot (2) Even if data is changed by another future transaction, the original transaction just sees w/e is in its own snapshot Pros: (1) Accounts for Read Skew / non-repeatable Reads → Great for long-running queries / operations Cons: (1) More complex to implement b/c we create a snapshot of the entire DB per transaction (2) Doesn't handle some of the harder race conditions (write skews & phantoms) Implementation: Writes never block reads & reads never block writes!! Writes need locks. Read committed isolation creates a snapshot of just the particular object we're updating, but in Snapshot Isolation, we take a snapshot of the entire DB

Predicate Locks

= locks on all objects that match some sort of search condition. We'd have to explicitly check all matching conditions to find which predicate locks apply; they're not readily apparent w/o this checking. When 2PL includes predicate locks, the DB prevents all forms of write skew and other race conditions; the isolation level truly becomes serializable. Predicate locks can be applied to objects that don't even exist yet in the DB, but which may be added in the future (phantoms). Thus, we're able to prevent phantoms & write skews using these types of locks

Clock Drift

= natural fluctuation / drifting of a computer's clock over time b/c the quartz crystal oscillator inside the machine doesn't run at a perfectly constant frequency (it varies with temperature etc.)

Atomic Write Operations

= single update operations (e.g. incrementing a counter) that the DB applies atomically, removing the need for the APP to do its own read-modify-write cycle. Many DBs provide atomic update operations by default, and it's usually the best choice if a DB provides it. Pros: (1) Best choice with highest guarantee of handling concurrent write issues (2) Many DBs support this by default Cons: (1) Supportability → Not all writes can be expressed as atomic operations (2) Abstraction → atomic locking is implemented by the DB so a lot of the logic is abstracted away from us (both a pro & a con) (3) Performance cost → all operations on the object happen in series Implementation → Place an exclusive lock on the object when it's read so no other transactions can read it until the update is applied
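
A minimal sketch using Python's built-in sqlite3 (table & column names are made up for illustration): the atomic form pushes the increment into the DB in one statement, while the read-modify-write form below it is the lost-update-prone pattern the atomic operation replaces.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE counters (key TEXT PRIMARY KEY, value INTEGER)")
    conn.execute("INSERT INTO counters VALUES ('page_views', 0)")

    # Atomic: the increment happens inside the database as a single operation.
    conn.execute("UPDATE counters SET value = value + 1 WHERE key = 'page_views'")

    # Non-atomic (lost-update prone): read into the app, then write back a value
    # that may already be stale if another client updated it in between.
    (value,) = conn.execute(
        "SELECT value FROM counters WHERE key = 'page_views'").fetchone()
    conn.execute("UPDATE counters SET value = ? WHERE key = 'page_views'",
                 (value + 1,))
    conn.commit()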

Service Discovery

= process whereby a framework allows clients to find out at which IP address & port # a particular service can be reached

Gossip Protocol

= protocol whereby nodes spread knowledge of cluster state changes among themselves, so a request can be sent to any node & that node forwards it to the proper node for the requested partition. Adds more complexity on the DB nodes but removes the dependency on external coordination services like ZooKeeper

Rebalancing Partitions

= redistributing data among partitions / reorganizing partitions so that the data sets & queries are evenly distributed across every node in the system Typical Requirements when Rebalancing: (1) load (data storage, read & write requests) should be fairly equal between all nodes after rebalancing (2) While rebalancing occurs, DB shouldn't be blocked (can continue reading/writing data) (3) Only move the min. amount of data needed for effective rebalancing in order to minimize Disk I/O + Network Load

Vertical Scaling

= scaling by adding more resources (CPU, RAM, disk) to an individual machine / improving sub-systems within a single system

Compound Primary Keys

= several columns used together to create a sort of multi-value / concatenated key. Only the first column / part of the compound key is hashed while the rest are stored as a concatenated index for sorting data. Can't do range searches on the 1st column since it's hashed, but if we specify a hardcoded value for the first column then query on the rest of the key, we can effectively scan for a range of values! Enables powerful one-to-many relationships

Index-Range Locks

= simplified approximation of predicate locking Rather than searching for the perfect match, we just look for the approximate conditions! Key Constraint: We must ensure that the original predicate is definitely within the approximations we make here Key Trade-off vs Predicate Locks: Sacrifice specificity for the sake of performance (assuming the ranges aren't too large & hit too often)

Term-Based Partitioning

= strategy that uses a global index covering data in all partitions; the global index is itself partitioned by term & spread across the nodes. Pro: (1) Easier / faster reads --> a reader only needs to query the partition(s) holding the term(s) it wants, instead of scatter-gathering across every partition. Con: (1) Harder writes --> a single document write may touch several partitions of the global index, so every affected index partition has to be updated

Replication Lag

= the delay before all replicas are updated with the latest data from the leader. 3 major issues related to replication lag: (1) Read Your Own Writes (2) Monotonic Reads (3) Consistent Prefix Reads. 2 ways of coping with replication lag: (1) Accept eventual consistency (2) Transactional processing (have the DB provide stronger guarantees)

Safety Guarantees

= the guarantee from the DB that it'll properly handle errors/ failures for transactions if the transaction fails Allows App to ignore certain concurrency issues or error scenarios

Deadlines & hard real-time systems

= time range within which a process / APP must respond Hard real-time systems = systems where catastrophic failure can occur if deadlines aren't met I.e. nuclear meltdown, army weaponry, etc...

Chain Replication

= variant of sync replication where the servers are linearly ordered to form a chain with the primary (leader) DB at the end. Read requests are sent to the tail (primary) of the chain as in normal primary-backup systems, and all write requests are sent to the head (a backup), which then passes the update along the chain. To avoid unnecessary work, only the result of the write is passed down the chain. Strong consistency naturally follows, because all requests are ordered by the primary at the tail of the chain.

Network Jitter

= variation in delay times for network requests to be processed & ACK'd Currently, the best way to determine network delay timeouts is by experimentally measuring how long tasks usually take in the real world across multiple machines However, there are even ways for the system to automatically measure response times & variability (jitter) & adjust timeouts accordingly. For example: Phi Accrual Failure Detectors, which are used in Akka & Cassandra

Transactional Processing

= a way for a DB to provide stronger guarantees so apps can be simpler. Single-node transactions have existed for a LONG time, but many distributed systems have abandoned them, reasoning that: (1) they're too expensive in performance & availability (2) eventual consistency is inevitable in a scalable system, so they just trust the system will catch up. These 2 points are somewhat correct, but there's a lot of nuance here that we'll dive deeper into in Ch's 7 & 9

Causal Dependency

= when 1 process builds upon a previous process If processes are causally dependent, then they can't be concurrent I.e. if Operation B uses some value from Operation A, then Operation B is causally dependent on A

Split Brain

= when 2 nodes believe they're the leader b/c 1 crashed & recovered but still thinks it's the leader after a follower was promoted Deadlocks & race conditions can easily occur here if both leaders try to write using the same resources... OOF Must have a mechanism in place to check for more than 1 leader in leader-based systems & shut down any extra leaders if there's more than 1

Explicit Locking

= when the APP explicitly requests locks on the objects it's going to update, rather than relying on the DB to do it automatically; the major diff vs atomic operations is that the APP is responsible for taking the locks. Pros: (1) More control & versatility than atomic operations Cons: (1) Complexity → more complex to implement b/c you need to explicitly handle it now; easy to forget a necessary lock in the code & accidentally introduce a race condition if we're not super careful (2) Performance cost → all operations on the object happen in series

Statement Based Replication + Pitfalls

= when Leader logs every write request as a statement that it executes; It then sends the statement log to followers For a relational DB, this means all INSERT, UPDATE, and DELETE statements are forwarded to followers & each follower parses & executes the SQL statement like it was received from a client Pitfalls: (1) Non-determinism (i.e. when functions such as RANDOM or NOW are used, these are non-deterministic, so they can cause issues) (2) Dependencies (i.e. ENV based dependencies, ordering of operations /resources, etc... there's more info in notes) (3) Side Effects (i.e. network requests, user-defined functions, stored procedures, triggers, etc...) Statement based replication generally pales in comparison to other techniques due to these pitfalls

Thrashing

= when OS spends way more time doing swapping / being blocked rather than doing actual work Considered a major kind of Process Pause

Anti-Entropy Process

= when a background auditing process checks the data in replicas & updates replicas that are missing data. Unlike the replication log in a leader-based system, the anti-entropy process doesn't copy writes in any particular order. There may be a significant delay before data is copied from 1 replica to another

Read Repair

= when a client makes reads from many nodes in parallel, checks the version of the returned data & then writes to replicas that have out of date data to update it to the latest values Works well for data that's frequently read b/c this only occurs when the data is read

Dirty Reads

= when a transaction reads data that another transaction has written but not yet committed. E.g. 1 transaction may be running & have performed some of its writes (but not all), so it isn't committed yet; if a 2nd transaction starts & is able to read some of that partially written data, that's a dirty read

Lost updates

= when a write transaction is overwritten by another transaction & it seems like it never happened Can happen if 2 transactions try to read-modify-and-write to the same resources The later transaction clobbers the prior one! Methods to Resolve this: (1) Atomic Write Operations (2) Explicit Locking (3) Compare & Set (4) Conflict Detection & Resolution/Retry (5) Automatically Detect Lost Updates

Shared Memory Architectures

= when all memory, CPUs, disks / resources are shared under a single OS / machine. Cons: (1) Cost vs performance ratio doesn't scale linearly: as we throw more $$ at upgrades, the performance improvements grow way slower than the cost per upgrade (2) Limited fault tolerance since it's still a single machine in a single geographic location

Converge Towards a Consistent State

= when all replicas arrive at the same final value when all changes have been replicated In multi-leader DB's writes occur in random orders due to their async nature. As a result of this asynchronicity, it's difficult to ensure all replicas eventually arrive at the same state (have the same data & in the same data order) Ways to achieve Consistency: (1) Give each write a Unique ID --> Implies data loss (2) Give each Replica a Unique ID --> Implies data loss (3) Merge the Writes Together (4) Prompt the User Using App Logic

Clobbering

= when data from a transaction is overwritten & lost b/c another transaction overwrote it & made it appear like it never happened a result of read-modify-write cycles to the same resources w/ no proper concurrency control. causes Lost Updates

Artificial Splitting of Hashed keys for Better Load Balancing

= when a hashed key is super popular & gets hit very often, we may want to break this hash key up further using an artificial 1-to-many mapping. Literally, we take that super popular key, break it up into multiple keys, then partition using these newly generated additional keys. Trade-off: adding additional keys means we get a better spread of load, BUT reads have to do more work: 1) Have to read all the artificially created keys & merge them back together into the original key 2) Need additional bookkeeping to track the artificial keys properly
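
A minimal sketch of one possible splitting scheme (the suffix format & fan-out of 8 are assumptions, not from the book): writes append a small random suffix to the hot key, and reads fan out over every suffix and merge.

    import random

    FANOUT = 8  # assumed number of sub-keys per hot key

    def write_key(hot_key: str) -> str:
        # Each write lands on one of FANOUT sub-keys, spreading the load.
        return f"{hot_key}#{random.randrange(FANOUT)}"

    def read_keys(hot_key: str) -> list[str]:
        # Reads must fetch every sub-key and merge the results back together.
        return [f"{hot_key}#{i}" for i in range(FANOUT)]

    print(write_key("celebrity_timeline"))   # e.g. 'celebrity_timeline#3'
    print(read_keys("celebrity_timeline"))   # all 8 sub-keys to fetch & merge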

Semi-Synchronous Replication

= when just a few nodes are synchronously updated and the rest are async. In practice, you basically never use fully synchronous replication; at most you'd probably have 1 or 2 other nodes get updated synchronously by the leader to ensure at least 1 or 2 nodes always have up-to-date data

Asymmetric Fault

= when one node can receive all messages sent to it, but outgoing messages from the node are dropped/delayed Nodes that may actually still be alive might be declared dead b/c it fails to respond due to: 1) Network failures 2) Process Pauses

Shared Disk Architecture

= when several machines share data on a set of disks, but have independent CPUs and RAM. Communicate over the network. Used for some data warehousing workloads. Con: (1) Contention & the potential for deadlocks are super limiting factors here!

Trigger-Based Replication

= when the Application code handles DB Replication based on a set of Triggers Triggers = a mechanism that executes custom code when a write occurs in a DB The trigger logs the change into a separate table, which is then read by an external process that can then: (1) apply application logic (2) replicate data changes to another system Cons: (1) Complexity --> Lots of custom code & larger surface area for bugs / edge cases Pros: (1) Customizable & Flexible --> App code can apply logic to DB replication process, which allows for more freedom.

Sloppy Quorums

= when the writes end up on different nodes than the reads, so there's no guaranteed node overlaps Let's say we can't satisfy w + r > n though... then we get an interesting trade-off: (1) Should we return errors to all requests for which we cannot reach a quorum of w or r nodes? (2) Or should we accept writes anyway, and write them to some nodes that are reachable but aren't among the n nodes on which the value usually lives? This latter approach is a sloppy quorum, where we only write to w/e nodes we can reach, even if we don't get the minimum # to meet quorum a way of achieving write durability; As long as any nodes are available for writing, then the DB always accepts writes.

Multi-Object Transactions

= when we edit multiple objects of the DB within a single transaction. Mostly abandoned by distributed data stores since it's hard to do across partitions & can be a hindrance if high availability / higher performance are required. Special cases exist where we do want it though (b/c we get better error handling w/ atomicity & concurrency handling w/ isolation): (1) Relational / graph models with lots of references --> when inserting several records that refer to one another, the foreign keys have to be correct and up to date, or the data becomes nonsensical. (2) In a doc-based model where denormalization of data is used --> this is usually done when JOIN-like access is desired; when we want to update denormalized info, we need to update several docs in one go. Transactions are great to prevent denormalized data from going out of sync (3) In DBs with secondary indexes (basically everything except pure key-value stores) --> indexes must be updated every time you change a value; without transaction isolation, it's possible for a record to appear in one index but not another, because the update to the second index hasn't happened yet.

True Serial Execution (Definition, why it's possible now & Transaction Constraints)

= when we execute transactions one at a time on a single thread; used to be impossible, but now it's actually potentially really useful! 2 Recent developments made it possible to do single-threaded DB's: (1) RAM became cheaper → Possible to run everything in memory now (2) Separation of OLTP & OLAP --> OLTP transactions are fast & have just a few read + writes; OLAP tend to be read only so they can be covered with snapshot isolation outside of the serial execution loop Constraints: (1) Transaction must be small & fast→ long transactions will lead to stalls / idle times due to network delays / query delays (2) All active data must fit in memory → Anything that needs disk I/O will cause big delays (3) Write throughput must be Reasonable → Must be able to be handled on a single CPU OR must be able to be partitioned without needing cross-partition coordination (4) Limits use of Cross Partition Transactions → Possible to have cross-partition transactions, but we must have a hard limit on the extent to which they can be used

Cursor Stability

= when we place an exclusive lock on an object when it's read so no other transactions can read it until the update is applied Reads block other reads & writes.

Rebalancing Using Dynamic Partitions

= when we split / merge partitions based on max / min partition size thresholds respectively. Total # of Partitions ≈ (Dataset Size) ÷ (Max Partition Size Threshold). Use Case --> better to create partitions dynamically when using Key Range partitioning / in these types of situations. Pros: (1) Each node can handle multiple partitions (2) We can shift partitions around as needed to keep the system balanced Cons: (1) More complex because of the thresholding --> more merging & splitting (2) At initial DB creation we only have 1 partition since no pre-existing data is present, so there's literally no way to hit the max threshold; as a result, only a single node will process data for a while until we eventually hit the threshold & split data among other nodes --> Workaround: Pre-Splitting
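
A minimal sketch of the split/merge rule (the thresholds & in-place halving are assumptions for illustration; real systems split on key boundaries, not just sizes):

    MAX_SIZE = 10 * 1024**3   # assumed 10 GB split threshold
    MIN_SIZE = 1 * 1024**3    # assumed 1 GB merge threshold

    def rebalance(partition_sizes: list[int]) -> list[int]:
        out = []
        for size in partition_sizes:
            if size > MAX_SIZE:
                # Too big: split roughly in half into two partitions.
                out += [size // 2, size - size // 2]
            elif out and size < MIN_SIZE:
                # Too small: merge into the neighbouring partition.
                out[-1] += size
            else:
                out.append(size)
        return out

    sizes = [12 * 1024**3, 512 * 1024**2, 3 * 1024**3]
    print([s // 1024**2 for s in rebalance(sizes)])  # partition sizes in MB afterwards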

Conflicts: Avoidance

= when we try to avoid write conflicts altogether by locking resources based on various dimensions. By ensuring people aren't sharing resources, we can avoid race conditions! One example of this is space partitioning --> where we assign users to a specific leader / datacenter & the requests of those users always go to the same node for writing

Logical (Row-Based) Log Replication

= when we use diff log formats for replication vs for storage Physical log = the log used for storage (write-ahead log) Logical log = the log used for replication; Holds a sequence of records that describes DB Writes at the row-level for each CRUD Operation (look at notes for further detail) Pros: (1) Decoupled from Storage Engine --> Easier to do Rolling upgrades & failover compared to WAL Replication (2) Exportable to External Apps --> Easier for external Apps to parse these types of logs; easier to implement Change Data Capture Processes

Commutative Atomic Operations

= when you can apply Atomic Operations in a different order (and even on different replicas), and still get the same result every time

Change Data Capture

= when you observe & export data changes from the DB so they can be used to: (1) feed a data warehouse for analytics (2) build custom indexes (3) maintain caches

Compare-And-Set

= when you only let an update happen if the value hasn't changed since you last read it, so you can prevent lost updates Query Based (Compare-And-Set) vs Manager Based (Automatic Lost Update Detection) If current value doesn't match what you previously read, then abort the read-modify-write update & retry later Always check if the DB's compare-and-set operator is safe before using it
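
A minimal sketch using sqlite3 (table & column names are made up): the compare-and-set is expressed as a conditional UPDATE whose WHERE clause checks that the value is still the one we read; if not, we loop and retry the read-modify-write cycle.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, content TEXT)")
    conn.execute("INSERT INTO pages VALUES (1, 'old text')")

    def update_page(page_id: int, new_content: str, retries: int = 3) -> bool:
        for _ in range(retries):
            (old_content,) = conn.execute(
                "SELECT content FROM pages WHERE id = ?", (page_id,)).fetchone()
            cur = conn.execute(
                "UPDATE pages SET content = ? WHERE id = ? AND content = ?",
                (new_content, page_id, old_content))
            if cur.rowcount == 1:      # nothing changed since our read: success
                conn.commit()
                return True
        return False                   # kept losing the race; give up / report an error

    print(update_page(1, "new text"))  # True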

Read Skew

= when you read data at an unfortunately timed instance such that it returns the improper value b/c an ongoing write transaction has yet to complete Snapshot Isolation resolves this by using Multi-Version Concurrency Control

Rebalancing Using a Fixed # of Partitions

= when you set the total # of partitions at the time of DB instantiation; this never changes over the life of the DB b/c partitions are the atomic unit of this system (there's no smaller unit than the partition). These partitions are then evenly distributed across nodes. Use Case --> best when used with hashed key partitioning (not so great with key range partitioning). (Size of Partition) = (Total Size of Dataset) ÷ (Total # of Partitions) Pro: (1) Simple to do Con: (1) Vulnerable to data variability --> too much data = partitions get too large & clunky; too little data = management of so many partitions becomes super expensive. Best to hit a sweet spot where the total data load is medium sized... w/e that means :)
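
A minimal sketch (partition count & node names are assumed) showing the key property: when a node is added, whole partitions move to it, but keys never change partition.

    from collections import Counter

    NUM_PARTITIONS = 12  # fixed at DB creation (kept small for readability)

    def add_node(assignment: dict, new_node: str) -> dict:
        # The new node steals whole partitions from the busiest nodes until it
        # holds its fair share; every other partition stays where it is.
        target = NUM_PARTITIONS // (len(set(assignment.values())) + 1)
        for _ in range(target):
            busiest = Counter(assignment.values()).most_common(1)[0][0]
            victim = next(p for p, owner in assignment.items() if owner == busiest)
            assignment[victim] = new_node
        return assignment

    assignment = {p: f"node-{p % 3}" for p in range(NUM_PARTITIONS)}  # 3 nodes, 4 partitions each
    assignment = add_node(assignment, "node-3")
    print(Counter(assignment.values()))   # now 3 partitions per node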

Placing Timestamps on DB Snapshots for Replication

= when you stick a timestamp on the Snapshot & check for data since that time... This is super prone to complications though due to time synchronization... clocks aren't in perfect sync due to network delays & physical constraints, so space (position) based measurements are actually WAY better for things that care a lot about perfect consistency

Phantoms

= where a write in one transaction changes the result of a search query in another transaction A cause for Write Skews to occur. Snapshot isolation avoids phantoms in read-only queries, but in read-write transactions like the examples we discussed, phantoms can lead to particularly tricky cases of write skew.

Multi-Leader Replication

= where all leaders listen to each other for changes (acting as followers to other leaders) & each leader is able to process writes itself (acting as a master to other leaders). Use Cases: (1) Multi-Datacenter Operation (2) Clients with Offline Operations (3) Collaborative Editing. Avoid multi-leader replication if you can; it's super hard to get right & really troublesome when issues arise

Quorums

= where voting is done among nodes & a majority vote will lead to the result Majority Quorum = most common kind & it just means that the majority decides what the end result is based on vote; Very safe b/c there can only be 1 majority in a group

Single-Object Transactions

= where we edit 1 object at a time. DBs aim to provide atomicity & isolation at the single-object level 100% of the time. This is b/c if part of an object gets written but the rest of the write is lost, we'd have really horrible race conditions & uncertainty. Thus, the DB guarantees atomicity & isolation for single-object operations: achieve atomicity using logs for crash recovery; achieve isolation using a lock on each object

Hashed Key Partitioning

= where we feed the primary key into a hash function then partition along the hashed value Pros: (1) Better Load distribution because hashing helps add a layer of randomness / evenness to value distribution across the Dataset Cons: (1) Harder Reads --> Need to do parallelized Reads to gather data then serve it (scatter-gather reads) (2) Data is no longer stored in sorted order; range queries are now harder --> Workaround: Use Compound Primary Keys

Rebalancing using Proportional Partitioning

= where you decide on the # of partitions based on the # of nodes you have & some pre-set # of partitions per node. Contrast: in Fixed # Partitioning the size of each partition is proportional to the size of the dataset, and in Dynamic Partitioning the # of partitions is proportional to the size of the dataset; here, the # of partitions is proportional to the # of nodes. We get a system of EQs to describe system behavior: [EQ 1]: (Total # of Partitions) = (# of Nodes) × (Pre-set Fixed # of Partitions Per Node) [EQ 2]: (Size of Partition) = (Total Dataset Size) ÷ (Total # of Partitions) Rebalancing / Adding a Node --> Randomly choose a fixed # of existing partitions to split; the new node takes ownership of one half of each split partition while the other half stays in place. Unfair Splitting: can occur b/c the split points are chosen randomly --> but if large data sets & randomization are used, we get Eventual Partition Load Balancing. Choosing Partition Boundaries: done randomly, so it requires that hash-based partitioning be used so fair boundaries can be picked (since hashing guarantees even spreading of data)

ZooKeeper

A coordination service that can be used to help track partition assignments. Clients subscribe to ZooKeeper topics & get notified if changes / events occur along those topics (i.e. partition assignment changes). Used in: (1) HBase (2) SolrCloud (3) Kafka

eventual consistency

A model for database consistency in which updates to the database will propagate through the system so that all data copies will be consistent eventually.

Append-Only B Trees

A really interesting & powerful structure; what sounds like an anti-pattern (never modifying an existing tree) turns out to be surprisingly useful. Each write transaction creates a new B-Tree root. The root is a consistent snapshot of the DB at the time the root was created. No need to filter anything out based on txids b/c subsequent writes can't modify an existing B-tree → they can only make new ones due to the append-only nature of this system. Pros: (1) Lack of mutations tends to make things easier to reason around (2) More apparent isolation / modularity Cons: (1) Requires lots of compaction & garbage collection

Single node vs Distributed Systems: How They Perceive Truth/ System State

A single node can easily determine its state while distributed systems are much more complex due to: 1) Time Sync Issues 2) All comm's happen over an unreliable network w/ variable delays 3) Process pauses 4) Partial Failures A node in a distributed system never knows anything for sure; it can only make inferences based on the messages it does/doesn't get via the network This is problematic b/c a node may not respond b/c: (1) bad network (2) Node is dead It's generally hard to know which reason it is though

Serializable Snapshot Isolation (SSI)

A very new serialization technique that's still proving itself; offers a powerful middle ground between concurrency handling from serialization & performance benefits of Weak Isolation Levels Based on Snapshot Isolation: (1) All reads in a transaction are made from a consistent DB snapshot (2) Adds an algorithm to detect serialization conflicts among writes & knowing which transactions to abort Need to beware of Outdated Premises

Handling Errors & Aborts in ACID Systems

ACID DBs are based on this philosophy → the DB would rather abandon a transaction entirely if it can't guarantee ACID, rather than let it be partially finished. Transactions should be aborted & safely retried if an error occurs

Doc-Based Partitioning

Aka Local Partitioning = a strategy where each partition is totally separate & handles its own 2ndary indexes independently of other partitions. Anytime you write to the DB, you only need to deal with the specific partition that holds the data in question --> you couldn't care less what all the other partitions have. Pros: (1) Easier writes (2) Greater modularity / isolation since we only need to deal with a single doc / partition Con: (1) Harder reads --> scatter/gather needed to aggregate data properly across partitions

Algorithmic Correctness

Algorithms are correct based on whether or not they meet a set of criteria / properties that we define for it An algorithm is correct in some system model if it always satisfies its properties in all situations that we assume may occur in that system model.

ACID

Atomicity, Consistency, Isolation, Durability. Atomicity & Isolation tend to be the trickiest to guarantee & the most relevant to talk about. Started as a set of DB principles used to describe transactional guarantees; however, nowadays it's very loosely used & mainly a marketing term. It's really important to dig into each DB & HOW it implements ACID b/c the term is so loose — to ensure the DB really does satisfy our design criteria & provide the guarantees we need for our system. Rather than blindly relying on tools, we should develop a strong understanding of the concurrency problems that exist & how to prevent them. Only then can we build effective & reliable applications with the tools at our disposal

Fault Detection Methods

Automatic fault detection is important: (1) A load balancer needs to stop sending requests to dead nodes (take them out of rotation) (2) In a distributed DB w/ single-leader replication, if the leader dies, then we need automatic failover. Network uncertainty makes it hard to know if nodes are actually dead or not though — what if you kill something that's actually still working!? The only truly reliable way to know that a request succeeded is to get a positive response from the APP itself

Multi-Version Concurrency Control (MVCC) [Definition & Implementation]

B/c multiple transactions can be in progress at the same time & we maintain snapshots of all data in the DB, we now have to handle several versions of the same object at the same time → MVCC. Used in Snapshot Isolation to prevent read skews. Implementation in Postgres (similar in other DBs too): (1) Transactions are given a unique, always-increasing transaction ID (txid) when they start (2) Anytime a transaction writes to the DB, the data is tagged with the txid of the writer (3) Each row in a table has: [a] a created_by field that contains the txid of the writer that inserted the row into the table [b] a deleted_by field that denotes if a row should be deleted; it's marked for deletion if a txid exists in this field (4) Once no transactions are trying to access a row & the row has deleted_by populated by a txid, the DB removes the row since it knows the data can now be safely removed. Updates are translated by the DB into create & delete operations → it first creates the new version of the data & then flags the old version for removal once it's safe to remove
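
A minimal sketch of an MVCC visibility rule (field names follow the description above; the exact rules in Postgres are more involved): a row version is visible if its creator had committed before our snapshot and it hasn't been deleted by a transaction visible to us.

    def visible(row: dict, snapshot_txid: int, in_progress: set) -> bool:
        created, deleted = row["created_by"], row["deleted_by"]
        created_visible = created <= snapshot_txid and created not in in_progress
        deletion_visible = (deleted is not None
                            and deleted <= snapshot_txid
                            and deleted not in in_progress)
        return created_visible and not deletion_visible

    row = {"created_by": 5, "deleted_by": 9}
    print(visible(row, snapshot_txid=7, in_progress={6}))     # True: deletion not yet visible
    print(visible(row, snapshot_txid=10, in_progress=set()))  # False: deletion is visible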

Pessimistic concurrency control

Based on idea that if anything might go wrong, it's better to wait until it's safe before doing anything 2PL uses this ideology → back off & retry later if it's not safe Similar to mutual exclusion which is used to protect data-structures in multi-threaded programming Serial execution is extremely pessimistic

Optimistic Concurrency Control

Based on the idea that even if something may go wrong, transactions continue anyway, in the hope that everything will be alright in the end. When a transaction wants to commit, the DB checks whether anything bad happened (i.e. isolation was violated); if so, the transaction is aborted & retried later. Only transactions whose execution turns out to have been serializable are allowed to commit

Key Range Partitioning

Breaking up partitions based on boundaries set along ranges of keys (i.e. A-C on Partition 1, D-F on Partition 2, etc...) Pros: (1) Easier to do range queries since everything is stored in sorted orders --> Easier reads Cons: (1) Harder to maintain load balancing since boundaries are set arbitrarily & can lead to skewed loads over time unless we update boundaries to help load balance as data set gets larger --> Workaround = Partition along the primary key AND some additional index

Designing Reliable Systems using Unreliable Parts

By using workarounds & corrective techniques, we can build a reliable system from unreliable components! However, this system will never be perfect; it's totally possible to have faults... it's just that the workarounds help eliminate a lot of the lower-level issues that we'd otherwise see. It's been done for decades: (1) Error-correcting codes compensate for bit errors when doing wireless transmissions (2) Internet Protocol is unreliable (drops, delays, duplicates, messes up packet ordering) → TCP provides a reliable transport layer on top of IP

Edge Case Pitfalls of Quorum Consistency

Can cause Quorum Consistency to give false positives or fall apart (1) Partial Writes (2) Sloppy Quorums (3) Concurrency Contentions (2 simultaneous writes OR a read & write happen simultaneously) (4) Faulty Restorations

Atomic Operations in Replicated DB's

Can work well; especially if they're commutative That is, you can apply them in a different order on different replicas, and still get the same result

Unreliable Clocks

Clock Drift & network delays make it difficult to set up reliable clocks Ultimately, clock reliability gets tricky with distributed systems

Replication Models

Covered in the book: (1) Statement Based Replication (2) Write Ahead Log Replication (3) Logical Log Replication (4) Trigger-Based Replication

Hashed Key Mod N Partitioning

DON'T DO IT!! IT'S A TRAP!! = where you take the Hashed key then do Modulo of the output While this does evenly rebalance partitions across all nodes, it IS NOT WORTH IT!!! It'll require you to redistribute data across all nodes multiple times & the bandwidth used for all these write requests will completely obliterate the effectiveness of this approach. There are better strategies!!
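
A quick demonstration of why (the key format & node counts are arbitrary): going from 10 to 11 nodes under hash(key) mod N moves the vast majority of keys.

    import hashlib

    def node_for(key: str, n: int) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

    keys = [f"user:{i}" for i in range(10_000)]
    moved = sum(1 for k in keys if node_for(k, 10) != node_for(k, 11))
    print(f"{moved / len(keys):.0%} of keys changed nodes")   # typically ~90%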

Consistency

Describes the APP, rather than the DB. It's on the APP to determine the invariants & ensure that transactions properly satisfy these invariants I.e. if you write bad data that doesn't satisfy the invariants, the DB won't stop you. The DB only stores the data; it's the APP's job to ensure the data is properly cleaned / valid Refers to the idea that the APP will not execute operations that invalidate the constraints (invariants) the DB has set up for itself; it'll remain consistent & stay true to these guarantees/invariants If you start with a DB that satisfies a set of rules, & all executed transactions also satisfy that set of rules, then your DB will definitely satisfy the set of rules after the transactions are run

Atomicity

Describes the DB; especially in regards to how it handles errors & aborts Refers to the idea that if a client makes several requests that are grouped together into a single request / transaction, then if any of those writes fail, the entire transaction is aborted DB must undo all the partial writes that did succeed & revert everything back to the state before the transaction ever started

Replication in Systems with Offline Operations

Devices that need to keep working while offline need to have a local DB leader. The local leader stores all writes made by the App & sends out updates to datacenter leaders / other leaders once a network connection is available again. CouchDB is specially designed for this mode of operation & tries to make this kind of multi-leader config much easier

Load Balancing

Distributing a computing or networking workload across multiple systems to avoid congestion and slow performance.

Partitioning

Dividing a large dataset or DB into multiple parts for the sake of gaining: 1) higher availability 2) Scalability 3) greater reliability

Handling Errors in Leaderless Replication Systems

Doesn't follow transactional philosophy of aborting if ACID guarantees aren't preserved Uses Best-Effort Basis

Automatically Detecting Lost Updates

Enables parallel transaction execution by setting up a transaction manager to detect lost updates; if a lost update is detected, abort the transaction & force it to retry its read-modify-write cycle. Pros: (1) Scalability → operations can happen in parallel! (2) Compatibility → can be used with Snapshot Isolation for greater concurrency handling! (3) Forgiving → APP code doesn't need to do anything special, so it's okay if you forget to use a lock or atomic operation; the detection happens automatically so it's less error-prone! Cons: (1) Complexity → adds a transaction manager to abort & retry transactions when necessary. Atomic operations & explicit locks prevent lost updates by forcing read-modify-write cycles to occur in series, which is less scalable than automatic lost update detection. Query based (Compare-And-Set) vs manager based (Automatic Lost Update Detection)

High Performance Computing

Focus on Vertical Scalability Uses Checkpoints → Saves checkpoints for running jobs to durable storage from time to time Allows Total Failure → If a node fails, it typically stops the entire cluster workload to repair the node, then restarts from the last checkpoint; Behaves more like a single node than a distributed system Specialized Hardware → each node is quite reliable, and nodes communicate through shared memory and remote direct memory access (RDMA) Specialized Network Topologies → multi-dimensional meshes & toruses that give greater performance for HPC workloads with known communication patterns Nodes are closer together → less network delays Scalability → Harder to scale b/c as it gets larger, nodes will break more often; if we always have total failure whenever a node fails, then we might get a state where the HPC Machine is constantly failing & not able to do work

Handling Node Outages in Leader Based Systems

Follower Failure --> Catch-Up Recovery Each follower keeps a write ahead log of data changes it receives from the leader If a follower crashes, it can just reference the log to determine if any deltas need to be applied Leader Failure --> Failover If a leader dies, a follower must be promoted to be the new leader Clients must recognize the new leader & configure themselves to write to this new leader Followers must start consuming data changes from the new leader

Examples of Process Pauses

Garbage Collection; Virtual Machines being suspended / preempted; Unbounded TIMEOUTs; Steal Time & Paused Threads; Slow Disk I/O; Paging & Thrashing

Using Partitioning To Improve True Serial Execution

Give each partition its own execution thread running independently from other partition threads in order to increase write throughput. Then you can give a multi-CPU DB the ability to run things serially yet concurrently! Just give each CPU its own partition to run, and you can then scale performance linearly with the # of CPUs & partitions you have! Serial execution tends to have write bottlenecks, so we need a way of getting higher throughput. Beware of cross-partition vs single-partition transactions: anything that requires cross-partition coordination is WAY slower than single-partition execution

Real-Time Operating Systems (RTOS)

Guarantees that processes will have all the necessary CPU time by scheduling everything in advance Specifically designed for on-time delivery & timing reliability --> Real-time systems will always prioritize timely responses & deadline guarantees over everything else (Including high throughput / scalability) These are too expensive to implement usually, so most systems just choose to not pay $$ & live with the unreliable times

Custom Write Conflict Resolution Logic

How we handle write conflicts largely depends on the app we're working on, so it makes sense to allow the developer to write custom app code to resolve conflicts within multi-leader setups. This code can be triggered on write or on read

Partition Request Routing

How we know which partition & node to route requests to. 3 Major Approaches: (1) Send requests to any node (e.g. via a round-robin load balancer) & have that node forward them as needed (2) Coordinator / Message Broker / Routing Tier (3) Client knows exactly which partition + node to send messages to

Concurrency Contentions

If 2 writes occur concurrently, we don't know which happened first --> Best way to handle this is to merge the concurrent writes together. If winner is picked based on timestamp (last write wins), then writes can be lost due to clock skew If a Read & Write Occur Concurrently, the write may only be seen in some replicas; in this case, we can't tell whether the read returned data that's out-of-date or not

Handling Concurrent Writes & Conflicts

If operations happen at the same time, don't know about each other & are accessing the same resource, then they're concurrent operations. Conflicts can occur when writing to the same key, doing read repair or during hinted handoff. Main problem: events may arrive in different orders at different nodes due to network failures / delays. If we simply let each node overwrite values as they arrive... we'd get crazy race conditions; we have to somehow resolve these merge conflicts so replicas converge, aka eventual consistency. Methods for resolving conflicts: (1) Last Write Wins

Virtual Locking & Materialized Conflicts

If phantoms are caused b/c there's no object for us to attach locks to, then what if we artificially introduce a lock object into the DB? A sort of "virtual" lock, where the resource doesn't actually exist but we can create the concept of the lock anyways. = when we take a phantom and turn it into a lock conflict on a concrete set of rows that exist in the database Cons: (1) Hard & error-prone to materialize conflicts effectively (2) ugly to let a concurrency control mechanism leak into the application data model. SHOULD BE A LAST RESORT if no alternative is possible. Even serializable isolation is preferable in most cases b/c it's just simpler & easier to get right

Randomization & Eventual Partition Load Balancing

If we randomize assignment of data to partitions over a very large # of nodes, we can reach an eventual state where new nodes end up taking a fair share of load from existing nodes. This is similar in spirit to eventual consistency & relates to the Law of Large Numbers: with enough randomly assigned partitions, the load per node converges toward the average

Two Phase Locking

Implements two lock modes to achieve concurrency control: (1) Exclusive Mode --> Only 1 transaction can hold the lock (2) Shared Mode --> multiple transactions can acquire the lock The 2-phase nature of 2PL is broken down into 2 stages: (1) Expanding phase --> locks are acquired (in shared or exclusive mode as appropriate, possibly upgrading shared to exclusive) while the transaction executes (2) Shrinking phase --> all locks are released when the transaction commits or aborts Key diff vs Snapshot isolation: Snapshot isolation has the mantra readers never block writers, and writers never block readers. With 2PL, an exclusive lock can block BOTH reads & writes. Cons: Performance takes a massive hit b/c: (1) Overhead → More complexity from managing a ton of locks (2) Less Concurrency → reduced concurrent transactions (in other systems, reads never block writes & writes never block reads) (3) Retries → When transactions are aborted due to deadlocks, they need to retry & redo ALL their work, which can result in tons of extraneous processing & delays (a toy sketch follows)
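A toy sketch of the two-phase discipline under stated assumptions: a shared/exclusive lock plus a transaction that only acquires locks while running (expanding phase) and releases them all at commit (shrinking phase). This is for illustration only, not production 2PL (no deadlock detection, no lock upgrades).

```python
import threading

class SharedExclusiveLock:
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_shared(self):
        with self._cond:
            while self._writer:          # readers wait only for an exclusive holder
                self._cond.wait()
            self._readers += 1

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers:   # writer waits for everyone
                self._cond.wait()
            self._writer = True

    def release(self, exclusive: bool):
        with self._cond:
            if exclusive:
                self._writer = False
            else:
                self._readers -= 1
            self._cond.notify_all()

class Transaction:
    def __init__(self):
        self._held = []                   # (lock, exclusive) pairs

    def read(self, lock, read_fn):
        lock.acquire_shared()             # expanding phase: take locks as we go
        self._held.append((lock, False))
        return read_fn()

    def write(self, lock, write_fn):
        lock.acquire_exclusive()
        self._held.append((lock, True))
        write_fn()

    def commit(self):
        for lock, exclusive in self._held:  # shrinking phase: release everything at once
            lock.release(exclusive)
        self._held.clear()
```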

Monitoring Replication Lag

In Leader based systems, you can see how long it takes for each node to get updated with the replication log Possible b/c write order is consistent (We always know what order we do writes in) In Leaderless Systems though, there's no fixed write ordering, which makes monitoring way more difficult On top of that, if we only use read repair (& no anti-entropy processes), there's no telling how old the old data could potentially be.... OOF

Leader Based Systems

In a Leader-based replication system (aka master-slave or active/passive): 1) one of the replicas is designated as the leader. When clients want to write to the DB, they ping the leader & the leader writes the new data to its local storage first 2) The leader pings all followers with the changes & tells them to write the new data via a replication log or change stream. Each follower takes the log from the leader and updates its local copy of the database accordingly, by applying all writes in the same order as they were processed on the leader. 3) If a client wants to read from the DB, it can query any of the nodes. However, only the leader can accept write requests; followers are read-only from the client's P.O.V.

Write-Ahead-Log Replication + Pitfalls

In concurrency control, a process that ensures transaction logs are written to permanent storage before any database data are actually updated. Also called a write-ahead protocol. Pitfalls: (1) Tight Coupling with Storage Engine → Describes data on a very low level (literally tells you which bytes on disk to change). If the storage engine changes its format from one version to another, we typically can't use the same log across the different versions. Thus, we can't have different versions of the DB software on the leader & followers if the storage engine format changes (2) Downtime due to Versioning → Because of the tight coupling described above (that we can't have different versions of the DB software on followers & leaders), we MUST have downtime in order to do version upgrades / software updates...

Safety Property of Algorithms

In distributed system models, we often want safety to be guaranteed All safety properties should be satisfied all the time in all possible situations of a system model Even if all nodes crash, or the entire network fails, the algorithm must nevertheless ensure that it does not return a wrong result; the safety properties must remain satisfied

Quorum Consistency Vs Leader-Based Replication

In general, quorum consistency is really useful for helping to minimize the probability of seeing stale data on reads, but it's not an absolute guarantee! In fact, if you want absolute guarantees, the leader based mechanisms related to resolving replication lag are MUCH better for this.

Durability Criteria: Single Node DB vs Replicated DB

In single node DB, usually means: (1) data is written to nonvolatile storage like a hard drive or SSD (2) there's a write-ahead log or similar that allows for recovery of data if disk gets corrupted In Replicated DB, usually means: (1) Data successfully copied to some minimum # of nodes (2) DB waits until writes/replications are done before reporting the transaction is successfully committed

Last Write Wins

Just store the most recent value & throw away the older values. If we have a way of knowing which write is newest, then we can just replicate this write across all replicas & have consistency! Methods of knowing which write is newest: (1) Timestamping --> Not so good (2) Logical sequencing with IDs / some other marker --> Great! This is how Cassandra recommends doing it, using UUIDs so each write gets its own unique key (sketch below)
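A hedged sketch of last-write-wins using a logical sequence number (a monotonically increasing counter per key) instead of wall-clock timestamps, so clock skew can't pick the wrong winner. The tuple layout is an illustrative assumption.

```python
def lww_merge(replica_values):
    """replica_values: list of (sequence_number, value) seen on different replicas.
    Keep the value with the highest sequence number; everything older is discarded."""
    return max(replica_values, key=lambda sv: sv[0])

print(lww_merge([(3, "draft"), (7, "published"), (5, "review")]))  # (7, 'published')
```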

Pessimistic vs Optimistic Concurrency Control

Key Diff → Both pessimistic & optimistic controls deal with contention, but when they act is different: pessimistic control makes a transaction wait (block) up front before it can touch contended data, while optimistic control lets the transaction run first & then aborts & retries it if contention is detected at commit time

Sync vs Async vs Hybrid Networks

Key trade-off: Variable delay vs Resource Utilization. Static allocation (Sync) --> Known & bounded delay. Cons: (1) Bursty Traffic → Lose the ability to handle bursty traffic... it literally can't b/c it's all fixed allocation (2) Idle Time → You might allocate resources, only to find that the channel is never even used... not very scalable at all! Dynamic Allocation (Async) --> Unbounded delay, high utilization, scalable but less reliable. Hybrid --> A mix of Sync & Async network tools; key mechanisms when trying to achieve this: 1) Quality of Service Guarantees 2) Admission Control 3) Scheduling / Windowing Controls

Handling Indexes in Snapshot Isolation (Global vs Append-Only)

Knowing what version of an object to look at when given a particular index gets a bit tricky (since each index now points at multiple versions of a doc rather than just 1) 2 Main Approaches: (1) The Global Approach --> Indexes just point at all versions of an object & require you to filter for just the version that's visible to the current transaction (2) Append-Only / Copy-on-write --> Literally create a new copy of the DB for each transaction so it's entirely isolated from all other versions

Last Write Wins & Logical Clocks

Last Write Wins is fundamentally flawed if timestamps are used b/c badly synchronized clocks will give you very bad results. It's impossible to implement this 100% safely with timestamps b/c NTP is fundamentally limited by network delays & clock drift. If you want to use LWW, use logical clocks instead. Logical clocks = clocks that track ordering with an incrementing counter rather than time-of-day data (see the sketch below)
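A minimal Lamport-style logical clock sketch: an incrementing counter that is bumped on every local event and fast-forwarded past any counter received in a message, giving a causally consistent ordering without time-of-day clocks. The class name is illustrative.

```python
class LamportClock:
    def __init__(self):
        self.counter = 0

    def tick(self) -> int:            # local event (e.g. a write)
        self.counter += 1
        return self.counter

    def receive(self, remote_counter: int) -> int:   # message from another node
        self.counter = max(self.counter, remote_counter) + 1
        return self.counter

a, b = LamportClock(), LamportClock()
t1 = a.tick()          # node A writes
t2 = b.receive(t1)     # node B sees A's write, so t2 > t1
assert t2 > t1
```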

Tombstones

Markers used to indicate data that should be removed during concurrent write conflicts I.e. we must leave a marker w/ an appropriate version # to show an item has been deleted when merging siblings.

Read Committed Isolation

Most basic level of transaction isolation 2 guarantees: (1) No Dirty Reads; you can only read from the DB what's already been committed (2) No Dirty Writes; you can only overwrite data that's already committed Pros: (1) allows aborts (atomicity) (2) prevents incomplete results from transactions (atomicity) (3) Prevents concurrent writes from mingling (isolation) Cons: (1) Doesn't account for Read Skew (2) can't fully account for write skews & phantoms & some other strange race conditions Implementation: (1) No Dirty Writes --> Place a lock on objects that must be obtained before you can write to the data (2) No Dirty Reads --> While a transaction holds the write lock on an object, the DB remembers the object's old committed value & serves that value for reads until the write commits (a minimal sketch follows)
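A rough sketch of the read committed idea described above, assuming a single process and illustrative names: writers take a per-object lock and stage their change privately; readers always see the last committed value, never the in-flight one.

```python
import threading

class ReadCommittedStore:
    def __init__(self):
        self._committed = {}          # object -> last committed value
        self._locks = {}              # object -> lock held by the writing transaction

    def _lock_for(self, key):
        return self._locks.setdefault(key, threading.Lock())

    def read(self, key):
        return self._committed.get(key)        # no dirty reads: only committed data

    def write(self, key, value):
        lock = self._lock_for(key)
        lock.acquire()                         # no dirty writes: one writer at a time
        return ("staged", key, value, lock)    # change is not visible to readers yet

    def commit(self, staged):
        _, key, value, lock = staged
        self._committed[key] = value           # becomes visible atomically here
        lock.release()

store = ReadCommittedStore()
pending = store.write("balance", 90)
assert store.read("balance") is None           # a concurrent reader still sees the old (absent) value
store.commit(pending)
assert store.read("balance") == 90
```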

Async replication

Most commonly used replication methodology for DB's = When the leader just sends out updates per node without waiting for each node to finish before proceeding Pros: Leader can continue processing requests instead of blocking operations while waiting for a follower's ACK Cons: (1) Write durability is decreased; no guarantee the write will make it to all followers (2) If the leader crashes, we may end up losing new write data altogether since none of the followers may have actually processed the data prior to the crash

Multi-datacenter Operation

Multi-leader & leaderless configs are good for multi-data center systems Leaderless Config pros --> Tolerates conflicting concurrent writes, Networks interruptions, & latency spikes

Multi-Threaded Code on Single Machine vs Distributed System

Multi-threaded code on a single machine has TONS of ways of making concurrency safe: mutexes, semaphores, atomic counters, lock-free data structs, blocking queues, etc... In distributed systems, there's no super good solutions b/c: No Shared Memory → No sure-fire / reliable common space to help control everything; Network Issues → Everything's passed in messages over the network, which can have all kinds of weird issues

Network Faults

Networks are asynchronous systems that are somewhat unreliable due to physical issues within the world. Network faults are strange non-deterministic errors that occur every now & then that can impact the behavior of a system Some faults include: (1) A message between nodes is lost (2) An ACK is lost & the operation is retried even if it worked (3) A node on the network dies, so it'll never get the message(s) (4) the client itself dies before it receives an ACK from a process it launched before it died

Pitfalls of Durability & Reliability

Nothing is 100% reliable / durable; crazy edge cases can occur to completely obliterate data in the DB & cause it to be lost forever (i.e. all nodes & backups could be simultaneously removed, in which case data is just gone forever). When trying to achieve replication & durability, we can: (1) Save to local hard disks (2) Replicate to remote nodes (3) Create backups. In practice, there is no one technique that can provide absolute guarantees. There are only various risk-reduction techniques (i.e. writing to disk, replicating to remote machines, and backups) & they can and should be used together. It's wise to take any theoretical "guarantees" with a healthy grain of salt

BASE

Nowadays, it's basically anything that isn't an ACID-compliant system; again, it's a very loose notion of what it means to be a BASE system, as there's no widely accepted definition = Basically Available, Soft State & Eventual Consistency

Parallel Query Execution / Massively Parallel Processing (MPP)

Often used in analytics relational DB's (OLAP Purposes) b/c there's a ton of complex queries (JOINs, FILTERs, GROUPs, aggregations, etc...), MPP tries to speed this up by breaking the operations into several execution stages / partitions These parts can then be run in parallel on diff nodes of the DB in order to more quickly finish execution

Properties of Algorithms

Properties describe constraints / guarantees of the algorithm; for example, for fencing tokens: (1) Uniqueness → No two requests for a fencing token return the same value. (2) Monotonic sequence → If request x returned token t_x, and request y returned token t_y, and x completed before y began, then t_x < t_y. (3) Availability → A node that requests a fencing token and does not crash eventually receives a response. Divided into 2 major groups: 1) Safety --> Don't let bad things happen. If violated, there's always a specific point in time that we can reference for when it occurred. Once violated, the damage can't be undone; it's already done & we need to do cleanup / damage control 2) Liveness --> Something good eventually happens. There's no specific point in time where it can be referenced to occur, but you can always hope it'll eventually be handled. For example, a node may have sent a request but not yet received a response; you can at least hope it'll eventually be received & processed. (A sketch of the fencing-token properties follows)
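A sketch of the fencing-token properties listed above: a lock service hands out strictly increasing tokens (uniqueness + monotonic sequence), and the storage service rejects any write carrying a token older than one it has already seen. All class/function names here are illustrative assumptions, not a real API.

```python
import itertools

class LockService:
    def __init__(self):
        self._tokens = itertools.count(1)

    def acquire(self) -> int:
        return next(self._tokens)      # unique and monotonically increasing

class StorageService:
    def __init__(self):
        self.highest_seen = 0
        self.data = {}

    def write(self, token: int, key, value):
        if token < self.highest_seen:
            raise PermissionError("stale fencing token; lease probably expired")
        self.highest_seen = token
        self.data[key] = value

locks, storage = LockService(), StorageService()
t_old, t_new = locks.acquire(), locks.acquire()
storage.write(t_new, "file", "v2")      # the newer lease holder writes first
try:
    storage.write(t_old, "file", "v1")  # the paused old holder is fenced off
except PermissionError as e:
    print(e)
```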

Limits of Quorum Consistency (w + r <= n)

Quorums don't always have to be majorities. Instead, it just matters that the set of nodes used for reads & the set of nodes used for writes overlap in at least 1 node. Thus, it's possible to use other quorum assignments that may better suit our use case when designing distributed algorithms. Cons: (1) the quorum condition w + r > n isn't satisfied, so reads aren't guaranteed to overlap with the latest write (2) More likely to read stale values since you've got a smaller # of reads & writes (smaller sets of each type, so overlap is less likely). Pros: (1) Less latency b/c fewer network requests (2) Higher availability since fewer nodes need to respond (3) If many nodes go down, the smaller # of required responses makes it more likely that you can still process requests. Requests only fail if the # of reachable replicas drops below w or r
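A tiny sketch of the quorum arithmetic: with n replicas, w write acks, and r read acks, the read set and write set are guaranteed to overlap only when w + r > n; with w + r <= n you trade that guarantee for lower latency and higher availability.

```python
def quorum_overlap_guaranteed(n: int, w: int, r: int) -> bool:
    # Overlap is guaranteed by the pigeonhole principle when w + r > n.
    return w + r > n

print(quorum_overlap_guaranteed(n=5, w=3, r=3))  # True: strict quorum
print(quorum_overlap_guaranteed(n=5, w=2, r=2))  # False: reads may miss the latest write
```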

Merging Concurrently Written Values

Requires the client to do extra work (more complexity) for the sake of ensuring no data gets silently dropped. The client must clean up merge conflicts after writes occur if there are multiple concurrent writes. Situations regarding concurrent writes: (1) In append-only systems --> Just take the union of the sibling values, since values are only ever added & you just need to deal with duplicates as they occur (2) In systems where data can be removed/updated --> Not only do we need the union, but we'll also need some way of indicating which value(s) were removed (Tombstones); see the sketch below
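A sketch of merging sibling values for a shopping-cart-style set: take the union of everything the siblings added, then drop anything covered by a tombstone (a deletion marker). The data layout here is an assumption for illustration only.

```python
def merge_siblings(siblings):
    """siblings: list of dicts like {"items": {...}, "tombstones": {...}}."""
    added = set().union(*(s["items"] for s in siblings))       # union of all values
    removed = set().union(*(s["tombstones"] for s in siblings))  # union of deletion markers
    return added - removed

s1 = {"items": {"milk", "eggs"}, "tombstones": set()}
s2 = {"items": {"milk", "flour"}, "tombstones": {"eggs"}}   # another client removed eggs
print(merge_siblings([s1, s2]))   # {'milk', 'flour'}
```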

Time of Day Clock

Returns current date & time according to some calendar (wall clock time) Usually synced using NTP so that a timestamp on 1 machine should mean the same as a timestamp on another Cons → Can't measure elapsed time well b/c of these: (1) Ignores leap seconds (2) May get reset forcibly & appear to jump back to a previous point in time (3) Resolution wasn't so great in the past (+- 10 ms variability between machines), but this isn't as big of an issue nowadays (4) Hard/impossible to get perfect syncing between all machines in distributed systems

SQL vs NoSQL on Use of Transactions

SQL DBs first introduced transactions in the 1970s & they've gained widespread use in all SQL DB's. NoSQL systems largely didn't bother trying to make transactions a thing b/c they believed transactions are an anti-pattern for high availability & scalability; they do however have built-in partitioning & replication by default!

Multi-datacenter Operation in Riak

Similar to the multi-leader replication model; keeps all communication between clients and database nodes local to one datacenter. N = # of replicas within 1 (the local) datacenter. Cross-datacenter replication happens asynchronously in the background

Multi-Datacenter Operation in Cassandra & Voldemort

Similar to the traditional leaderless replication model. The # of n designated replicas includes nodes in all datacenters & you can choose how many of the n nodes to have in each datacenter. Each write is sent to all replicas regardless of datacenter. The client only waits for ACKs from a quorum of nodes in its local datacenter though, so it's unaffected by delays & interruptions in cross-datacenter communication. High-latency writes to other datacenters are set up to happen asynchronously, though we can configure how this works if desired

CRDT's

Since handling concurrent writes & merge conflicts is such a major issue, there's research going on to produce data structures capable of automatically merging siblings in sensible ways, including preserving deletions --> e.g. CRDTs in Riak

Pattern to Create Write Skews

Situations where Write Skew can occur follow a simple pattern: (1) A SELECT query checks if some constraint is satisfied for rows that match some search condition (2) APP code decides how to continue based on the result of the query from Step (1) --> Should it proceed or should it log an error & abort? (3) If the APP proceeds, it makes a write to the DB & commits the transaction --> The write breaks the constraint checked in Step (1) b/c it changes the rows that used to satisfy the condition, so it's no longer satisfied. Thus, if you were to repeat these steps from Step 1, you'd get a diff result b/c the constraint is now broken. The steps can also occur in a different order & you'd still get write skew (see the sketch below)
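A sketch of this check-then-write pattern using the on-call doctors example: both concurrent transactions read the same data, both decide the constraint still holds, then each updates a different row, leaving nobody on call. The interleaving is simulated explicitly; the data layout is illustrative.

```python
on_call = {"alice": True, "bob": True}

# Step (1): both concurrent transactions run their SELECT-style check first
alice_sees = sum(on_call.values())   # 2
bob_sees = sum(on_call.values())     # 2

# Steps (2)+(3): each decides "someone else is still on call" and writes a *different* row
if alice_sees >= 2:
    on_call["alice"] = False
if bob_sees >= 2:
    on_call["bob"] = False

print(on_call)   # {'alice': False, 'bob': False} -- the "at least 1 on call" constraint is broken
```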

Effects of Clocks on Snapshot Isolation

Snapshot isolation depends on ordering to know which reads / writes should be ignored / used when looking at snapshots. In distributed DB's, snapshot isolation gets more difficult: you need to coordinate a monotonically increasing transaction ID across multiple nodes, which is way too costly. While timestamps would be great, clock drift & network delays make naive timestamps unreliable. Instead, we can report a confidence interval with each timestamp & wait out the length of 1 confidence interval before committing, which creates the relationship: if A_earliest < A_latest < B_earliest < B_latest, then B definitely happened after A (sketch below)
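A sketch of the confidence-interval ordering rule described above (the idea behind Spanner's TrueTime): each timestamp is an [earliest, latest] range, and we can only say A happened before B when A's whole interval ends before B's begins; overlapping intervals can't be ordered, which is why the commit waits out one interval.

```python
def definitely_before(a, b):
    """a, b: (earliest, latest) timestamp intervals."""
    a_earliest, a_latest = a
    b_earliest, b_latest = b
    return a_latest < b_earliest   # no overlap => unambiguous ordering

print(definitely_before((10.000, 10.007), (10.012, 10.019)))  # True: intervals don't overlap
print(definitely_before((10.000, 10.007), (10.005, 10.012)))  # False: can't order them
```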

Detecting Stale MVCC Reads (& how this prevents Invalid Premises)

Snapshot isolation is done using MVCC: MVCC ignores writes that weren't committed yet when the snapshot was created, so it's possible a write was already happening but isn't captured in the snapshot. To prevent this case from causing invalid premises, we have to track when a transaction ignores another transaction's writes due to the MVCC visibility rules. At the time of commit, the DB checks if the transaction ignored any writes that have since been committed; if so, it's aborted & later retried. We wait until commit time to decide b/c: (1) Ignored Write is Aborted → The ignored writes may themselves abort, in which case we didn't actually ignore anything (2) Current Transaction is Read-Only → if the transaction is read-only, then there's no chance of write skew so no abort is needed (3) Ignored Write isn't Committed Yet → the ignored write may still not be committed, so the current transaction isn't actually stale. Overall, we want to avoid unnecessary aborts + retries b/c they add overhead that decreases performance

Repeatable Read & Naming Confusion

Snapshot isolation is really great (especially for read-only transactions), but it's called many different things across many different DB's SQL's standard definition of isolation levels is fundamentally flawed b/c it's ambiguous, imprecise & not very implementation-independent As always, you have to look into the DB & see if it really works well for what you want! Don't just take their "guarantee" at face value; really check if it's doing what it says it will.

Liveness Property of Algorithms

Something good eventually happens No specific point in time where it can be referenced to occur, but you can always hope it'll be eventually handled For example, a node may have sent a request but not yet received a response. However, you can at least hope it'll eventually be received & processed. In distributed systems, we can make caveats for this property: for example, we could say that a request needs to receive a response only if a majority of nodes have not crashed, and only if the network eventually recovers from an outage

Distributed System Models for Timing

System Models allow us to make certain assumptions so we can design algorithms for our expected system behavior. Models for timing: 1) Synchronous Model --> know that network delay, pauses, and clock drift will never exceed some fixed upper bound; highly unrealistic 2) Partially Synchronous Model --> Behaves with bounded delays most of the time, but we do have excess network delays, process pauses & clock drift sometimes; very realistic of many systems 3) Async Model --> No assumptions can be made about timing; highly restrictive & most algorithms can't work in this ENV

Theoretical Design vs Real World Distributed Systems

System models are great for helping to design algorithms that are effective & reliable. However, remember the model is just an abstraction at the end of the day & real-world problems will still come back to haunt you. There are always weird exceptions / edge cases for these models. Proving an algorithm correct does not mean its implementation on a real system will necessarily always behave correctly, but it's a very good first step b/c it helps reveal problems that might otherwise remain hidden in a real system for a long time

Distributed Systems: Struggling w/ "1 of Something" Constraint

Systems generally require there to be only 1 of something b/c we need there to be unique identification: Only 1 node can be the leader Only 1 node can hold a lock Only 1 user can have the username In distributed systems, this concept of having only 1 of something becomes harder due to quorums & all the faultiness of distributed systems. For example, if a leader node fails to respond, it may be declared dead by the rest of the system so a new leader may be set; However, if the original node still thinks it's a leader, then we'll end up with 2 leaders, not just one!!

TCP vs UDP

TCP transport is used for logging on, file and print sharing, replication of information between domain controllers, transfer of browse lists, and other common functions. TCP can only be used for one-to-one communications UDP is often used for one-to-many communications, using broadcast or multicast IP datagrams. Much less reliable than TCP, but arguably more scalable for use cases where ensured transmission isn't super important.

Conflict Resolution in Replicated DB's

TLDR; let the APP Logic resolve merge conflicts within Replicated DB's; If you want to know why, then read all the stuff below preventing lost updates gets even more complex b/c there's copies of data on multiple replicas. Data can be concurrently changed on diff nodes, so more steps need to be taken to prevent lost updates Locks & Compare and Set Operators fall apart in multi-leader / leaderless replication b/c several writes can happen concurrently & replication can occur asynchronously, so there's no guarantee that there's a single up-to-date copy of data. In replicated DB's, it's best to allow concurrent writes to create several diff conflicting versions of a value & then to have the APP code or special data structures resolve & merge the versions later on

Multi Version Concurrency Control (MVCC) Visibility Rules [Implementation of Visibility Rules]

The DB maintains a record of which object versions are visible to each transaction based on the txids attached to objects in the DB. Visibility criteria: an object is visible to a transaction if both are true: (1) At the time the reader's transaction started, the transaction that created the object had already committed (2) The object is not marked for deletion, or if it is, the transaction that requested deletion had not yet committed when the reader's transaction started. How to create a snapshot per transaction: (1) Make a list of all transactions that are in progress (not yet committed or aborted) at the start; any partially written data from those transactions is ignored / we look at the old data from before the write (2) Ignore writes from aborted transactions (these can still exist if the aborted transactions weren't cleaned up yet) (3) Ignore writes made by later transactions (anything with a larger txid), regardless of whether or not they're committed yet (4) All other writes are visible to the APP's queries (a sketch of this predicate follows)
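A sketch of the visibility predicate described above. Each object version records created_by / deleted_by transaction ids; a reader with its own txid plus the set of txids that were in progress when it started sees a version only if the creator committed before the snapshot and any deletion hasn't. Field names are illustrative assumptions.

```python
def visible(version, reader_txid, in_progress_at_start, aborted):
    c = version["created_by"]
    creator_visible = (
        c not in in_progress_at_start   # creator had already finished...
        and c not in aborted            # ...and actually committed
        and c <= reader_txid            # ...and is not a later transaction
    )
    d = version.get("deleted_by")
    deletion_visible = (
        d is not None
        and d not in in_progress_at_start
        and d not in aborted
        and d <= reader_txid
    )
    return creator_visible and not deletion_visible

row = {"created_by": 3, "deleted_by": 7, "value": 500}
print(visible(row, reader_txid=5, in_progress_at_start={4}, aborted=set()))    # True: deletion by txid 7 is "in the future"
print(visible(row, reader_txid=9, in_progress_at_start=set(), aborted=set()))  # False: the deletion is now visible
```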

Serializability

The only way to truly remove race conditions entirely is to stop running things in parallel → Serialize them 3 Main ways of doing serializability: (1) Actually running in serial (2) Two Phase Locking (3) Optimistic Concurrency Control While it used to be really difficult to do in the past due to memory & processing power constraints, nowadays it actually is do-able w/o major performance impacts (as long as you satisfy some constraints, that is)!

Handling Failures in Chain Replication

The protocol considers only three failure cases, all of which are fail-stop failures: (1) Fail-stop of the head --> the next DB in the chain becomes the head (2) Fail-stop of the tail (primary) --> similarly, when the tail fails the penultimate DB becomes the tail (3) Fail-stop of a middle server --> servers keep a history of requests that they've processed; if a middle node fails, the 2 neighboring nodes connect, compare logs, and determine what deltas need to be applied

Weak Forms of Lying

These include: 1) Bad configs 2) Software bugs 3) Hardware issues Ways to protect against weak forms of lying: (1) Checksums --> Protect against network packet corruption; Always useful when you transfer data over a network (2) Basic sanitization & sanity checking of data (3) Backup Sources of Truth --> Trust but verify; always ask multiple sources to make sure your data is actually correct.

Adding New Followers in a Leader-Based System

To set up a new up-to-date replica: (1) Take a snapshot of the leader's DB at some point in time (without locking the DB if possible) (2) Copy the snapshot to the new follower node (3) New node connects to the Leader DB & Requests all data changes since the snapshot was made; How do we know where to start looking for deltas? --> Log Position Association or Time Stamping can be used (4) Once the follower copies the backlog of updates since the snapshot, it's now caught up
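A sketch of this catch-up procedure: copy a snapshot tagged with a log position, then replay every replication-log entry after that position. The leader/log objects here are plain in-memory stand-ins, not a real replication API.

```python
def bootstrap_follower(leader_snapshot, snapshot_log_position, replication_log):
    follower = dict(leader_snapshot)                       # (1)+(2) copy the snapshot
    for position, (key, value) in enumerate(replication_log):
        if position > snapshot_log_position:               # (3) request deltas after the snapshot's log position
            follower[key] = value                          # (4) apply the backlog until caught up
    return follower

log = [("a", 1), ("b", 2), ("a", 3)]            # leader's replication log
snapshot = {"a": 1, "b": 2}                      # snapshot taken after log position 1
print(bootstrap_follower(snapshot, 1, log))      # {'a': 3, 'b': 2}
```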

Weak Isolation Levels

True isolation (serializability) is extremely expensive & often not worth doing; instead, we use weak isolation levels to handle concurrency issues. However, weak isolation can also cause very severe issues, so it must be implemented carefully!! The key is to understand all the possible concurrency issues & address the ones that may actually occur. Concurrency issues only occur when transactions access the same resources, so weak isolation levels focus on handling these contention / shared-resource scenarios

Snapshot Isolation vs Read Committed Isolation

Typically, read committed Isolation uses a separate snapshot for each query, while snapshot isolation uses the same snapshot for an entire transaction Snapshot Isolation is basically a more expanded version of Read Committed Isolation; it really builds on the concept of read committed isolation & takes it a step further.

Network Time Protocol

Used to communicate time-synchronization information between devices. Sets the computer's clock based on the time reported by a group of servers (a consensus on time). The servers themselves get their concept of time from more reliable sources like GPS receivers

Replication in Systems With Collaborative Editing

Uses the Multi-Leader Replication Model. Shares similar ideas with offline client operation → writes are instantly applied to the local replica & later asynchronously replicated to the server & any other users who are editing the same doc. To resolve editing conflicts, locks must be placed on the document before edits can occur. If we want fast collaboration, we can mostly remove locks & only lock very small units (i.e. keystrokes) & make it so many people can edit at the same time → looks really similar to multi-leader replication

Implementing Failover in Leader-Based Systems

When a leader node dies, we must designate a new leader & begin the failover process so the system doesn't go down entirely. Steps are: (1) Determine the leader has failed (2) Choose a new leader --> via an election process or a controller node (3) Reconfigure the system to use the new leader; all write requests must now be routed to the new leader. If an old leader comes back online, it must also recognize that it's no longer the leader & has to submit to being a follower of the new leader.

Faulty Restorations

When a node carrying a new value fails, we might restore data from a replica that's out of date. In this instance, if the total # of nodes with the up-to-date data falls below w, then we no longer have quorum consistency

Cascading Failures

When a node dies in a system that's already overstrained, its responsibilities are transferred to another node, which then becomes overloaded & dies too, and the cycle goes on until all remaining nodes are overloaded & subsequently die

Partial Writes

When a write succeeds on fewer than w replicas, the write operation is reported as unsuccessful EVEN THOUGH it may have gone through on some replicas. The successful writes are not rolled back in this instance, so it's possible that subsequent reads might not return the data from the writes that did succeed

Read-Scaling Architecture

When you have just a single leader & many, many more replicas, b/c you can read from ANY replica → helps massively with scalability for read requests. Works best when you have only a few write requests here & there but know you'll have a ton of read requests coming in. Major Con: MUST use Async Replication --> if we did this using sync replication, a single node failure would bottleneck the entire app or maybe even take it down. Also, the more nodes you have, the more unstable sync replication becomes, & in Read-Scaling architectures you WANT more nodes, so a synchronous Read-Scaling architecture would be an anti-pattern

Consistent Hashing

When you randomly choose the boundaries / thresholds for hash-key partitioning. Avoids the need for a central controller or distributed consensus = a way of load balancing across an internet-wide system of caches (i.e. CDN's; Content Delivery Networks). Doesn't actually work well for DB's, so it's rarely used in practice. Some DB's still refer to consistent hashing, but the term isn't used accurately, so it's best to just avoid the term altogether --> instead, just call it hash partitioning

Lock-Step Processes

Where each transaction happens 1 after another for each partition within a multi-partition Serially Executed Transaction. Locks are used to ensure serial execution is implemented.

Pre-splitting

Where we manually split up data amongst multiple partitions when using a Fixed # of partitions for Rebalancing This is a workaround for the Dynamic Partitioning issue whereby the initial dataset size doesn't justify the massive # of partitions & the fact that we'd likely get hot spots without initially manually allocating data across all the partitions

Pitfalls of Retrying Aborted Transactions

While we may wish to retry aborted transactions, it's actually sometimes dangerous to do so. Pitfalls: (1) Lost ACK's due to Network Failure --> the transaction really did finish, but we lost the success message along the way... in this case, if we retry, then we get duplicated data manipulations which can be REALLY bad (2) Overloading --> If the failure is a result of too much data / time / some resource being used up to execute the transaction, then retrying it will just cause the same ERROR & be really dangerous. Workaround → Limit # of retries, use exponential backoff, handle overload errors differently from other errors (if possible) (3) Constraint Violation --> Some transactions fail b/c they break DB guarantees/constraints; We shouldn't retry transactions that break DB guarantees (4) Side Effects --> Sometimes, transactions cause external systems / side effects to occur. In these instances, we may end up triggering the side effects multiple times with retries (i.e. sending notification emails). Workaround --> 2 phase-commits can help ensure the different systems commit / abort together (5) Client Failures --> Client fails while retrying & data it was trying to write to DB is lost, so retrying would be pointless
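A sketch of the workaround mentioned above: retry only transient failures, cap the number of attempts, and back off exponentially (with jitter) between attempts. The exception classes are placeholders; a real driver exposes its own error types.

```python
import time, random

class TransientError(Exception): ...
class ConstraintViolation(Exception): ...

def run_with_retries(txn_fn, max_retries=3, base_delay=0.1):
    for attempt in range(max_retries + 1):
        try:
            return txn_fn()
        except ConstraintViolation:
            raise                       # don't retry: this will never succeed
        except TransientError:
            if attempt == max_retries:
                raise                   # give up after the retry budget is spent
            # exponential backoff with jitter so retries don't stampede the server
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
```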

Write Skews vs Dirty Writes vs Lost Updates

Write Skews can occur if 2 transactions read the same object, then the different transactions update different objects; When diff transactions update the same object → You get dirty writes / lost updates When they update different objects, we see a new type of race condition → Write Skew Write Skews can be thought of as a generalization of the lost update problem & is primarily caused by timing of transactions updating different objects simultaneously. Write Skew Examples: (1) At least 1 Doctor being on call (2) Booking a meeting room (3) Claiming a Username (4) Preventing Double-Spending

Monitoring Staleness

You should set alerts to ensure you know when too much data becomes stale It's great if your system can tolerate stale data, but eventually you should clean it up when it gets too messy In Leader-based systems, this is possible by monitoring replication lag. However, in leaderless systems, this is still a newer problem Ideally, we'd want metrics for measuring eventual consistency of a database & how we get there

Multi-Partition Serially Executed Transactions

a special case that occurs when using true serial execution with partitions that each have their own isolated execution thread. When multiple partitions are involved, transactions must be performed in lock-step (one after another for each partition) to ensure serializability across the entire system. Data with multiple secondary indexes is likely to require a lot of cross-partition coordination → Multi-Partition. Anything that requires cross-partition coordination is WAY slower than single-partition execution

Siblings in Concurrent Writes

all values that are being written concurrently are called siblings Need some way of handling siblings so they don't kill each other & we lose data or we get some unwanted behavior / race condition / deadlock

TIMEOUT

amount of time we're allowed to wait before determining the process is dead & we should move on with our lives

Reliability Engineering

consists of a variety of techniques to build reliability into products and test their performance You must know how your system responds to failures/faults of various kinds.

Dirty Writes

if an earlier write is part of a transaction that hasn't committed yet & a later transaction overwrites that partially written data, this is considered a dirty write. Reasons to prevent dirty writes: (1) Multi-Object Updates --> If transactions try to update multiple objects at once & dirty writes are allowed, we can end up with crazy confusing situations where some objects hold values from 1 transaction and other objects hold values from the other transaction (2) Lost Updates --> If a transaction overwrites data that another transaction has written but not yet committed, part of that other transaction's update can be silently lost

Cloud Services

is a type of Internet-based computing that provides shared computer processing resources and data to computers and other devices on demand Focuses on horizontal scalability Network-based Message Passing On-Demand Usage Cons: (1) Network Randomness (2) Downtime is Deadly / Total Failure is not okay, but it can certainly handle partial failures Pros: (1) Scalability (2) High Availability (3) Geographically Distributable (4) Uses many many simple machines to help offset the cost vs performance trade-off

Monotonic Clocks

measure durations / intervals of time extremely well b/c they're guaranteed to always move forward in time. Pros: (1) No synchronization needed → works well in distributed systems; using a monotonic clock for measuring elapsed time (e.g., timeouts) is usually fine since we don't need to sync times across machines & the resolution is good enough. Cons: (1) A monotonic clock value on 1 machine doesn't mean the same thing on another computer; never compare monotonic clock times from different machines. Resolution is really good, within microseconds or less (sketch below)
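A sketch of why timeouts use the monotonic clock: time.monotonic() never jumps backwards (unlike time.time(), which NTP may step), so elapsed-time measurements on a single machine stay meaningful. The 500 ms deadline is an illustrative value; never compare monotonic readings taken on different machines.

```python
import time

deadline = time.monotonic() + 0.5       # 500 ms timeout measured locally

def timed_out() -> bool:
    return time.monotonic() > deadline

time.sleep(0.1)
print(timed_out())   # False: only ~100 ms of real elapsed time has passed
```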

Handling Write Conflicts in Multi-Leader Systems

multi-leader systems will almost certainly have write conflicts b/c writes are async across the system & race conditions/deadlocks can potentially occur Ways to Handle Conflicts: (1) Conflict Avoidance (2) Converge Towards a Consistent State (3) Custom Conflict Resolution Logic

Leaderless Replication

no leader; any node can accept writes from clients. Aka Dynamo-style replication, since Amazon's Dynamo system popularized it again. Some leaderless implementations: (1) clients write directly to several replicas (2) clients write to a coordinator node that writes on behalf of the client. Doesn't enforce a particular ordering of writes like a leader-based model would. This lack of write ordering has MAJOR consequences for how the DB is used

Byzantine Fault Tolerance

ridiculously hard to achieve in Distributed systems = when a system can operate correctly even if some nodes are malfunctioning & not obeying the protocol or even if malicious attackers interfere with the network Some examples where this is relevant: (1) Aerospace where environmental radiation can make nodes send weird messages in unpredictable ways (2) malicious actors trying to defraud a financial system Usually when we have a central authority to dictate what is / isn't allowed, Byzantine faults typically don't occur b/c we can just make that server/central point reject bad requests In peer-to-peer networks, where there is no such central authority, Byzantine fault tolerance is more relevant.

Secondary Index Partitioning

secondary keys are critical for showing relationships between data, thus we need to partition them effectively as well to gain all the scalability benefits 2 main methods for this: (1) Doc-Based Partitioning (2) Term-Based Partitioning

Hot Spots

set of nodes that are taking on skewed loads & becoming bottlenecks

Distributed Systems

systems in which application processing is distributed across multiple computing devices 2 Main Implementation Philosophies: (1) Cloud Services (2) High Performance Computing

Time since Epoch

the time since midnight UTC on 1/1/1970 according to the Gregorian calendar, not counting leap seconds

deterministic procedures

transactions that produce the same result across all nodes it's run on

Version Numbers

used to determine which data is most up to date from the many replicas that the client reads from Very useful if a node goes down & we need to quickly figure out if the node's version is up-to-date once it goes back online

Skewed Loads

when a single node takes on too many requests compared to other ones

Unbounded Delays/Timeouts

when no timeout is set, so a process runs forever Networks usually use these b/c there's No guarantees around: (1) Network Delivery time → How long it takes for a request to get from 1 point to another. In fact, it might just never arrive (2) Server Response time → How long it takes for the server to handle the request then send a response

