Chapter 7: Transactions

अब Quizwiz के साथ अपने होमवर्क और परीक्षाओं को एस करें!

After a transaction has acquired the lock, it must continue to hold the lock until ____________. This where the name "two-phase" comes from: the first phase is when the lock is acquired, and the second phase is when all the locks are released.

(the) end of the transaction

Although retrying an aborted transaction is a simple and effective error handling mechanism, it isn't perfect: If the transaction also has side effects outside of the databases, those side effects may happen even if the transaction is ____________.

Aborted

The rate of ____________ significantly affects the overall performance of SSI. For example, a transaction that reads and writes data over a long period of time is likely to run into conflicts and abort, so SSI requires that read-write transaction be fairly short (long-running read-only transactions may be okay). (However, SSI is probably less sensitive to slow transactions than two-phase locking or serial execution.)

Aborts

For examples that check for the ____________ of rows matching some search condition and then adds a row matching the same condition, a query that adds locks to a row (e.g., SELECT FOR UPDATE) cannot be used because the query is looking to update only when there aren't any rows that match the condition, and the lock can be attached to nothing. These examples are susceptible to write skew.

Absence

Contention between transactions can be reduced with commutative ____________ operations. (e.g., increment counter)

Atomic

In general, __________ refers to something that cannot be broken down into smaller parts

Atomic

Many databases provide _____________ update operations, which remove the need to implement read-modify-write cycles in application code. These operations are usually the best solution to lost update problems.

Atomic

ACID __________ describes what happens if a client wants to make several writes, but a fault occurs after some of the writes have been processed -- for example, a process crashes, a network connection is interrupted, a disk becomes full, or some integrity constraint is violated.

Atomicity

ACID __________ guarantees that if the writes are grouped together into an atomic transaction, and the transaction cannot be competed (committed) due to a fault, then the transaction is aborted and the database must discard or undo any undo any writes it has made so far in that transaction.

Atomicity

With ACID __________, if an error occurs halfway through a sequence of writes, the transaction should be aborted, and the writes made up to that point should be discarded. In other words, the database saves you from having to worry about partial failure, by giving an all-or-nothing guarantee.

Atomicity

Without ACID __________, if an error occurs partway through making multiple changes, it's difficult to know which changes have take effect and which haven't. The application could try again, but that risks making the same change twice, leading to duplicate or incorrect data. With this guarantee, if a transaction was aborted, the application can be sure that it didn't change anything, so it can safely be retired.

Atomicity

The safety guarantees provided by transactions are often described by the well-known acronym ACID, which stands for __________.

Atomicity, Consistency, Isolation, and Durability

Systems that do not meet the ACID criteria are sometimes called BASE, which stands for __________.

Basically Available, Soft state, and Eventual consistency

To figure out whether your application needs transactions, you first need to understand exactly what __________ transactions can provide, and what costs are associated with them.

Safety guarantees

Dirty writes and lost updates are two kinds of race conditions that can occur when different transactions concurrently try to write to the _____________ objects.

Same

In the case when different transactions update the _____________ object, you get a dirty write or lost update anomaly (depending on the timing).

Same

In a database with __________ (almost everything except pure key-value stores), the indexes also need to be updated every time you change a value. These indexes are different database objects from a transaction point of view: for example, without transaction isolation, it's possible for a record to appear in one index but not another, because the update to the second index hasn't happened yet.

Secondary Indexes

(Serializable Isolation): Simple key-value data can often be partitioned very easily (and hence use the approach of partitioning + actual serial execution on a single thread [per partition] to achieve higher transaction throughput), but data with multiple __________ is likely to require a lot of cross-partition coordination.

Secondary indexes.

Literally executing transactions in ____________ order -- an approach to implementing serializable transactions: if you can make each transaction very fast to execute, and the transaction throughput is low enough to process on a single CPU core, this is a simple and effective option.

Serial

__________ of transactions can become a viable way of achieving serializable isolation within certain constraints: - Every transaction must be small and fast, because it takes only one slow transaction to stall all transaction processing. - It is limited to use cases where the active dataset can fit in memory. Rarely accessed data could potentially be moved to disk, but if it needed to be accessed in a single-threaded transaction, the system would get very slow. - Write throughput must be low enough to be handled on a single CPU core, or else transactions need to be partitioned without requiring cross-partition coordination - Cross-partition transactions are possible, but there is a hard limit to the exent to which they can be used.

Serial Execution

The classic database textbooks formalize ACID isolation as __________, which means that each transaction can pretend that it is the only transaction running on the entire database. The database ensures that when the transactions have committed, the result is the same as if they had run serially (one after another), even though in reality they may have run concurrently.

Serializability

A database with ____________ isolation must prevent phantoms

Serializable

An advantage of implementing ____________ isolation using actual serial execution is that a system designed for single-threaded execution can sometimes perform better than system supports concurrency, because it can avoid the coordination overhead of locking.

Serializable

Automatically preventing write skew requires true _____________ isolation.

Serializable

In practice, __________ isolation is rarely used, because it carries a performance penalty. Some popular databases typically offer a weaker isolation guarantee, such as snapshot isolation.

Serializable

Most databases that provide ____________ isolation today use one of three techniques: 1. Literally executing transactions in serial order. 2. Two-phase locking 3. Optimistic concurrency control techniques

Serializable

The simplest way of avoiding concurrency problems is to remove the concurrency entirely: to execute only one transaction at a time, in serial order, on a single thread. The resulting isolation is by definition ____________.

Serializable

This disadvantage of implementing ____________ isolation using actual serial execution is that the system's throughput is limited to that of single CPU core. In order to make the most of that single thread, transactions need to be structured differently from their traditional form.

Serializable

When using partitioning and actual serial execution on a single thread (for each partition), for any transaction that needs to access multiple partitions, the database must coordinate the transaction across all the partitions that it touches, to ensure __________ isolation across the whole system.

Serializable

With ____________ isolation, the database guarantees that if the transactions behave correctly when run individually, they continue to be correct when run concurrently; i.e., the database prevents all race conditions.

Serializable

With stored procedures and in-memory data (i.e., all data in the transaction fits in memory), executing all transactions on a single thread become feasible. As they don't need to wait for I/O, systems using this approach to implement __________ isolation avoid the overhead of other concurrency control mechanisms and can achieve quite good throughput on a single thread

Serializable

__________ isolation means that the database guarantees that transactions have the same effect as if they ran serially (i.e., one at a time, without any concurrency)

Serializable

____________ isolation guarantees that even though transactions may execute in parallel, the end result is the same as if they had executed one at a time, serially, without any concurrency.

Serializable

____________ isolation is usually regarded as the strongest isolation level.

Serializable

____________ (SSI) is a very promising approach to full serializability (i.e., serializable isolation) that has only a small performance penalty compared to snapshot isolation, a weaker isolation level.

Serializable Snapshot Isolation

Compared to serial execution, ____________ isolation is not limited to throughput of a single CPU core.

Serializable snapshot

Compared to two-phase locking, the big advantage of ____________ isolation is that one transaction doesn't need to block waiting for locks held by another transaction. Like under snapshot isolation, writers don't block readers, and vice versa. This design principle makes query latency much more predictable and less variable. In particular, read-only queries can run on a consistent snapshot without requiring any locks, which is very appealing for read-heavy workloads.

Serializable snapshot

For detecting writes that affect prior reads with ____________ isolation, when a transaction writes to the database, it must look in the indexes for any other transactions that have recently read the affected data. This process is similar to acquiring a write lock on the affected key range, but rather than blocking until the readers have committed, the lock acts as a tripwire: it simply notifies the transactions that the data they read may no longer be up to date. (The later ending write transaction is aborted)

Serializable snapshot

In order to prevent reads on a stale MVCC object with ____________ isolation, when the transaction wants to commit, the database checks whether any of the ignored writes (on concurrent transactions were in-progress but not committed when the transaction started) have now been committed. If so, the transaction must be aborted. Why wait until committing? At the time when the transaction makes a stale read, the database doesn't yet know whether that transaction is going to later perform a write. By avoiding unnecessary aborts, SSI preserves snapshot isolation's support for long-running reads from a consistent snapshot.

Serializable snapshot

In order to provide ____________ isolation, the database must detect situation in which a transaction may have acted on an outdated premise and abort the transaction in that case. There are two cases to consider to know if a query result might have changed: 1. Detecting reads on a stale MVCC object version (uncommitted write occurred before the read). 2. Detecting writes that affect prior reads (the write occurs after the read).

Serializable snapshot

____________ -- a type of race condition: a fairly new algorithm that avoids most of the downsides of the previous approaches. It uses an optimist approach, allowing transactions to proceed without blocking. When a transaction wants to commit, it is checked, and it is aborted if the execution was not serializable.

Serializable snapshot isolation (SSI)

The lock used in Two-phase locking is used as follows: If a transaction wants to read an object, it must first acquire the lock in ____________ mode. Several transactions are allowed to hold the lock in this mode simultaneously, but if another transaction already has an exclusive lock on the object, these transactions must wait.

Shared

Storage engines almost universally aim to provide atomicity and isolation on the level of a __________ (such as a key-value pair) on one node. Atomicity can be implemented using a log for crash recovery, and isolation can be implemented using a lock on each object (allowing only one thread to access an object at any one time).

Single object

__________ operations (such as increment and compare-and-set) are useful, as they can prevent lost updates when several clients try to write to the same object concurrently. However, they are not transactions in the usual sense of the word.

Single-object

In (PostgresSQL) MVCC-based _____________ isolation, if a transaction deletes a row, the row isn't actually deleted, but it is marked for deletion. At some later time, when it is certain that no transaction can any longer access the deleted data, a garbage collection process in the database removes any rows marked for deletion and frees this space.

Snapshot

In 2PL (Two-Phase Locking), writes block both other writers and readers and readers block other readers and writers. ____________ isolation, on the other hand, has the mantra readers never block writers, and writers never block readers. On the other hand, because 2PL provides serializability, it protects against race conditions like lost updates and write skew, which is not guaranteed with weaker isolation levels.

Snapshot

It is very hard to reason about the meaning of a long-running, read-only query if the data on which it operates is changing at the same time as the query is executing. The solution for this is to enforce _____________ isolation; when a transaction can see a consistent snapshot of the database, frozen at a particular point in time, it is much easier to understand.

Snapshot

____________ isolation avoids phantoms in read-only queries, but in read-write transactions, phantoms can lead to particularly tricky cases of write skew.

Snapshot

_____________ isolation is a useful isolation level, especially for read-only transactions. However, many databases call it by different names (e.g., serializable in Oracle, and repeatable read in MySQL)

Snapshot

_____________ isolation is the most common solution to solving problems of temporary inconsistencies (such as read skews).

Snapshot

The idea behind _____________ isolation is that each transaction reads from a consistent snapshot of the database -- that is, the transaction sees all the data that was committed in the database at the start of the transaction. Even if the data is subsequently changed by another transaction, each transaction sees only the data rom that particular point in time.

Snapshot (isolation)

Another options for preventing lost updates is to allow the read-modify-write cycles to execute in parallel, an if the transaction manager detects a lost update, abort the transaction and force it to retry its read-modify-write cycle. The advantage of this approach is that databases can perform this check efficiently in conjunction with _____________. This lost update detection is a great feature, because it doesn't require application code.

Snapshot isolation

A pro of ____________ is that modern implementations use existing general-purpose programming languages (i.e. VoltDB uses Java or Groovy).

Stored Procedures

Systems with single-threaded serial transaction processing don't allow interactive multi-statement transactions. Instead, the application must submit the entire transaction code to the database, as a ____________.

Stored procedure

A con of ____________ is that each database vendor typically has its own language for these procedures that look quite ugly and archaic. Furthermore, code running a database i difficult to manage, debug, and version control.

Stored procedures

A __________ is a way for an application to group several reads and writes into a logical unit.

Transaction

A __________ is usually understood as a mechanism for grouping multiple operations on multiple objects into one unit of execution.

Transaction

An application with simple access patterns, such as reading and writing only a single record, can probably manage without ____________.

Transactions

For complex access patterns, ____________ can hugely reduce the number of potential error cases you need to think about.

Transactions

Without ____________, various error scenarios (process crashing, network interruptions, power outages, disk full, unexpected concurrency, etc.) mean that data can become inconsistent in various ways.

Transactions

____________ are an abstraction layer that allows an application to pretend that certain concurrency problems and certain kinds of hardware and software faults don't exist.

Transactions

Although retrying an aborted transaction is a simple and effective error handling mechanism, it isn't perfect: it is only worth retrying after ____________ errors (for example due to deadlock, isolation violation, temporary network interruptions, and failover); after a permanent error (e.g., constraint violation) a retry would be pointless.

Transient

In ____________ (2PL), a lock is maintained on each object (e.g., row) in the database. The lock can either be in shared mode or in exclusive mode:

Two-phase locking

The big downside of ____________ is that transaction throughput and response times of queries are significantly worse under this approach for achieving serializable isolation than under weak isolation. This is most importantly due to the reduced concurrency.

Two-phase locking

Traditional relational databases don't limit the duration of a transaction, so when using ____________ to achieve serializable isolation, a queue may form if several transactions want to access the same object. For this reason, databases running this approach can have quite unstable latencies, and they can be very slow at high percentiles if there is contention in the workload.

Two-phase locking

With ____________, an approach to achieving serializable isolation, several transactions are allowed to concurrently read the same object as long as nobody is writing to it. But as soon anyone wants to write an object, exclusive access is required: 1. If transaction A has read an object and transaction B wants to write to that object, B must wait until A commits or aborts before it can continue. 2. If transaction A has written an object and transaction B wants to read that object, B must wait until A commits or aborts before it can continue.

Two-phase locking

____________ -- a type of race condition: for decades this has been the standard way of implementing serializability, but many applications avoid using it because of its performance characteristics.

Two-phase locking

____________ is an approach to achieving serializable isolation that doesn't perform well in general.

Two-phase locking

Things can go wrong in data systems: A client may read data that doesn't make sense because it has only partially been __________.

Updated

In a relational data model, a row in one table often has a foreign key references to a row in another table. (Similarly, in a graph-like data model, a vertex has edges to other vertices.) Multi-object transactions allow you to ensure that these references remain __________: inserting several records that refer to one another, the foreign keys have to be correct and up to date, or the data becomes nonsensical.

Valid

Concurrency bugs caused by ____________ transaction isolation have caused substantial loss of money, led to investigation by financial auditors, and caused customer data to be corrupted. Even many popular relational database systems (which are usually considered "ACID") use this isolation, so they wouldn't necessarily have prevented these bug from occurring.

Weak

____________ isolation levels (such as read committed) have good performance, but are prone to various race conditions (lost updates, write skews, phantom, etc.)

Weak

There are advantages to __________ transactional guarantees or abandoning them entirely (e.g., to achieve higher performance or higher availability).

Weakening

Serializable isolation has performance costs. It's therefore common for systems to use __________ levels of isolation, which protect against some concurrency issues but not all. These levels of isolation are much harder to understand, and they can lead to subtle bugs, but they are nevertheless used in practice.

Weaker

When using serial execution to achieve serializable isolation on the ____________, to avoid the system idling on a single long-running transaction by waiting for user input over a long checkout process that can span multiple requests, a transaction is committed within the same HTTP request -- a transaction does not span multiple requests. A new HTTP request starts a new transaction.

Web

When implementing serializable isolation, executing all transactions serially makes concurrency control much simpler, but limits the transaction throughput of the database to the speed of a single CPU core on a single machine. For applications with high __________ throughput, the single-threaded transaction processor can become a serious bottleneck

Write

Examples that follow a certain pattern are susceptible to ____________: 1. A SELECT query checks whether some requirement is satisfied by searching for rows that match some search condition (e.g., check if a username exists already) 2. Depending on the result of the first query, the application code decides how to continue (e.g., whether or not the username is available) 3. If the application decides to go ahead, it makes a write (INSERT, UPDATE, or DELETE) to the database and commits the transaction. The effect of this write changes the precondition of the decision of step 2. In other words, if you were to repeat the SELECT query from step 1 after committing the write, you would get a different result, because the write changed the set of rows matching the search condition. These steps may occur in a different order.

Write Skew

Unlike dirty writes and lost updates, a _____________ involves concurrent transactions updating different objects.

Write skew

____________ -- a type of race condition -- occurs when a transaction reads something (typically involving multiple objects), makes a decision based on the value it saw, and writes the decision to the database. However, by the time the write is made, the premise of the decision is no longer true. Only serializable isolation prevents this anomaly.

Write skew

_____________ can occur if two transactions read the same objects, and then update some of those objects (different transactions may update different objects).

Write skew

Some databases allow you to configure constraints, which are then enforced by the database (e.g., uniqueness, foreign key constraints, or restrictions on a particular value). However, most databases do not have built-in support for a constraint that involves multiple objects. If they did, this might be a solution for some examples of ____________.

Write skews

In ACID, atomicity and isolation describe what the database should do if a client makes several __________ within the same transaction.

Writes

The database will __________ a transaction if the entire transaction succeeds.

Commit

___________ bugs are hard to find by testing, because such bugs are only triggered when you get unlucky with the timing. Such timing issues might occur very rarely, and are usually difficult to reproduce.

Concurrency

Atomicity, isolation, and durability are properties of the database, whereas __________ (in the ACID sense) is a property of the application. The application may rely on the database's atomicity and isolation properties in order to achieve this guarantee.

Consistency

In the context of ACID, __________ refers to an application-specific notion of the database being in a "good state".

Consistency

It is the application's responsibility to define its transactions correctly so that they preserve ACID __________. This is not something that the database can guarantee. In general, the application defines what data is valid or invalid -- the database only stores it.

Consistency

For MVCC-based snapshot isolation: When a transaction reads from the database, transaction IDs are used to decide which objects it can see and which are invisible. By carefully defining visibility rules, the database can present a _____________ snapshot of the database to the application: 1. At the start of each transaction, the database makes a list of all the other transactions that are in progress at that time. Any writes that those transaction have made are ignored. 2. Any writes made by aborted transactions are ignored 3. Any writes made by transactions with a later transaction ID are ignored, regardless of whether those transaction have committed. 4. All other writes are visible to the application's queries These rules apply to both creation and deletion of objects

Consistent

Things can go wrong in data systems: The application may __________ at any time (including halfway through a series of operations)

Crash

Atomic operations, which are useful for solving lost update problems) are usually implemented by taking an exclusive lock on the object when it is read so that no other transaction can read it until the update has been applied. This technique is known as _____________.

Cursor stability

In 2PL, it can happen quite easily that transaction A is stuck waiting for transaction B to release its lock, and vice versa. This situation is called ____________. The database automatically detects when this happens between transactions and aborts one of them so that the others can make progress. The aborted transaction needs to retried by the application.

Deadlock

In Two-phase locking, when a transaction is aborted due to ____________ and is retried, it needs to do its work all over again. If this situation is frequent, this can mean significant wasted effort.

Deadlock

If a transaction needs to update several objects (multi-object transaction), a ____________ means that another transaction may see some of the updates but not others.

Dirty Read

____________ -- a type of race condition -- occur when one client reads another client's writes before they have been committed. The read committed isolation level and stronger levels prevent dirty reads.

Dirty Reads

Imagine a transaction has written some data to the database, but the transaction has not yet committed or aborted. Can another transaction see the uncommitted data? If yes, that is called a ____________.

Dirty read

If a transaction aborts, any writes it has made need to be rolled back. If the database allows ____________, that means a transaction may see data that is later rolled back.

Dirty reads

To satisfy read committed isolation, most databases prevent _____________ by using the following process: for every object that is written, the database remembers both the old committed value and the new value set by the transaction that currently holds the write lock. While the transaction is ongoing, any other transactions that read the object are simply given the old value. Only when the new value is committed do transactions switch over to reading the new value.

Dirty reads

Transactions running at the read committed isolation must prevent ____________. This means that any writes by a transaction only becomes visible to others when that transaction commits (and then all of its writes become visible at once).

Dirty reads

Consider when two transactions concurrently try to update the same object in a database. If the earlier write is part of a transaction that has not yet committed and the later write (from another transaction) can overwrite the uncommitted value of the first transaction; this is known as a __________. Transactions running at the read committed isolation level must prevent this situation, usually by delaying the second write until the first write's transaction has committed or aborted.

Dirty write

____________ -- a type of race condition -- occur when one client overwrites data that another client has written, but not yet committed. Almost all transaction implementations prevent dirty writes.

Dirty writes

ACID __________ is the promise that once a transaction has committed successfully, any data it has written will not be forgotten, even if there is a hardware fault or the database crashes.

Durability

Historically, __________ meant writing to an archive tape (hard drive). Then it was understood as writing to a disk or SSD. More recently, it has been adapted to mean replication. In practice, there is no one technique that can provide absolute guarantees. There are only various risk-reduction techniques, including writing to disk, replicating to remote machines, and backups -- and they can and should be used together.

Durability

In a replicated database, __________ may mean that the data has been successfully copied to some number of nodes. In order to provide a durability guarantee, a database must wait until these replications are complete before reporting a transaction as successfully committed.

Durability

In a single-node database, __________ typically means that the data has been written to nonvolatile storage such as a hard drive or SSD. It usually also involves a write-ahead log or similar, which allows recovery in the event that the data structures on disk are corrupted. In order to provide a durability guarantee, a database must wait until these writes are complete before reporting a transaction as successfully committed.

Durability

Perfect __________ does not exist: if all your hard disks and all your backups are destroyed at the same time, there's obviously nothing your database can do to save you.

Durability

The lock used in Two-phase locking is used as follows: If a transaction first reads and then writes the object, it may upgrade its shared lock to an ____________ lock.

Exclusive

The lock used in Two-phase locking is used as follows: If a transaction wants to write to an object, it must first acquire the lock in ____________ mode. No other transaction may hold the lock at the same time (in any mode), so if there is any existing lock on the object, the transaction must wait.

Exclusive

Things can go wrong in data systems: The database software or hardware may __________ at any time (including in the middle of a write operation).

Fail

Although retrying an aborted transaction is a simple and effective error handling mechanism, it isn't perfect: If the transaction actually succeeded, but the network failed while the server tried to a acknowledge the successful commit to the client (so the client thinks it __________), then retrying the transaction causes it to be performed twice.

Failed

Although retrying an aborted transaction is a simple and effective error handling mechanism, it isn't perfect: if the client process ____________ while retrying, any data it was trying to write to the database is lost.

Fails

Transactions were create namely to simply the programming model for application accessing a database. By sing transactions, the application is free to ignore certain potential error scenarios and concurrency issues, because the database takes care of them instead; we call these safety __________.

Guarantees

Unfortunately, predicate locks do not perform well: if there are many lock by active transactions, checking for matching locks becomes time-consuming. For that reason, most databases with 2PL actually implement ____________ locking (also known as next-key locking), which is a simplified approximation of predicate locking.

Index-range

____________ locking is a approximation of predicate locking that matches a greater set of objects (typically covered by an index) than covered by any predicate. In other words, an approximation of the search condition is attached to one of the indexes.

Index-range

For working with _____________ with MVCC-based snapshot isolation, one option is to have the index simply point to all versions of an object and require an index query to filter out any object versions that are not visible to the current transaction. When garbage collection removes old object versions that are no longer visible to any transaction, the corresponding index entries can also be removed.

Indexes

Things can go wrong in data systems: __________ in the network can unexpectedly cut off the application from the database, or one database node from another.

Interruptions

The idea of ACID consistency is that you have certain statements about you data, called __________, that must always be true -- for example, in an accounting system, credits and debits across all accounts must always be balanced. If a transaction starts with a database that is valid according to these conditions, and any writes during the transaction preserve the validity, then you can be sure that these conditions are always satisfied.

Invariants

(Unlike in multi-threading programming) In the context of ACID, atomicity is not about concurrency. That is covered under __________.

Isolation

Databases have long tried to hide concurrency issues from application developers by providing transaction __________, which should in theory let you pretend that no concurrency is happening.

Isolation

With ACID __________, concurrently running transactions shouldn't interfere with each other. For example, if one transaction makes several writes, then another transaction should see either all or none of those writes, but not some subset.

Isolation

__________ in the sense of ACID means that concurrently executing transactions are isolated from each other: they cannot step on each other's toes.

Isolation

When implementing serializable isolation using actual serial execution, if you find a way of partitioning your dataset so that each transaction only needs to read and write data within a single transaction, then each partition can have its own transaction processing thread running independently from the others. This way, you can give each CPU core its own partition, which allows your transaction throughput to scale __________ with the number of CPU cores.

Lineraly

Another option for preventing lost updates is for applications to explicitly _____________ objects that are going to be updated: if any other transactions tried to concurrently read the same object, it is forced to wait until the first read-modify-cycle has completed. This works, but to get it write, you need to carefully think about your application code -- it is easy to introduce race conditions.

Lock

If you can't use a serializable isolation level to solve write skews, the second best option in some cases is probably to explicitly ____________ the rows that the transaction depends on (e.g., using FOR UPDATE with SQL)

Lock

Incrementing a counter or updating an account balance is an example of writes that can suffer from the _____________ problem

Lost update

The _____________ problem can occur if an application reads some value from the database, modifies it, and writes back the modified value (a read-modify-write cycle). If two transactions do this concurrently, one of the modifications can be lost, because the write does not include the first modification. (We sometimes say that the later write clobber the earlier write).

Lost update

____________ -- a type of race condition -- occur when two clients concurrently perform a read-modify-write cycle. One overwrites the other's write without incorporating its changes, so data is lost. Some implementations of snapshot isolation prevent this anomaly automatically, while others require a manual lock (SELECT FOR UPDATE).

Lost updates

Storage engines that support snapshot isolation typically use _____________ for their read committed isolation level as well; a typical approach is that read committed isolation uses a separate snapshot for each query, while snapshot isolation uses the same snapshot for an entire transaction.

MVCC

____________ is an approach for preventing write skews with Snapshot isolation. It takes a phantom and turns it into a lock conflict on a concrete set of rows that exist in the database (i.e., instead of a query that checks for an absence of rows). However, this approach can be hard and error-prone to implement correctly. A serializable isolation level is much preferable in most cases.

Materializing conflicts

ACID isolation and atomicity guarantees typically assume that you want to modify several objects (rows, documents, records) at once (i.e., several writes in a single transaction). Such __________ transactions are often needed if several pieces of data need to be kept in sync.

Multi-object

In a document data model, the fields that need to be updated together are often within the same document, which is treated as a single object -- no multi-object transactions are needed when updating a single document. However, document databases lacking join functionality also encourage denormalization. When denormalized information needs to be updated, you need to update several documents in one go. __________ transactions are very useful in this situation to prevent denormalized data from going out of sync.

Multi-object

Many distributed datastores have abandoned __________ transactions because they are difficult to implement across partitions, and they can get in the way in some scenarios. (However, there is nothing that fundamentally prevents transactions in a distributed database)

Multi-object

To implement snapshot isolation, the database must potentially keep several different committed versions of an object, because various in-progress transactions may need to see the state of the database at different points in time. This technique is known as _____________ (MVCC)

Multi-version concurrency control

For preventing write skews, atomic single-object operations don't help, as _____________ objects are involved.

Multiple

For MVCC-based snapshot isolation: an _____________ is visible if both of the following conditions are true: - At the time when the reader's transaction started, the transaction that created the object had already committed. - The object is not marked for deletion, or if it is, the transaction that requested deletion had not yet committed at the time when the reader's transaction started.

Object

Conceptually, all the reads and writes in in a transaction are executed as __________.

One operation

If there is enough spare capacity and if contention between transactions is not too high, ____________ concurrency control techniques tend to perform better than pessimistic ones.

Optimistic

Serializable snapshot isolation is an ____________ concurrency control technique. Meaning that instead of blocking if something potentially dangerous happens, transactions continue anyway, in the hope that every thing will turn out all right. When a transaction wants to commit, the database checks whether anything bad happened (i.e., whether isolation was violated. If so, the transaction is aborted and has to be retried. Only transactions that executed serializable are allowed to commit.

Optimistic

____________ concurrency control (used by Serializable snapshot isolation) performs badly if there is high contention (many transactions trying to access the same objects), as this leads to high proportion of transaction needing to abort. The additional transaction load from retired transactions can make performance worse.

Optimistic

Things can go wrong in data systems: Several clients may write to the database at the same time, __________ each other's changes.

Overwriting

With transactions, error handling becomes much simpler for an application, because it doesn't need to worry about __________.

Partial failures

Two-phase locking and serial execution are so-called ____________ concurrency control mechanisms: they based on the principle that if anything might possible go wrong (as indicated by a lock held by another transaction in 2PL), it's better to wait until the situation is safe again before doing anything. It is like mutual exclusion, which is used to protect data structures in multi-threaded programming.

Pessimistic

A ____________ is where a write in one transaction changes the result of a search query in another transaction.

Phantom

____________ -- a type of race condition -- occur when a transaction reads objects that match some search condition. Then, another client makes a write that affects the results of that search. Snapshot isolation prevents straightforward phantom reads, but phantoms in the context of write skew require special treatment, such as index-range locks.

Phantom reads

A ____________ lock restricts access as follows: 1. If transaction A wants to read objects matching some condition, like in that SELECT query, it must acquire a shared-mode lock on the conditions of the query. If another transaction B currently has an exclusive lock on any object matching those conditions, A must wait until B releases its lock before it is allowed to make its query. 2. If transaction A wants to insert, update, or delete any object, it must first check whether either the old or the new value matches any existing predicate lock. If there is a matching predicate lock held by transaction B, then A must wait until B has committed or aborted before it can continue.

Predicate

A ____________ lock works similarly to the shared/exclusive lock but belongs to all objects that match some search condition (instead of a single object).

Predicate

If two-phase locking includes ____________ locks, the database prevents all forms of write skew and other race conditions, and so its isolation becomes serializable

Predicate

For write skews (race condition) in snapshot isolation, the transaction is taking an action based on a ____________ (a fact that was true at the beginning of the transaction, e.g., "There are currently two doctors on call"). Later, when the transaction wants to commit, the original data may have changed -- the original condition may no longer be true.

Premise

Most databases are accessed by several clients at the same time. If these clients are accessing the same database records, you can run into concurrency problems, also known as __________

Race conditions

Things can go wrong in data systems: __________ between clients can cause surprising bugs.

Race conditions

___________ only come into play when one transaction reads data that is concurrently modified by another transaction, or when two transactions try to simultaneously modify the same data.

Race conditions

Read Skew is considered acceptable under _____________ isolation.

Read committed

________ transaction isolation does not prevent the race condition between two counter increments (e.g., copy-modify-update), where transaction A increments and writes the counter (42 + 1 = 43) after transaction B reads the old value (42) then B increments and writes (42 + 1 = 43) after A commits (counter is 43 but should be 44 since it was incremented twice). In this case, the second write (by B) happens after the first transaction (A) has committed , so it's not a dirty write. It's still incorrect tho.

Read committed

____________ is the most basic level of transaction isolation. It makes two guarantees: 1. When reading from the database, you will only see data that has been committed (no dirty reads) 2. When writing to the database, you will only overwrite data that has been committed (no dirty writes)

Read committed

For preventing dirty reads, the approach of requiring ____________ does not work well in practice, because one long-running write transaction can force many read-only transactions to wait until the long-running transaction has completed.

Read locks

____________ (non-repeatable reads) -- a type of race condition -- occurs when a client sees different parts of the database at different points in time. This issue is most commonly prevented with snapshot isolation, which allows a transaction to read from a consistent snapshot at one point in time. It is usually implemented with multi-version concurrency control (MVCC).

Read skew

_____________ (a.k.a nonrepeatable read) happens when a transaction A reads a value set by another transaction B that started after it but committed before it (i.e., A starts -> B starts -> B commits X -> A reads X).

Read skew

Some situations cannot tolerate _____________ : Taking a backup requires making a copy of the entire database, which may take hours on a large database. During the time that the backup process is running, writes will continue to be made to the database. Thus, you could end up with some parts of the backup containing an older version of the data, and other parts containing a newer version. If you need to restore from such a backup, the inconsistencies (such as disappearing money) become permanent.

Read skews

Some situations cannot tolerate _____________ : you may want to run a query that scans over large parts of the database (i.e., for an analytic query or for an integrity check). These queries are likely to return nonsensical results if they observe parts of the database at different points in time.

Read skews

Read committed and snapshot isolation levels are primarily about the guarantees of what a _____________ transaction can see in the presence of concurrent writes.

Read-only

Although retrying an aborted transaction is a simple and effective error handling mechanism, it isn't perfect: if the error is due to overload, retrying the transaction will make the problem worse, not better. To avoid such feedback cycles, you can limit the number of _________, use exponential backoff, and handle overload-related errors differently from other errors (if possible).

Retries

The database will __________ (or abort) a transaction if it fails.

Rollback

Most commonly, databases prevent dirty writes by using __________ locks: when a transaction wants to modify a particular object (row or document), it must first acquire a lock on that object. It must then hold the lock until the transaction is committed or aborted.

Row-level

Databases that don't provide transactions sometime offer an atomic _____________ operation: the purpose of this operation is to avoid lost updates by allowing an update to happen only if the value has not changed since you last read it. However, it's important to check whether this operation is safe with your database before relying on it; for instance, check if the database allows the WHERE clause to read from an old snapshot.

compare-and-set

In most ACID databases, if the database is in danger of violating its guarantee of atomicity, isolation, or durability, it would rather abandon the transaction entirely than allow it to remain half-finished. However, datastores with __________ replication work much more on a "best effort" basis, which could be summarized as "the database will do as much as it can, and if it runs into an error, it won't undo something it has already done" -- so it's the application's responsibility to recover from errors.

leaderless

A common approach to prevent _____________ in multi-leader and leaderless replicated databases is to allow concurrent writes to create several conflicting versions of a value (also known as siblings), and to use application code or special data structures to resolve and merge these versions after the fact.

lost updates

In replicated databases, preventing _____________ takes on another dimension: since they have copies of the data on multiple nodes, and the data can potentially be modified concurrently on different nodes, some additional steps need to be taken.

lost updates

Options for preventing _____________ include: atomic operations, explicit locking, automatically detection, compare-and-set operations, and conflict resolution in replicated databases.

lost updates

A key idea with ____________ locks is that it applies even to objects that do not yet exist in the database, but which might be added in the future (phantoms).

predicate

Object-relational mapping frameworks make it easy to accidentally write code that performs unsafe _____________ cycles instead of using atomic operations provided by the database.

read-modify-write

Databases with multi-leader or leaderless replication usually cannot guarantee that there is a _____________ copy of the data. Thus, techniques based on locks or compare-and-set do not apply in this context.

single up-to-date

A key principle of _____________ isolation is readers never block writers, and writers never block readers. This allows a database to handle long-running read queries on a consistent snapshot at the same time as processing writes normally, without any lock contention between the two.

snapshot

Like read committed isolation, implementations of _____________ typically use write locks to prevent dirty writes, which means that a transaction that makes a write can block the progress of another transaction that writes to the same object. However, reads do not require any locks.

snapshot isolation


संबंधित स्टडी सेट्स

11/4 Quiz: EXCLUSIONS, RESTRICTIONS, RIDERS AND PROHIBITED PROVISIONS

View Set

PMP Rita - Chapter 11: Risk Management

View Set

Week 7 Check Your Understanding Assignment PATHO: 370

View Set