Chapter 7: Transactions
After a transaction has acquired the lock, it must continue to hold the lock until ____________. This is where the name "two-phase" comes from: the first phase (while the transaction is executing) is when the locks are acquired, and the second phase (at the end of the transaction) is when all the locks are released.
(the) end of the transaction
Although retrying an aborted transaction is a simple and effective error handling mechanism, it isn't perfect: if the transaction also has side effects outside of the database, those side effects may happen even if the transaction is ____________.
Aborted
The rate of ____________ significantly affects the overall performance of SSI. For example, a transaction that reads and writes data over a long period of time is likely to run into conflicts and abort, so SSI requires that read-write transactions be fairly short (long-running read-only transactions may be okay). (However, SSI is probably less sensitive to slow transactions than two-phase locking or serial execution.)
Aborts
For examples that check for the ____________ of rows matching some search condition and then add a row matching that condition, a query that locks rows (e.g., SELECT FOR UPDATE) cannot be used: the write should happen only when no rows match the condition, so there is nothing to which a lock can be attached. These examples are susceptible to write skew.
Absence
Contention between transactions can be reduced with commutative ____________ operations (e.g., incrementing a counter).
Atomic
In general, __________ refers to something that cannot be broken down into smaller parts
Atomic
Many databases provide _____________ update operations, which remove the need to implement read-modify-write cycles in application code. These operations are usually the best solution to lost update problems.
Atomic
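A minimal sketch of such an operation, using Python's built-in sqlite3 module and a hypothetical counters table (the table and names are illustrative, not from the book): the atomic UPDATE pushes the increment into the database, so no read-modify-write cycle runs in application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO counters VALUES ('page_views', 0)")

# Atomic increment: the database applies read, add, and write as one
# operation, so concurrent increments cannot clobber each other.
conn.execute("UPDATE counters SET value = value + 1 WHERE name = 'page_views'")
conn.commit()

print(conn.execute("SELECT value FROM counters WHERE name = 'page_views'").fetchone())  # (1,)
```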
ACID __________ describes what happens if a client wants to make several writes, but a fault occurs after some of the writes have been processed -- for example, a process crashes, a network connection is interrupted, a disk becomes full, or some integrity constraint is violated.
Atomicity
ACID __________ guarantees that if the writes are grouped together into an atomic transaction, and the transaction cannot be completed (committed) due to a fault, then the transaction is aborted and the database must discard or undo any writes it has made so far in that transaction.
Atomicity
With ACID __________, if an error occurs halfway through a sequence of writes, the transaction should be aborted, and the writes made up to that point should be discarded. In other words, the database saves you from having to worry about partial failure, by giving an all-or-nothing guarantee.
Atomicity
Without ACID __________, if an error occurs partway through making multiple changes, it's difficult to know which changes have taken effect and which haven't. The application could try again, but that risks making the same change twice, leading to duplicate or incorrect data. With this guarantee, if a transaction was aborted, the application can be sure that it didn't change anything, so it can safely be retried.
Atomicity
The safety guarantees provided by transactions are often described by the well-known acronym ACID, which stands for __________.
Atomicity, Consistency, Isolation, and Durability
Systems that do not meet the ACID criteria are sometimes called BASE, which stands for __________.
Basically Available, Soft state, and Eventual consistency
To figure out whether your application needs transactions, you first need to understand exactly what __________ transactions can provide, and what costs are associated with them.
Safety guarantees
Dirty writes and lost updates are two kinds of race conditions that can occur when different transactions concurrently try to write to the _____________ objects.
Same
In the case when different transactions update the _____________ object, you get a dirty write or lost update anomaly (depending on the timing).
Same
In a database with __________ (almost everything except pure key-value stores), the indexes also need to be updated every time you change a value. These indexes are different database objects from a transaction point of view: for example, without transaction isolation, it's possible for a record to appear in one index but not another, because the update to the second index hasn't happened yet.
Secondary Indexes
(Serializable Isolation): Simple key-value data can often be partitioned very easily (and hence use the approach of partitioning + actual serial execution on a single thread [per partition] to achieve higher transaction throughput), but data with multiple __________ is likely to require a lot of cross-partition coordination.
Secondary indexes.
Literally executing transactions in ____________ order -- an approach to implementing serializable transactions: if you can make each transaction very fast to execute, and the transaction throughput is low enough to process on a single CPU core, this is a simple and effective option.
Serial
__________ of transactions can become a viable way of achieving serializable isolation within certain constraints: - Every transaction must be small and fast, because it takes only one slow transaction to stall all transaction processing. - It is limited to use cases where the active dataset can fit in memory. Rarely accessed data could potentially be moved to disk, but if it needed to be accessed in a single-threaded transaction, the system would get very slow. - Write throughput must be low enough to be handled on a single CPU core, or else transactions need to be partitioned without requiring cross-partition coordination. - Cross-partition transactions are possible, but there is a hard limit to the extent to which they can be used.
Serial Execution
The classic database textbooks formalize ACID isolation as __________, which means that each transaction can pretend that it is the only transaction running on the entire database. The database ensures that when the transactions have committed, the result is the same as if they had run serially (one after another), even though in reality they may have run concurrently.
Serializability
A database with ____________ isolation must prevent phantoms
Serializable
An advantage of implementing ____________ isolation using actual serial execution is that a system designed for single-threaded execution can sometimes perform better than a system that supports concurrency, because it can avoid the coordination overhead of locking.
Serializable
Automatically preventing write skew requires true _____________ isolation.
Serializable
In practice, __________ isolation is rarely used, because it carries a performance penalty. Some popular databases typically offer a weaker isolation guarantee, such as snapshot isolation.
Serializable
Most databases that provide ____________ isolation today use one of three techniques: 1. Literally executing transactions in serial order. 2. Two-phase locking 3. Optimistic concurrency control techniques
Serializable
The simplest way of avoiding concurrency problems is to remove the concurrency entirely: to execute only one transaction at a time, in serial order, on a single thread. The resulting isolation is by definition ____________.
Serializable
The disadvantage of implementing ____________ isolation using actual serial execution is that the system's throughput is limited to that of a single CPU core. In order to make the most of that single thread, transactions need to be structured differently from their traditional form.
Serializable
When using partitioning and actual serial execution on a single thread (for each partition), for any transaction that needs to access multiple partitions, the database must coordinate the transaction across all the partitions that it touches, to ensure __________ isolation across the whole system.
Serializable
With ____________ isolation, the database guarantees that if the transactions behave correctly when run individually, they continue to be correct when run concurrently; i.e., the database prevents all race conditions.
Serializable
With stored procedures and in-memory data (i.e., all data in the transaction fits in memory), executing all transactions on a single thread becomes feasible. As they don't need to wait for I/O, systems using this approach to implement __________ isolation avoid the overhead of other concurrency control mechanisms and can achieve quite good throughput on a single thread.
Serializable
__________ isolation means that the database guarantees that transactions have the same effect as if they ran serially (i.e., one at a time, without any concurrency)
Serializable
____________ isolation guarantees that even though transactions may execute in parallel, the end result is the same as if they had executed one at a time, serially, without any concurrency.
Serializable
____________ isolation is usually regarded as the strongest isolation level.
Serializable
____________ (SSI) is a very promising approach to full serializability (i.e., serializable isolation) that has only a small performance penalty compared to snapshot isolation, a weaker isolation level.
Serializable Snapshot Isolation
Compared to serial execution, ____________ isolation is not limited to throughput of a single CPU core.
Serializable snapshot
Compared to two-phase locking, the big advantage of ____________ isolation is that one transaction doesn't need to block waiting for locks held by another transaction. Like under snapshot isolation, writers don't block readers, and vice versa. This design principle makes query latency much more predictable and less variable. In particular, read-only queries can run on a consistent snapshot without requiring any locks, which is very appealing for read-heavy workloads.
Serializable snapshot
For detecting writes that affect prior reads with ____________ isolation, when a transaction writes to the database, it must look in the indexes for any other transactions that have recently read the affected data. This process is similar to acquiring a write lock on the affected key range, but rather than blocking until the readers have committed, the lock acts as a tripwire: it simply notifies the transactions that the data they read may no longer be up to date. (When the later of the conflicting transactions tries to commit, it is aborted if the other has already committed.)
Serializable snapshot
In order to prevent reads on a stale MVCC object with ____________ isolation, when the transaction wants to commit, the database checks whether any of the ignored writes (writes by concurrent transactions that were in progress but not committed when the transaction started) have now been committed. If so, the transaction must be aborted. Why wait until committing? At the time when the transaction makes a stale read, the database doesn't yet know whether that transaction is going to later perform a write. By avoiding unnecessary aborts, SSI preserves snapshot isolation's support for long-running reads from a consistent snapshot.
Serializable snapshot
In order to provide ____________ isolation, the database must detect situations in which a transaction may have acted on an outdated premise and abort the transaction in that case. There are two cases to consider to know if a query result might have changed: 1. Detecting reads on a stale MVCC object version (an uncommitted write occurred before the read). 2. Detecting writes that affect prior reads (the write occurs after the read).
Serializable snapshot
____________ -- an approach to serializability: a fairly new algorithm that avoids most of the downsides of the previous approaches. It uses an optimistic approach, allowing transactions to proceed without blocking. When a transaction wants to commit, it is checked, and it is aborted if the execution was not serializable.
Serializable snapshot isolation (SSI)
The lock used in Two-phase locking is used as follows: If a transaction wants to read an object, it must first acquire the lock in ____________ mode. Several transactions are allowed to hold the lock in this mode simultaneously, but if another transaction already has an exclusive lock on the object, these transactions must wait.
Shared
Storage engines almost universally aim to provide atomicity and isolation on the level of a __________ (such as a key-value pair) on one node. Atomicity can be implemented using a log for crash recovery, and isolation can be implemented using a lock on each object (allowing only one thread to access an object at any one time).
Single object
__________ operations (such as increment and compare-and-set) are useful, as they can prevent lost updates when several clients try to write to the same object concurrently. However, they are not transactions in the usual sense of the word.
Single-object
In (PostgreSQL) MVCC-based _____________ isolation, if a transaction deletes a row, the row isn't actually deleted, but it is marked for deletion. At some later time, when it is certain that no transaction can any longer access the deleted data, a garbage collection process in the database removes any rows marked for deletion and frees this space.
Snapshot
In 2PL (Two-Phase Locking), writers block both other writers and readers, and readers block writers. ____________ isolation, on the other hand, has the mantra readers never block writers, and writers never block readers. However, because 2PL provides serializability, it protects against race conditions like lost updates and write skew, which is not guaranteed with weaker isolation levels.
Snapshot
It is very hard to reason about the meaning of a long-running, read-only query if the data on which it operates is changing at the same time as the query is executing. The solution for this is to enforce _____________ isolation; when a transaction can see a consistent snapshot of the database, frozen at a particular point in time, it is much easier to understand.
Snapshot
____________ isolation avoids phantoms in read-only queries, but in read-write transactions, phantoms can lead to particularly tricky cases of write skew.
Snapshot
_____________ isolation is a useful isolation level, especially for read-only transactions. However, many databases call it by different names (e.g., serializable in Oracle, and repeatable read in MySQL)
Snapshot
_____________ isolation is the most common solution to problems of temporary inconsistency (such as read skew).
Snapshot
The idea behind _____________ isolation is that each transaction reads from a consistent snapshot of the database -- that is, the transaction sees all the data that was committed in the database at the start of the transaction. Even if the data is subsequently changed by another transaction, each transaction sees only the data from that particular point in time.
Snapshot (isolation)
Another option for preventing lost updates is to allow read-modify-write cycles to execute in parallel and, if the transaction manager detects a lost update, to abort the transaction and force it to retry its read-modify-write cycle. The advantage of this approach is that databases can perform this check efficiently in conjunction with _____________. This lost update detection is a great feature, because it doesn't require application code.
Snapshot isolation
A pro of ____________ is that modern implementations use existing general-purpose programming languages (e.g., VoltDB uses Java or Groovy).
Stored Procedures
Systems with single-threaded serial transaction processing don't allow interactive multi-statement transactions. Instead, the application must submit the entire transaction code to the database, as a ____________.
Stored procedure
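A toy sketch of this idea (not any particular database's API): whole transactions, represented here as plain Python callables standing in for stored procedures, are put on a queue and executed one at a time by a single thread, so there is no interactive back-and-forth with the client mid-transaction.

```python
import queue
import threading

transaction_queue = queue.Queue()

def transaction_processor(db):
    # Single-threaded loop: each submitted transaction runs to
    # completion before the next one starts, so no locks are needed.
    while True:
        txn = transaction_queue.get()
        if txn is None:          # shutdown signal
            break
        txn(db)

db = {"alice": 100, "bob": 50}

def transfer_10(db):
    # The entire transaction logic is submitted up front, like a
    # stored procedure -- it never waits on the client partway through.
    db["alice"] -= 10
    db["bob"] += 10

worker = threading.Thread(target=transaction_processor, args=(db,))
worker.start()
transaction_queue.put(transfer_10)
transaction_queue.put(None)
worker.join()
print(db)                        # {'alice': 90, 'bob': 60}
```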
A con of ____________ is that each database vendor typically has its own language for these procedures, and these languages tend to look quite ugly and archaic. Furthermore, code running in a database is difficult to manage, debug, and version control.
Stored procedures
A __________ is a way for an application to group several reads and writes into a logical unit.
Transaction
A __________ is usually understood as a mechanism for grouping multiple operations on multiple objects into one unit of execution.
Transaction
An application with simple access patterns, such as reading and writing only a single record, can probably manage without ____________.
Transactions
For complex access patterns, ____________ can hugely reduce the number of potential error cases you need to think about.
Transactions
Without ____________, various error scenarios (process crashing, network interruptions, power outages, disk full, unexpected concurrency, etc.) mean that data can become inconsistent in various ways.
Transactions
____________ are an abstraction layer that allows an application to pretend that certain concurrency problems and certain kinds of hardware and software faults don't exist.
Transactions
Although retrying an aborted transaction is a simple and effective error handling mechanism, it isn't perfect: it is only worth retrying after ____________ errors (for example due to deadlock, isolation violation, temporary network interruptions, and failover); after a permanent error (e.g., constraint violation) a retry would be pointless.
Transient
In ____________ (2PL), a lock is maintained on each object (e.g., row) in the database. The lock can be held in either shared mode or exclusive mode.
Two-phase locking
The big downside of ____________ is that transaction throughput and response times of queries are significantly worse under this approach for achieving serializable isolation than under weak isolation. This is mostly due to the reduced concurrency.
Two-phase locking
Traditional relational databases don't limit the duration of a transaction, so when using ____________ to achieve serializable isolation, a queue may form if several transactions want to access the same object. For this reason, databases running this approach can have quite unstable latencies, and they can be very slow at high percentiles if there is contention in the workload.
Two-phase locking
With ____________, an approach to achieving serializable isolation, several transactions are allowed to concurrently read the same object as long as nobody is writing to it. But as soon as anyone wants to write to an object, exclusive access is required: 1. If transaction A has read an object and transaction B wants to write to that object, B must wait until A commits or aborts before it can continue. 2. If transaction A has written an object and transaction B wants to read that object, B must wait until A commits or aborts before it can continue.
Two-phase locking
____________ -- an approach to serializability: for decades this has been the standard way of implementing it, but many applications avoid using it because of its performance characteristics.
Two-phase locking
____________ is an approach to achieving serializable isolation that doesn't perform well in general.
Two-phase locking
Things can go wrong in data systems: A client may read data that doesn't make sense because it has only partially been __________.
Updated
In a relational data model, a row in one table often has a foreign key reference to a row in another table. (Similarly, in a graph-like data model, a vertex has edges to other vertices.) Multi-object transactions allow you to ensure that these references remain __________: when inserting several records that refer to one another, the foreign keys have to be correct and up to date, or the data becomes nonsensical.
Valid
Concurrency bugs caused by ____________ transaction isolation have caused substantial loss of money, led to investigation by financial auditors, and caused customer data to be corrupted. Even many popular relational database systems (which are usually considered "ACID") use this isolation, so they wouldn't necessarily have prevented these bugs from occurring.
Weak
____________ isolation levels (such as read committed) have good performance, but are prone to various race conditions (lost updates, write skew, phantoms, etc.)
Weak
There are advantages to __________ transactional guarantees or abandoning them entirely (e.g., to achieve higher performance or higher availability).
Weakening
Serializable isolation has performance costs. It's therefore common for systems to use __________ levels of isolation, which protect against some concurrency issues but not all. These levels of isolation are much harder to understand, and they can lead to subtle bugs, but they are nevertheless used in practice.
Weaker
When using serial execution to achieve serializable isolation on the ____________, to avoid the system idling while a single long-running transaction waits for user input (e.g., a checkout process that spans multiple requests), a transaction is created and committed within the same HTTP request -- a transaction never spans multiple requests. A new HTTP request starts a new transaction.
Web
When implementing serializable isolation, executing all transactions serially makes concurrency control much simpler, but limits the transaction throughput of the database to the speed of a single CPU core on a single machine. For applications with high __________ throughput, the single-threaded transaction processor can become a serious bottleneck
Write
Examples that follow a certain pattern are susceptible to ____________: 1. A SELECT query checks whether some requirement is satisfied by searching for rows that match some search condition (e.g., check if a username exists already) 2. Depending on the result of the first query, the application code decides how to continue (e.g., whether or not the username is available) 3. If the application decides to go ahead, it makes a write (INSERT, UPDATE, or DELETE) to the database and commits the transaction. The effect of this write changes the precondition of the decision of step 2. In other words, if you were to repeat the SELECT query from step 1 after committing the write, you would get a different result, because the write changed the set of rows matching the search condition. These steps may occur in a different order.
Write Skew
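A sketch of that check-then-act pattern, using sqlite3 and a hypothetical users table (both illustrative): between the SELECT in step 1 and the INSERT in step 3, another transaction can insert the same username, so under weak isolation this code is exposed to exactly the anomaly described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT)")

def claim_username(conn, username):
    # Step 1: check whether the requirement is satisfied.
    row = conn.execute(
        "SELECT 1 FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        # Step 3: this write changes the precondition checked in step 1;
        # a concurrent transaction running the same check may also pass.
        conn.execute("INSERT INTO users (username) VALUES (?)", (username,))
        conn.commit()
        return True
    return False

print(claim_username(conn, "alice"))  # True
print(claim_username(conn, "alice"))  # False
```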
Unlike dirty writes and lost updates, a _____________ involves concurrent transactions updating different objects.
Write skew
____________ -- a type of race condition -- occurs when a transaction reads something (typically involving multiple objects), makes a decision based on the value it saw, and writes the decision to the database. However, by the time the write is made, the premise of the decision is no longer true. Only serializable isolation prevents this anomaly.
Write skew
_____________ can occur if two transactions read the same objects, and then update some of those objects (different transactions may update different objects).
Write skew
Some databases allow you to configure constraints, which are then enforced by the database (e.g., uniqueness, foreign key constraints, or restrictions on a particular value). However, most databases do not have built-in support for a constraint that involves multiple objects. If they did, this might be a solution for some examples of ____________.
Write skews
In ACID, atomicity and isolation describe what the database should do if a client makes several __________ within the same transaction.
Writes
The database will __________ a transaction if the entire transaction succeeds.
Commit
___________ bugs are hard to find by testing, because such bugs are only triggered when you get unlucky with the timing. Such timing issues might occur very rarely, and are usually difficult to reproduce.
Concurrency
Atomicity, isolation, and durability are properties of the database, whereas __________ (in the ACID sense) is a property of the application. The application may rely on the database's atomicity and isolation properties in order to achieve this guarantee.
Consistency
In the context of ACID, __________ refers to an application-specific notion of the database being in a "good state".
Consistency
It is the application's responsibility to define its transactions correctly so that they preserve ACID __________. This is not something that the database can guarantee. In general, the application defines what data is valid or invalid -- the database only stores it.
Consistency
For MVCC-based snapshot isolation: When a transaction reads from the database, transaction IDs are used to decide which objects it can see and which are invisible. By carefully defining visibility rules, the database can present a _____________ snapshot of the database to the application: 1. At the start of each transaction, the database makes a list of all the other transactions that are in progress at that time. Any writes that those transactions have made are ignored. 2. Any writes made by aborted transactions are ignored. 3. Any writes made by transactions with a later transaction ID are ignored, regardless of whether those transactions have committed. 4. All other writes are visible to the application's queries. These rules apply to both creation and deletion of objects.
Consistent
Things can go wrong in data systems: The application may __________ at any time (including halfway through a series of operations)
Crash
Atomic operations (which are useful for solving lost update problems) are usually implemented by taking an exclusive lock on the object when it is read, so that no other transaction can read it until the update has been applied. This technique is known as _____________.
Cursor stability
In 2PL, it can happen quite easily that transaction A is stuck waiting for transaction B to release its lock, and vice versa. This situation is called ____________. The database automatically detects when this happens between transactions and aborts one of them so that the others can make progress. The aborted transaction needs to be retried by the application.
Deadlock
In Two-phase locking, when a transaction is aborted due to ____________ and is retried, it needs to do its work all over again. If this situation is frequent, this can mean significant wasted effort.
Deadlock
If a transaction needs to update several objects (multi-object transaction), a ____________ means that another transaction may see some of the updates but not others.
Dirty Read
____________ -- a type of race condition -- occur when one client reads another client's writes before they have been committed. The read committed isolation level and stronger levels prevent dirty reads.
Dirty Reads
Imagine a transaction has written some data to the database, but the transaction has not yet committed or aborted. Can another transaction see the uncommitted data? If yes, that is called a ____________.
Dirty read
If a transaction aborts, any writes it has made need to be rolled back. If the database allows ____________, that means a transaction may see data that is later rolled back.
Dirty reads
To satisfy read committed isolation, most databases prevent _____________ by using the following process: for every object that is written, the database remembers both the old committed value and the new value set by the transaction that currently holds the write lock. While the transaction is ongoing, any other transactions that read the object are simply given the old value. Only when the new value is committed do transactions switch over to reading the new value.
Dirty reads
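A toy sketch of that bookkeeping (not how any particular engine stores it): while a writing transaction holds the write lock, readers are handed the old committed value; the new value only becomes visible once the writer commits.

```python
class VersionedObject:
    def __init__(self, value):
        self.committed_value = value     # what readers are given
        self.uncommitted_value = None    # pending write, invisible to readers
        self.writer_txid = None          # who currently holds the write lock

    def write(self, txid, value):
        self.writer_txid = txid
        self.uncommitted_value = value

    def read(self):
        return self.committed_value      # never returns dirty data

    def commit(self):
        self.committed_value = self.uncommitted_value
        self.uncommitted_value = None
        self.writer_txid = None

obj = VersionedObject("old")
obj.write(txid=7, value="new")
print(obj.read())   # 'old'  -- the uncommitted write is not visible
obj.commit()
print(obj.read())   # 'new'  -- visible only after commit
```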
Transactions running at the read committed isolation must prevent ____________. This means that any writes by a transaction only becomes visible to others when that transaction commits (and then all of its writes become visible at once).
Dirty reads
Consider when two transactions concurrently try to update the same object in a database. If the earlier write is part of a transaction that has not yet committed and the later write (from another transaction) overwrites the uncommitted value of the first transaction, this is known as a __________. Transactions running at the read committed isolation level must prevent this situation, usually by delaying the second write until the first write's transaction has committed or aborted.
Dirty write
____________ -- a type of race condition -- occur when one client overwrites data that another client has written, but not yet committed. Almost all transaction implementations prevent dirty writes.
Dirty writes
ACID __________ is the promise that once a transaction has committed successfully, any data it has written will not be forgotten, even if there is a hardware fault or the database crashes.
Durability
Historically, __________ meant writing to an archive tape. Then it was understood as writing to a disk or SSD. More recently, it has been adapted to mean replication. In practice, there is no one technique that can provide absolute guarantees. There are only various risk-reduction techniques, including writing to disk, replicating to remote machines, and backups -- and they can and should be used together.
Durability
In a replicated database, __________ may mean that the data has been successfully copied to some number of nodes. In order to provide a durability guarantee, a database must wait until these replications are complete before reporting a transaction as successfully committed.
Durability
In a single-node database, __________ typically means that the data has been written to nonvolatile storage such as a hard drive or SSD. It usually also involves a write-ahead log or similar, which allows recovery in the event that the data structures on disk are corrupted. In order to provide a durability guarantee, a database must wait until these writes are complete before reporting a transaction as successfully committed.
Durability
Perfect __________ does not exist: if all your hard disks and all your backups are destroyed at the same time, there's obviously nothing your database can do to save you.
Durability
The lock used in Two-phase locking is used as follows: If a transaction first reads and then writes the object, it may upgrade its shared lock to an ____________ lock.
Exclusive
The lock used in Two-phase locking is used as follows: If a transaction wants to write to an object, it must first acquire the lock in ____________ mode. No other transaction may hold the lock at the same time (in any mode), so if there is any existing lock on the object, the transaction must wait.
Exclusive
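A toy Python sketch of the shared/exclusive lock behaviour described in the cards above, built on threading.Condition; a real 2PL lock manager would also track which transaction holds each lock, support shared-to-exclusive upgrades, and detect deadlocks.

```python
import threading

class SharedExclusiveLock:
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0          # holders in shared mode
        self._writer = False       # is an exclusive lock held?

    def acquire_shared(self):
        with self._cond:
            while self._writer:                       # wait out an exclusive holder
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()               # a waiting writer may proceed

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers > 0:  # wait for every other holder
                self._cond.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```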
Things can go wrong in data systems: The database software or hardware may __________ at any time (including in the middle of a write operation).
Fail
Although retrying an aborted transaction is a simple and effective error handling mechanism, it isn't perfect: if the transaction actually succeeded, but the network failed while the server tried to acknowledge the successful commit to the client (so the client thinks it __________), then retrying the transaction causes it to be performed twice.
Failed
Although retrying an aborted transaction is a simple and effective error handling mechanism, it isn't perfect: if the client process ____________ while retrying, any data it was trying to write to the database is lost.
Fails
Transactions were created to simplify the programming model for applications accessing a database. By using transactions, the application is free to ignore certain potential error scenarios and concurrency issues, because the database takes care of them instead; we call these safety __________.
Guarantees
Unfortunately, predicate locks do not perform well: if there are many locks held by active transactions, checking for matching locks becomes time-consuming. For that reason, most databases with 2PL actually implement ____________ locking (also known as next-key locking), which is a simplified approximation of predicate locking.
Index-range
____________ locking is an approximation of predicate locking that matches a greater set of objects (typically a whole index range) than the predicate itself. In other words, an approximation of the search condition is attached to one of the indexes.
Index-range
For working with _____________ with MVCC-based snapshot isolation, one option is to have the index simply point to all versions of an object and require an index query to filter out any object versions that are not visible to the current transaction. When garbage collection removes old object versions that are no longer visible to any transaction, the corresponding index entries can also be removed.
Indexes
Things can go wrong in data systems: __________ in the network can unexpectedly cut off the application from the database, or one database node from another.
Interruptions
The idea of ACID consistency is that you have certain statements about your data, called __________, that must always be true -- for example, in an accounting system, credits and debits across all accounts must always be balanced. If a transaction starts with a database that is valid according to these conditions, and any writes during the transaction preserve the validity, then you can be sure that these conditions are always satisfied.
Invariants
(Unlike in multi-threading programming) In the context of ACID, atomicity is not about concurrency. That is covered under __________.
Isolation
Databases have long tried to hide concurrency issues from application developers by providing transaction __________, which should in theory let you pretend that no concurrency is happening.
Isolation
With ACID __________, concurrently running transactions shouldn't interfere with each other. For example, if one transaction makes several writes, then another transaction should see either all or none of those writes, but not some subset.
Isolation
__________ in the sense of ACID means that concurrently executing transactions are isolated from each other: they cannot step on each other's toes.
Isolation
When implementing serializable isolation using actual serial execution, if you find a way of partitioning your dataset so that each transaction only needs to read and write data within a single partition, then each partition can have its own transaction processing thread running independently from the others. This way, you can give each CPU core its own partition, which allows your transaction throughput to scale __________ with the number of CPU cores.
Linearly
Another option for preventing lost updates is for applications to explicitly _____________ objects that are going to be updated: if any other transaction tries to concurrently read the same object, it is forced to wait until the first read-modify-write cycle has completed. This works, but to get it right, you need to carefully think about your application code -- it is easy to introduce race conditions.
Lock
If you can't use a serializable isolation level to solve write skews, the second best option in some cases is probably to explicitly ____________ the rows that the transaction depends on (e.g., using FOR UPDATE with SQL)
Lock
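A sketch of that approach against the doctors-on-call scenario mentioned later in these cards, assuming PostgreSQL via psycopg2 and a hypothetical doctors table (connection string, table, and columns are all illustrative): FOR UPDATE locks the rows the decision depends on, so a concurrent transaction making the same check has to wait.

```python
import psycopg2

conn = psycopg2.connect("dbname=example")  # assumed connection string

with conn, conn.cursor() as cur:
    # Lock every row the decision is based on.
    cur.execute(
        "SELECT * FROM doctors WHERE on_call = true AND shift_id = %s FOR UPDATE",
        (1234,),
    )
    currently_on_call = cur.fetchall()
    if len(currently_on_call) >= 2:
        # Safe to go off call: the locked rows cannot change underneath us.
        cur.execute(
            "UPDATE doctors SET on_call = false WHERE name = %s AND shift_id = %s",
            ("alice", 1234),
        )
# Leaving the `with conn` block commits the transaction and releases the locks.
```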
Incrementing a counter or updating an account balance is an example of writes that can suffer from the _____________ problem
Lost update
The _____________ problem can occur if an application reads some value from the database, modifies it, and writes back the modified value (a read-modify-write cycle). If two transactions do this concurrently, one of the modifications can be lost, because the second write does not include the first modification. (We sometimes say that the later write clobbers the earlier write.)
Lost update
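The unsafe cycle described above, sketched against a hypothetical counters table with sqlite3 (names are illustrative): two clients running this concurrently can both read 42 and both write back 43, so one increment is clobbered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO counters VALUES ('likes', 42)")

def unsafe_increment(conn, name):
    # Read...
    (value,) = conn.execute(
        "SELECT value FROM counters WHERE name = ?", (name,)
    ).fetchone()
    # ...modify in application code...
    new_value = value + 1
    # ...write back: a concurrent transaction that read the same old
    # value will overwrite this update without incorporating it.
    conn.execute("UPDATE counters SET value = ? WHERE name = ?", (new_value, name))
    conn.commit()

unsafe_increment(conn, "likes")
print(conn.execute("SELECT value FROM counters WHERE name = 'likes'").fetchone())  # (43,)
```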
____________ -- a type of race condition -- occur when two clients concurrently perform a read-modify-write cycle. One overwrites the other's write without incorporating its changes, so data is lost. Some implementations of snapshot isolation prevent this anomaly automatically, while others require a manual lock (SELECT FOR UPDATE).
Lost updates
Storage engines that support snapshot isolation typically use _____________ for their read committed isolation level as well; a typical approach is that read committed isolation uses a separate snapshot for each query, while snapshot isolation uses the same snapshot for an entire transaction.
MVCC
____________ is an approach for preventing write skew under snapshot isolation. It takes a phantom and turns it into a lock conflict on a concrete set of rows that exist in the database (instead of locking on a query that checks for the absence of rows). However, this approach can be hard and error-prone to implement correctly. A serializable isolation level is much preferable in most cases.
Materializing conflicts
ACID isolation and atomicity guarantees typically assume that you want to modify several objects (rows, documents, records) at once (i.e., several writes in a single transaction). Such __________ transactions are often needed if several pieces of data need to be kept in sync.
Multi-object
In a document data model, the fields that need to be updated together are often within the same document, which is treated as a single object -- no multi-object transactions are needed when updating a single document. However, document databases lacking join functionality also encourage denormalization. When denormalized information needs to be updated, you need to update several documents in one go. __________ transactions are very useful in this situation to prevent denormalized data from going out of sync.
Multi-object
Many distributed datastores have abandoned __________ transactions because they are difficult to implement across partitions, and they can get in the way in some scenarios. (However, there is nothing that fundamentally prevents transactions in a distributed database)
Multi-object
To implement snapshot isolation, the database must potentially keep several different committed versions of an object, because various in-progress transactions may need to see the state of the database at different points in time. This technique is known as _____________ (MVCC)
Multi-version concurrency control
For preventing write skews, atomic single-object operations don't help, as _____________ objects are involved.
Multiple
For MVCC-based snapshot isolation: an _____________ is visible if both of the following conditions are true: - At the time when the reader's transaction started, the transaction that created the object had already committed. - The object is not marked for deletion, or if it is, the transaction that requested deletion had not yet committed at the time when the reader's transaction started.
Object
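A toy sketch of that visibility check, assuming each object version records the transaction IDs that created and (optionally) deleted it, and that the reader knows which transactions were still in progress (or aborted) when it started; all names here are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Version:
    value: object
    created_by: int                    # txid that wrote this version
    deleted_by: Optional[int] = None   # txid that marked it for deletion, if any

def is_visible(version, reader_start_txid, in_progress_at_start, aborted):
    creator = version.created_by
    # Condition 1: the creating transaction had already committed when
    # the reader's transaction started.
    if (creator in aborted or creator in in_progress_at_start
            or creator >= reader_start_txid):
        return False
    # Condition 2: not marked for deletion, or the deleting transaction
    # had not yet committed when the reader's transaction started.
    deleter = version.deleted_by
    if (deleter is not None and deleter not in aborted
            and deleter not in in_progress_at_start
            and deleter < reader_start_txid):
        return False
    return True

v = Version(value="hello", created_by=3, deleted_by=7)
# Deleted by a later transaction (txid 7), so still visible to a reader
# whose transaction started at txid 5.
print(is_visible(v, reader_start_txid=5, in_progress_at_start={4}, aborted=set()))  # True
```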
Conceptually, all the reads and writes in a transaction are executed as __________.
One operation
If there is enough spare capacity and if contention between transactions is not too high, ____________ concurrency control techniques tend to perform better than pessimistic ones.
Optimistic
Serializable snapshot isolation is an ____________ concurrency control technique, meaning that instead of blocking if something potentially dangerous happens, transactions continue anyway, in the hope that everything will turn out all right. When a transaction wants to commit, the database checks whether anything bad happened (i.e., whether isolation was violated). If so, the transaction is aborted and has to be retried. Only transactions that executed serializably are allowed to commit.
Optimistic
____________ concurrency control (used by serializable snapshot isolation) performs badly if there is high contention (many transactions trying to access the same objects), as this leads to a high proportion of transactions needing to abort. The additional load from retried transactions can make performance worse.
Optimistic
Things can go wrong in data systems: Several clients may write to the database at the same time, __________ each other's changes.
Overwriting
With transactions, error handling becomes much simpler for an application, because it doesn't need to worry about __________.
Partial failures
Two-phase locking and serial execution are so-called ____________ concurrency control mechanisms: they are based on the principle that if anything might possibly go wrong (as indicated by a lock held by another transaction in 2PL), it's better to wait until the situation is safe again before doing anything. It is like mutual exclusion, which is used to protect data structures in multi-threaded programming.
Pessimistic
A ____________ is where a write in one transaction changes the result of a search query in another transaction.
Phantom
____________ -- a type of race condition -- occur when a transaction reads objects that match some search condition. Then, another client makes a write that affects the results of that search. Snapshot isolation prevents straightforward phantom reads, but phantoms in the context of write skew require special treatment, such as index-range locks.
Phantom reads
A ____________ lock restricts access as follows: 1. If transaction A wants to read objects matching some condition (e.g., via a SELECT query), it must acquire a shared-mode lock on the conditions of the query. If another transaction B currently has an exclusive lock on any object matching those conditions, A must wait until B releases its lock before it is allowed to make its query. 2. If transaction A wants to insert, update, or delete any object, it must first check whether either the old or the new value matches any existing predicate lock. If there is a matching predicate lock held by transaction B, then A must wait until B has committed or aborted before it can continue.
Predicate
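A toy sketch of the check in rule 2 above, with the WHERE clause represented as a plain Python predicate function (all names illustrative): before a write, the lock table is scanned for predicates held by other transactions that match either the old or the new value of the object.

```python
class PredicateLockTable:
    def __init__(self):
        self.read_locks = []                 # (txid, predicate) pairs

    def lock_query(self, txid, predicate):
        # Rule 1 (simplified): a reading transaction registers a
        # shared-mode lock on its search condition.
        self.read_locks.append((txid, predicate))

    def write_conflict(self, txid, old_value, new_value):
        # Rule 2: a write conflicts if the old or new value matches a
        # predicate lock held by some other transaction.
        for other_txid, predicate in self.read_locks:
            if other_txid == txid:
                continue
            if any(v is not None and predicate(v) for v in (old_value, new_value)):
                return other_txid            # writer must wait for this transaction
        return None

locks = PredicateLockTable()
locks.lock_query(txid=1, predicate=lambda row: row["on_call"] and row["shift_id"] == 1234)
print(locks.write_conflict(
    txid=2,
    old_value={"on_call": True, "shift_id": 1234},
    new_value={"on_call": False, "shift_id": 1234},
))  # 1 -- transaction 2 must wait until transaction 1 commits or aborts
```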
A ____________ lock works similarly to the shared/exclusive lock but belongs to all objects that match some search condition (instead of a single object).
Predicate
If two-phase locking includes ____________ locks, the database prevents all forms of write skew and other race conditions, and so its isolation becomes serializable
Predicate
For write skews (race condition) in snapshot isolation, the transaction is taking an action based on a ____________ (a fact that was true at the beginning of the transaction, e.g., "There are currently two doctors on call"). Later, when the transaction wants to commit, the original data may have changed -- the original condition may no longer be true.
Premise
Most databases are accessed by several clients at the same time. If these clients are accessing the same database records, you can run into concurrency problems, also known as __________
Race conditions
Things can go wrong in data systems: __________ between clients can cause surprising bugs.
Race conditions
___________ only come into play when one transaction reads data that is concurrently modified by another transaction, or when two transactions try to simultaneously modify the same data.
Race conditions
Read Skew is considered acceptable under _____________ isolation.
Read committed
________ transaction isolation does not prevent the race condition between two counter increments (a read-modify-write cycle), where transaction B reads the old value (42), transaction A then increments and writes the counter (42 + 1 = 43) and commits, and B then also increments and writes (42 + 1 = 43). The counter ends up at 43 but should be 44, since it was incremented twice. In this case, the second write (by B) happens after the first transaction (A) has committed, so it's not a dirty write. It is still incorrect, though.
Read committed
____________ is the most basic level of transaction isolation. It makes two guarantees: 1. When reading from the database, you will only see data that has been committed (no dirty reads) 2. When writing to the database, you will only overwrite data that has been committed (no dirty writes)
Read committed
For preventing dirty reads, the approach of requiring ____________ does not work well in practice, because one long-running write transaction can force many read-only transactions to wait until the long-running transaction has completed.
Read locks
____________ (non-repeatable reads) -- a type of race condition -- occurs when a client sees different parts of the database at different points in time. This issue is most commonly prevented with snapshot isolation, which allows a transaction to read from a consistent snapshot at one point in time. It is usually implemented with multi-version concurrency control (MVCC).
Read skew
_____________ (a.k.a. nonrepeatable read) happens when transaction A reads a value set by another transaction B that started after A but committed before A's read (i.e., A starts -> B starts -> B commits X -> A reads X).
Read skew
Some situations cannot tolerate _____________ : Taking a backup requires making a copy of the entire database, which may take hours on a large database. During the time that the backup process is running, writes will continue to be made to the database. Thus, you could end up with some parts of the backup containing an older version of the data, and other parts containing a newer version. If you need to restore from such a backup, the inconsistencies (such as disappearing money) become permanent.
Read skews
Some situations cannot tolerate _____________ : you may want to run a query that scans over large parts of the database (i.e., for an analytic query or for an integrity check). These queries are likely to return nonsensical results if they observe parts of the database at different points in time.
Read skews
Read committed and snapshot isolation levels are primarily about the guarantees of what a _____________ transaction can see in the presence of concurrent writes.
Read-only
Although retrying an aborted transaction is a simple and effective error handling mechanism, it isn't perfect: if the error is due to overload, retrying the transaction will make the problem worse, not better. To avoid such feedback cycles, you can limit the number of _________, use exponential backoff, and handle overload-related errors differently from other errors (if possible).
Retries
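A sketch of such a retry loop, with hypothetical helper names (run_transaction, is_transient): it retries only transient errors, caps the number of attempts, and backs off exponentially (with jitter) so that retries do not amplify an overload.

```python
import random
import time

def run_with_retries(run_transaction, is_transient, max_attempts=3):
    for attempt in range(max_attempts):
        try:
            return run_transaction()
        except Exception as exc:
            # Permanent errors (e.g., constraint violations) and
            # exhausted attempts are not retried.
            if not is_transient(exc) or attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter to avoid retry storms.
            time.sleep((2 ** attempt) * 0.1 + random.uniform(0, 0.1))
```

As the card suggests, a fuller version would also classify overload-related errors separately rather than treating them like any other transient failure.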
The database will __________ (or abort) a transaction if it fails.
Rollback
Most commonly, databases prevent dirty writes by using __________ locks: when a transaction wants to modify a particular object (row or document), it must first acquire a lock on that object. It must then hold the lock until the transaction is committed or aborted.
Row-level
Databases that don't provide transactions sometime offer an atomic _____________ operation: the purpose of this operation is to avoid lost updates by allowing an update to happen only if the value has not changed since you last read it. However, it's important to check whether this operation is safe with your database before relying on it; for instance, check if the database allows the WHERE clause to read from an old snapshot.
compare-and-set
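A sketch of a compare-and-set update expressed in SQL (hypothetical pages table, sqlite3 used only for illustration): the UPDATE takes effect only if the content is still what was read earlier, and the affected row count tells the application whether to retry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, content TEXT)")
conn.execute("INSERT INTO pages VALUES (1, 'old text')")

old_content = "old text"        # value read earlier in the cycle
new_content = "new text"

cur = conn.execute(
    "UPDATE pages SET content = ? WHERE id = ? AND content = ?",
    (new_content, 1, old_content),
)
conn.commit()

if cur.rowcount == 0:
    # The value changed since we read it: retry the whole
    # read-modify-write cycle instead of silently losing the update.
    print("conflict, retry")
else:
    print("updated")
```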
In most ACID databases, if the database is in danger of violating its guarantee of atomicity, isolation, or durability, it would rather abandon the transaction entirely than allow it to remain half-finished. However, datastores with __________ replication work much more on a "best effort" basis, which could be summarized as "the database will do as much as it can, and if it runs into an error, it won't undo something it has already done" -- so it's the application's responsibility to recover from errors.
leaderless
A common approach to prevent _____________ in multi-leader and leaderless replicated databases is to allow concurrent writes to create several conflicting versions of a value (also known as siblings), and to use application code or special data structures to resolve and merge these versions after the fact.
lost updates
In replicated databases, preventing _____________ takes on another dimension: since they have copies of the data on multiple nodes, and the data can potentially be modified concurrently on different nodes, some additional steps need to be taken.
lost updates
Options for preventing _____________ include: atomic operations, explicit locking, automatic detection, compare-and-set operations, and conflict resolution in replicated databases.
lost updates
A key idea with ____________ locks is that they apply even to objects that do not yet exist in the database, but which might be added in the future (phantoms).
predicate
Object-relational mapping frameworks make it easy to accidentally write code that performs unsafe _____________ cycles instead of using atomic operations provided by the database.
read-modify-write
Databases with multi-leader or leaderless replication usually cannot guarantee that there is a _____________ copy of the data. Thus, techniques based on locks or compare-and-set do not apply in this context.
single up-to-date
A key principle of _____________ isolation is readers never block writers, and writers never block readers. This allows a database to handle long-running read queries on a consistent snapshot at the same time as processing writes normally, without any lock contention between the two.
snapshot
Like read committed isolation, implementations of _____________ typically use write locks to prevent dirty writes, which means that a transaction that makes a write can block the progress of another transaction that writes to the same object. However, reads do not require any locks.
snapshot isolation