CS3200 Exam 3

Checkpoint Facility

Enables updates to database in progress to be made permanent.

Strict 2PL

Holds only exclusive locks until the end of the transaction; shared locks may be released earlier. Most database systems implement either strict or rigorous 2PL. (Two-Phase Locking)

Deadlock Prevention

Order transactions using transaction timestamps.

Deadlock Handling

Timeouts, Deadlock Prevention, and Deadlock Detection and Recovery

Deadlock Detection

Usually handled by the construction of a wait-for graph (WFG) that shows the transaction dependencies; that is, transaction Ti is dependent on Tj if transaction Tj holds the lock on a data item that Ti is waiting for.

NoSQL Data Type Models

- Key-value: associates a data value with a specific key.
- Document-oriented: associates a structured data value with a specific key. The structure is embedded in the object.
- Graph database: consists of nodes and edges. Typically the nodes represent entities and the edges represent relationships.
- Columnar database: stores data by columns as opposed to rows. Columns are grouped into families. Typically a family corresponds to a real-world object.

Types of Failure

1. System crashes due to hardware or software errors, resulting in loss of main memory.
2. Media failures, such as head crashes or unreadable media, resulting in the loss of parts of secondary storage.
3. Application software errors, such as logical errors in the program that is accessing the database, that cause one or more transactions to fail.
4. Natural physical disasters, such as fires, floods, earthquakes, or power failures.
5. Carelessness or unintentional destruction of data or facilities by operators or users.
6. Sabotage, or intentional corruption or destruction of data, hardware, or software facilities.

Timestamping

A concurrency control protocol that orders transactions in such a way that older transactions, transactions with smaller timestamps, get priority in the event of conflict.

Conflict Serializability

A conflict serializable schedule orders any conflicting operations in the same way as some serial execution.

Wait-For Graph

A directed graph G = (N, E) that consists of a set of nodes N and a set of directed edges E, which is constructed as follows:
1. Create a node for each transaction.
2. Create a directed edge Ti -> Tj, if transaction Ti is waiting to lock an item that is currently locked by Tj.
Deadlock exists if and only if the graph contains a cycle. The deadlock detection algorithm generates the graph at regular intervals and examines it for a cycle. The choice of time interval between executions of the algorithm is important: too short an interval adds considerable overhead, while too long an interval may leave a deadlock undetected for a long period.
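
A minimal sketch of the cycle check on a wait-for graph, assuming the graph is given as a dictionary that maps each transaction to the set of transactions it is waiting on. The function name `find_deadlock` and the sample graphs are illustrative only; a real DBMS would build the graph from its lock table.

```python
def find_deadlock(waits_for):
    """Return the transactions on one cycle of the wait-for graph, or None.

    waits_for maps a transaction to the set of transactions it waits on,
    so an entry {"T1": {"T2"}} is the edge T1 -> T2.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    nodes = set(waits_for) | {t for targets in waits_for.values() for t in targets}
    colour = {n: WHITE for n in nodes}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in sorted(waits_for.get(node, ())):
            if colour[nxt] == GREY:          # back edge: nxt is on the current path
                return path[path.index(nxt):]
            if colour[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        colour[node] = BLACK
        path.pop()
        return None

    for n in sorted(nodes):
        if colour[n] == WHITE:
            cycle = visit(n)
            if cycle:
                return cycle
    return None


deadlocked = {"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}}
no_deadlock = {"T1": {"T2"}, "T2": {"T3"}}
print(find_deadlock(deadlocked))   # ['T1', 'T2', 'T3']
print(find_deadlock(no_deadlock))  # None
```

Any transaction on the returned cycle is a candidate victim.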

Lost Update Problem

A problem that exists in database applications in which two users update the same data item, but only one of those changes is recorded in the data. Can be resolved using locking.
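
The following sketch illustrates the lost update problem with a hand-written interleaving, then the same two updates protected by a lock. The amounts (+50 and -30) and the function names are made up for illustration; `threading.Lock` stands in for a DBMS exclusive lock.

```python
import threading


def unlocked_schedule(start):
    """Both transactions read x before either one writes it back."""
    x = start
    t1_read = x          # T1 reads x
    t2_read = x          # T2 reads the same old value
    x = t1_read + 50     # T1 writes its update
    x = t2_read - 30     # T2 overwrites it: T1's update is lost
    return x


def locked_schedule(start):
    """Each read-modify-write runs while holding a lock on x."""
    account = {"x": start}
    lock = threading.Lock()

    def update(delta):
        with lock:                       # no other update can interleave here
            value = account["x"]
            account["x"] = value + delta

    threads = [threading.Thread(target=update, args=(d,)) for d in (50, -30)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account["x"]


print(unlocked_schedule(100))  # 70: the +50 update was lost
print(locked_schedule(100))    # 120: both updates are recorded
```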

Locking

A procedure used to control concurrent access to data. When one transaction is accessing the database, a lock may deny access to other transactions to prevent incorrect results.

Recoverable Schedule

A schedule in which for each pair of transactions Ti and Tj, if Tj reads a data item previously written by Ti, then the commit operation of Ti precedes the commit operation of Tj.

Nonserial Schedule

A schedule where the operations from a set of concurrent transactions are interleaved.

Serial Schedule

A schedule where the operations of each transaction are executed consecutively without any interleaved operations from other transactions. Always considered to be correct.

Schedule

A sequence of the operations by a set of concurrent transactions that preserves the order of the operations in each of the individual transactions.

Conservative 2PL

A transaction obtains all its locks when it begins, or it waits until all the locks are available. This protocol has the advantage that if lock contention is heavy, the time that locks are held is reduced, because a transaction that has started is never blocked waiting for a further lock. On the other hand, if lock contention is low then locks are held longer under this protocol. Further, the overhead for setting locks is high, because all the locks must be obtained and released all at once. Thus, if a transaction fails to obtain one lock it must release all the locks it has obtained so far and start the lock process again. From a practical perspective, a transaction may not know at the start which locks it may actually need and, therefore, may have to set more locks than are required. As a result, this protocol is not used in practice.

Lock Timeouts

A transaction that requests a lock will wait for only a system-defined period of time. If the lock has not been granted within this period, the lock request times out. In this case, the DBMS assumes that the transaction may be deadlocked, even though it may not be, and it aborts and automatically restarts the transaction.
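
As a rough illustration of a lock request that times out, the sketch below uses Python's `threading.Lock`, whose `acquire` method accepts a `timeout` argument. The helper name `try_with_timeout` and the 0.05-second limit are assumptions for the example; a DBMS would also abort and restart the transaction on timeout, which is not modeled here.

```python
import threading


def try_with_timeout(lock, timeout_seconds):
    """Return True if the lock was granted within the timeout, else False."""
    acquired = lock.acquire(timeout=timeout_seconds)
    if acquired:
        lock.release()
    return acquired


item_lock = threading.Lock()

item_lock.acquire()                                 # another transaction holds the lock
timed_out = not try_with_timeout(item_lock, 0.05)   # the request gives up after 0.05 s
item_lock.release()

granted = try_with_timeout(item_lock, 0.05)         # the lock is free now
print(timed_out, granted)  # True True
```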

Constrained Write Rule

A transaction updates a data item based on its old value, which is first read by the transaction.

Timestamp

A unique identifier created by the DBMS that indicates the relative starting time of a transaction.

Recovery Manager

Allows DBMS to restore database to consistent state following a failure.

Wait-Die Timestamp Algorithm

Allows only an older transaction to wait for a younger one, otherwise the transaction is aborted (dies) and restarted with the same timestamp, so that eventually it will become the oldest active transaction and will not die.

Transaction

An action, or series of actions, carried out by a single user or application program, that reads or updates the contents of the database. A logical unit of work on the database that may be an entire program, part of a program, or a single statement.

Deadlock

An impasse that may result when two (or more) transactions are each waiting for locks to be released that are held by the other. There is only one way to break deadlock: abort one or more of the transactions. This usually involves undoing all the changes made by the aborted transaction(s).

RDB ACID vs. NoSQL BASE

Atomicity, Consistency, Isolation, Durability vs. Basically Available, Soft state (may change over time), Eventually consistent (asynchronous propagation).

Recovery From Deadlock

Choice of deadlock victim, how far to roll a transaction back, avoiding starvation.

Transaction Records

Contain:
- Transaction identifier.
- Type of log record (transaction start, insert, update, delete, abort, commit).
- Identifier of data item affected by database action (insert, delete, and update operations).
- Before-image of data item.
- After-image of data item.
- Log management information.

Log File

Contains information about all updates to database:
- Transaction records.
- Checkpoint records.
Often used for other purposes (for example, auditing). Log file may be duplexed or triplexed. Log file sometimes split into two separate random-access files. Potential bottleneck; critical in determining overall performance.

Main Recovery Techniques

Deferred update, immediate update, shadow paging

Benefits of NoSQL

Elastic scaling:
• RDBMS scale up: bigger load, bigger server
• NoSQL scale out: distribute data across multiple hosts seamlessly
DBA specialists:
• RDBMS require a highly trained expert to monitor the DB
• NoSQL require less management, with automatic repair and simpler data models
Big data:
• Huge increase in data; RDBMS capacity and constraints on data volumes are at their limits
• NoSQL designed for big data: volume, variety, velocity, veracity
Flexible data models:
• Schema changes for RDBMS have to be carefully managed
• NoSQL databases are more relaxed in structure of data
• Database schema changes do not have to be managed as one complicated change unit
• Application already written to address an amorphous schema
Economics:
• RDBMS rely on expensive proprietary servers to manage data
• NoSQL: clusters of cheap commodity servers to manage the data and transaction volumes
• Cost per gigabyte or transaction/second for NoSQL can be lower than the cost for an RDBMS

Precedence Graph

For a schedule S, a precedence graph is a directed graph G = (N, E) that consists of a set of nodes N and a set of directed edges E, which is constructed as follows:
1. Create a node for each transaction.
2. Create a directed edge Ti -> Tj, if Tj reads the value of an item written by Ti.
3. Create a directed edge Ti -> Tj, if Tj writes a value into an item after it has been read by Ti.
4. Create a directed edge Ti -> Tj, if Tj writes a value into an item after it has been written by Ti.
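
The construction can be sketched as follows, assuming a schedule is a list of `(transaction, action, item)` tuples with action `"r"` or `"w"`. As a simplification, the sketch adds an edge for every pair of conflicting operations (same item, different transactions, at least one write); this may add edges already implied by the rules above but does not change whether a cycle exists. A cycle means the schedule is not conflict serializable. The standard-library `graphlib` module (Python 3.9+) does the cycle check; the function names are illustrative.

```python
from graphlib import CycleError, TopologicalSorter


def precedence_graph(schedule):
    """Return the set of edges (Ti, Tj) meaning Ti must precede Tj."""
    edges = set()
    for i, (ti, act_i, item_i) in enumerate(schedule):
        for tj, act_j, item_j in schedule[i + 1:]:
            # Conflict: same item, different transactions, at least one write.
            if ti != tj and item_i == item_j and "w" in (act_i, act_j):
                edges.add((ti, tj))
    return edges


def serial_order(schedule):
    """Return an equivalent serial order, or None if the graph has a cycle."""
    sorter = TopologicalSorter()
    for txn, _, _ in schedule:
        sorter.add(txn)
    for before, after in precedence_graph(schedule):
        sorter.add(after, before)        # 'after' depends on 'before'
    try:
        return list(sorter.static_order())
    except CycleError:
        return None


s1 = [("T1", "r", "x"), ("T2", "w", "x"), ("T2", "r", "y"), ("T1", "w", "y")]
s2 = [("T1", "r", "x"), ("T1", "w", "x"), ("T2", "r", "x"), ("T2", "w", "x")]
print(serial_order(s1))  # None: T1 -> T2 and T2 -> T1 form a cycle
print(serial_order(s2))  # ['T1', 'T2']
```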

CAP in NoSQL

Given:
• Many nodes
• Nodes contain replicas of partitions of the data
Consistency:
• All replicas contain the same version of data
• Client always has the same view of the data (no matter what node)
Availability:
• System remains operational
• All clients can always read and write
Partition tolerance:
• Multiple entry points
• System remains operational on system split (communication malfunction)
• System works well across physical network partitions

Basic Timestamp Ordering

Guarantees that transactions are conflict serializable, and the results are equivalent to a serial schedule in which the transactions are executed in chronological order of the timestamps. In other words, the results will be as if all of transaction 1 were executed, then all of transaction 2, and so on, with no interleaving. However, basic timestamp ordering does not guarantee recoverable schedules.
(1.) Transaction T issues a read(x).
(a.) Transaction T asks to read an item (x) that has already been updated by a younger (later) transaction, that is, ts(T) < write_timestamp(x). This means that an earlier transaction is trying to read a value of an item that has been updated by a later transaction. The earlier transaction is too late to read the previous outdated value, and any other values it has acquired are likely to be inconsistent with the updated value of the data item. In this situation, transaction T must be aborted and restarted with a new (later) timestamp.
(b.) Otherwise, ts(T) >= write_timestamp(x), and the read operation can proceed. We set read_timestamp(x) = max(ts(T), read_timestamp(x)).
(2.) Transaction T issues a write(x).
(a.) Transaction T asks to write an item (x) whose value has already been read by a younger transaction, that is, ts(T) < read_timestamp(x). This means that a later transaction is already using the current value of the item and it would be an error to update it now. This occurs when a transaction is late in doing a write and a younger transaction has already read the old value or written a new one. In this case, the only solution is to roll back transaction T and restart it using a later timestamp.
(b.) Transaction T asks to write an item (x) whose value has already been written by a younger transaction, that is, ts(T) < write_timestamp(x). This means that transaction T is attempting to write an obsolete value of data item x. Transaction T should be rolled back and restarted using a later timestamp.
(c.) Otherwise, the write operation can proceed. We set write_timestamp(x) = ts(T).
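
The read and write rules can be sketched as two small functions. This is an illustration under assumptions: the `Item` class, the `RestartTransaction` exception, and the use of integers as timestamps are hypothetical names and choices, and the restart itself (with a later timestamp) is left to the caller.

```python
class RestartTransaction(Exception):
    """Signals that the transaction must be rolled back and restarted."""


class Item:
    def __init__(self, value=None):
        self.value = value
        self.read_ts = 0    # read_timestamp(x)
        self.write_ts = 0   # write_timestamp(x)


def read(item, ts):
    # Rule 1(a): a younger transaction has already written the item.
    if ts < item.write_ts:
        raise RestartTransaction("read too late")
    # Rule 1(b): the read proceeds.
    item.read_ts = max(ts, item.read_ts)
    return item.value


def write(item, ts, value):
    # Rules 2(a) and 2(b): a younger transaction has read or written the item.
    if ts < item.read_ts or ts < item.write_ts:
        raise RestartTransaction("write too late")
    # Rule 2(c): the write proceeds.
    item.value = value
    item.write_ts = ts


x = Item(10)
write(x, 5, 20)        # ts 5 writes x
print(read(x, 7))      # 20, and read_timestamp(x) becomes 7
try:
    write(x, 6, 30)    # ts 6 < read_timestamp(x) = 7
except RestartTransaction as err:
    print("restart:", err)
```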

How Far to Roll a Transaction Back

Having decided to abort a particular transaction, we have to decide how far to roll the transaction back. Clearly, undoing all the changes made by a transaction is the simplest solution, although not necessarily the most efficient. It may be possible to resolve the deadlock by rolling back only part of the transaction.

Multiple-Granularity Locking

Hierarchically breaks up the database into blocks that can be locked, and keeps track of what needs to be locked and in what fashion. Such a hierarchy can be represented graphically as a tree. In addition to S and X lock modes, there are three additional lock modes with multiple granularity:
- Intention-Shared (IS): explicit locking at a lower level of the tree but only with shared locks.
- Intention-Exclusive (IX): explicit locking at a lower level with exclusive or shared locks.
- Shared & Intention-Exclusive (SIX): the sub-tree rooted by that node is locked explicitly in shared mode and explicit locking is being done at a lower level with exclusive mode locks.
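
The card does not list which modes can coexist on the same node. The sketch below encodes the standard compatibility matrix for these five modes as given in textbook treatments of multiple-granularity locking; the names `COMPATIBLE` and `can_grant` are illustrative.

```python
# For each requested mode, the set of modes that other transactions
# may already hold on the same node without causing a conflict.
COMPATIBLE = {
    "IS":  {"IS", "IX", "S", "SIX"},
    "IX":  {"IS", "IX"},
    "S":   {"IS", "S"},
    "SIX": {"IS"},
    "X":   set(),
}


def can_grant(requested, held_modes):
    """Return True if 'requested' is compatible with every mode already held."""
    return all(held in COMPATIBLE[requested] for held in held_modes)


print(can_grant("IX", ["IS", "IX"]))  # True
print(can_grant("S", ["IX"]))         # False
print(can_grant("X", []))             # True: nothing is held on the node
```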

Shared Lock

If a transaction has a shared lock on a data item, it can read the item but not update it. A shared lock can be upgraded to an exclusive lock.

Exclusive Lock

If a transaction has an exclusive lock on a data item, it can both read and update the item. An exclusive lock can be downgraded to a shared lock.

Graph Conflict Serializability

If an edge Ti -> Tj exists in the precedence graph for S, then in any serial schedule S' equivalent to S, Ti must appear before Tj. If the precedence graph contains a cycle, the schedule is not conflict serializable.

Recovery Techniques

If database has been damaged:
- Need to restore last backup copy of database and reapply updates of committed transactions using log file.
If database is only inconsistent:
- Need to undo changes that caused inconsistency. May also need to redo some transactions to ensure updates reach secondary storage.
- Do not need backup, but can restore database using before- and after-images in the log file.

Serializability Order Of Operations

If two transactions only read a data item, they do not conflict and order is not important. If two transactions either read or write completely separate data items, they do not conflict and order is not important. If one transaction writes a data item and another either reads or writes the same data item, the order of execution is important.

CAP Theorem Definition

If you cannot limit the number of faults, requests can be directed to any server, and you insist on serving every request you receive, then you cannot possibly be consistent. In other words, you must always give something up: consistency, availability, or tolerance to failure and reconfiguration.

Logging Facilities

Keep track of current state of transactions and database changes.

Rigorous 2PL

Holds all locks, shared and exclusive, until the end of the transaction. This prevents cascading rollback: in the example, T15 would not obtain its exclusive lock until after T14 had completed its rollback. (Two-Phase Locking)

NoSQL vs. SQL Properties

• Looser schema definition
• Applications written to deal with specific documents/data
• Applications aware of the schema definition as opposed to the data
• Designed to handle distributed, large databases
• Trade-offs:
  • No strong support for ad hoc queries, but designed for speed and growth of database
  • Query language through the API
  • Relaxation of the ACID properties

Shadow Paging

Maintain two page tables during life of a transaction: current page table and shadow page table. When transaction starts, the two page tables are the same. Shadow page table is never changed thereafter and is used to restore database in event of failure. During transaction, current page table records all updates to database. When transaction completes, current page table becomes shadow page table.
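
A toy sketch of the idea, assuming pages live in an in-memory dictionary and a page table maps logical page numbers to physical page ids. The class `ShadowPagedDB` and its methods are hypothetical; a real system would keep both tables on disk, switch them atomically at commit, and reclaim pages that are no longer referenced.

```python
class ShadowPagedDB:
    def __init__(self, pages):
        self.pages = dict(enumerate(pages))        # physical id -> contents
        self.shadow = {i: i for i in self.pages}   # logical page -> physical id
        self.current = None                        # exists only during a transaction

    def begin(self):
        # At the start, the current page table is a copy of the shadow table.
        self.current = dict(self.shadow)

    def write(self, logical, data):
        # Updates go to a fresh physical page; the shadow table is untouched.
        new_id = max(self.pages) + 1
        self.pages[new_id] = data
        self.current[logical] = new_id

    def read(self, logical):
        table = self.current if self.current is not None else self.shadow
        return self.pages[table[logical]]

    def commit(self):
        # The current page table becomes the new shadow page table.
        self.shadow = self.current
        self.current = None

    def abort(self):
        # Failure or rollback: discard the current table, keep the shadow.
        self.current = None


db = ShadowPagedDB(["a", "b"])
db.begin()
db.write(0, "A")
db.abort()
print(db.read(0))  # a: the shadow table still points at the old page
db.begin()
db.write(1, "B")
db.commit()
print(db.read(1))  # B
```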

Optical Disk

More reliable than tape, generally cheaper, faster, and providing random access.

Magnetic Tape

Offline nonvolatile storage medium, which is far more reliable than disk and fairly inexpensive, but slower, providing only sequential access.

Wound-Wait Timestamp Algorithm

Only a younger transaction can wait for an older one. If an older transaction requests a lock held by a younger one, the younger one is aborted (wounded).
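
Wound-wait and wait-die differ only in what happens in each direction of a lock request. A minimal sketch, assuming a smaller timestamp means an older transaction and using made-up string results to describe the outcome:

```python
def wait_die(requester_ts, holder_ts):
    """Wait-die: an older requester waits; a younger requester dies."""
    if requester_ts < holder_ts:
        return "requester waits"
    return "requester aborted"


def wound_wait(requester_ts, holder_ts):
    """Wound-wait: an older requester wounds the holder; a younger one waits."""
    if requester_ts < holder_ts:
        return "holder aborted"
    return "requester waits"


# Transaction with timestamp 1 is older than the one with timestamp 2.
print(wait_die(1, 2), "|", wait_die(2, 1))      # requester waits | requester aborted
print(wound_wait(1, 2), "|", wound_wait(2, 1))  # holder aborted | requester waits
```

In both schemes the aborted transaction is always the younger of the pair, which is why no cycle of waits can form.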

Checkpointing

Point of synchronization between database and log file. All buffers are force-written to secondary storage. Checkpoint record is created containing identifiers of all active transactions. When failure occurs, redo all transactions that committed since the checkpoint and undo all transactions active at time of crash.
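
The redo and undo lists can be derived from the log as sketched below. The log format is an assumption for illustration: tuples of the form `("start", T)`, `("commit", T)`, and `("checkpoint", [active transactions])`; other record types are ignored, and aborts are not modeled.

```python
def recovery_lists(log):
    """Return (redo, undo) transaction lists after a crash at the end of the log."""
    checkpoints = [i for i, rec in enumerate(log) if rec[0] == "checkpoint"]
    if checkpoints:
        last = checkpoints[-1]
        active = set(log[last][1])     # transactions active at the checkpoint
        tail = log[last + 1:]
    else:
        active, tail = set(), log
    redo = []
    for rec in tail:
        if rec[0] == "start":
            active.add(rec[1])
        elif rec[0] == "commit":
            active.discard(rec[1])
            redo.append(rec[1])        # committed since the checkpoint
    return redo, sorted(active)        # still active at the crash: undo


log = [
    ("start", "T1"), ("start", "T2"), ("commit", "T1"),
    ("checkpoint", ["T2"]),
    ("start", "T3"), ("commit", "T2"), ("start", "T4"),
    ("commit", "T3"), ("start", "T5"),
]
print(recovery_lists(log))  # (['T2', 'T3'], ['T4', 'T5'])
```

T1 committed before the checkpoint, so its updates were already force-written and it appears in neither list.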

Magnetic Disks

Provide online nonvolatile storage. Compared with main memory, disks are more reliable and much cheaper, but slower by three to four orders of magnitude.

Thomas's Write Rule

Provides greater concurrency by rejecting obsolete write operations.
(a.) Transaction T asks to write an item (x) whose value has already been read by a younger transaction, that is, ts(T) < read_timestamp(x). As before, roll back transaction T and restart it using a later timestamp.
(b.) Transaction T asks to write an item (x) whose value has already been written by a younger transaction, that is, ts(T) < write_timestamp(x). This means that a later transaction has already updated the value of the item, and the value that the older transaction is writing must be based on an obsolete value of the item. In this case, the write operation can safely be ignored. This is sometimes known as the ignore obsolete write rule, and allows greater concurrency.
(c.) Otherwise, as before, the write operation can proceed. We set write_timestamp(x) = ts(T).
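
A short sketch of the three cases, assuming a data item is a dictionary with `value`, `read_ts`, and `write_ts` keys and that timestamps are integers. The names `thomas_write` and `RestartTransaction` are hypothetical.

```python
class RestartTransaction(Exception):
    """Signals that the transaction must be rolled back and restarted."""


def thomas_write(item, ts, value):
    """Apply a write under Thomas's write rule; return 'written' or 'ignored'."""
    if ts < item["read_ts"]:      # case (a): a younger transaction has read x
        raise RestartTransaction("write too late")
    if ts < item["write_ts"]:     # case (b): obsolete write, skip it
        return "ignored"
    item["value"] = value         # case (c): the write proceeds
    item["write_ts"] = ts
    return "written"


x = {"value": 1, "read_ts": 0, "write_ts": 0}
print(thomas_write(x, 8, 80))  # written
print(thomas_write(x, 5, 50))  # ignored: ts 5 < write_timestamp(x) = 8
print(x["value"])              # 80
```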

Backup Mechanism

Recovery facility that makes periodic backup copies of database.

Avoiding Starvation

Starvation occurs when the same transaction is always chosen as the victim, and the transaction can never complete. The DBMS can avoid starvation by storing a count of the number of times a transaction has been selected as the victim and using a different selection criterion once this count reaches some upper limit.

Drawbacks of NoSQL

Support:
• RDBMS vendors provide a high level of support to clients
• Stellar reputation
• NoSQL systems are open source projects with startups supporting them
• Reputation not yet established
Maturity:
• RDBMS is a mature product, which means stable and dependable
• Also means old: no longer cutting edge nor interesting
• NoSQL systems are still implementing their basic feature set
Administration:
• RDBMS administrator is a well-defined role
• NoSQL's goal: no administrator necessary; however, NoSQL still requires effort to maintain
Lack of expertise:
• Whole workforce of trained and seasoned RDBMS developers
• Still recruiting developers to the NoSQL camp
Analytics and business intelligence:
• RDBMS designed to address this niche
• NoSQL designed to meet the needs of a Web 2.0 application, not designed for ad hoc query of the data
• Tools are being developed to address this need

Choice of Deadlock Victim

The choice of transactions to abort may be obvious. However, in other situations, the choice may not be so clear. In such cases, we would want to abort the transactions that incur the minimum costs. This may take into consideration:
1. How long the transaction has been running (it may be better to abort a transaction that has just started rather than one that has been running for some time).
2. How many data items have been updated by the transaction (it would be better to abort a transaction that has made little change to the database rather than one that has made significant changes to the database).
3. How many data items the transaction is still to update (it would be better to abort a transaction that has many changes still to make to the database rather than one that has few changes to make). Unfortunately, this may not be something that the DBMS would necessarily know.
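
One possible way to combine these criteria with the victim counter described under Avoiding Starvation is sketched below. The cost ordering (fewest updated items first, then shortest running time) and the limit `MAX_VICTIM_COUNT` are arbitrary assumptions for illustration, not a rule from the course material.

```python
from dataclasses import dataclass

MAX_VICTIM_COUNT = 3  # hypothetical upper limit before a transaction is spared


@dataclass
class Txn:
    name: str
    seconds_running: float
    items_updated: int
    times_victim: int = 0


def choose_victim(cycle):
    """Pick the cheapest transaction to abort among those not yet starved."""
    eligible = [t for t in cycle if t.times_victim < MAX_VICTIM_COUNT] or cycle
    victim = min(eligible, key=lambda t: (t.items_updated, t.seconds_running))
    victim.times_victim += 1
    return victim


t1 = Txn("T1", seconds_running=30.0, items_updated=10)
t2 = Txn("T2", seconds_running=2.0, items_updated=1)
picks = [choose_victim([t1, t2]).name for _ in range(4)]
print(picks)  # ['T2', 'T2', 'T2', 'T1']: T2 is spared once it reaches the limit
```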

Data Item

The entire database, a file, a page (sometimes called an area or database space—a section of physical disk in which relations are stored), a record, a field value of a record.

ACID

The four basic properties that define a transaction. Atomicity, Consistency, Isolation, and Durability.

Concurrency Control

The process of managing simultaneous operations on a database without having them interfere with each other through interleaving. Throughput is improved as the CPU doesn't idle for IO operations.

Database Recovery

The process of restoring the database to a correct state in the event of a failure.

Granularity

The size of data items chosen as the unit of protection by a concurrency control protocol.

Serializability

The objective of serializability is to find nonserial schedules that allow transactions to execute concurrently without interfering with one another, and thereby produce a database state that could be produced by a serial execution.

Phantom Read

Transaction executes a query that retrieves a set of tuples from a relation satisfying a certain predicate, re-executes the query at a later time, but finds that the retrieved set contains an additional (phantom) tuple that has been inserted by another transaction in the meantime.

Fuzzy Read

Transaction rereads a data item it has previously read but, in between, another transaction has modified it. Thus, the transaction receives two different values for the same data item.

Atomicity

Transactions cannot be divided and must be performed entirely or not at all. DBMS recovery subsystem ensures atomicity. "All or nothing".

Isolation

Transactions execute independently of one another. Partial effects of incomplete transactions should not be visible to other transactions. The DBMS concurrency control subsystem ensures isolation.

Transactions and Recovery

Transactions represent basic unit of recovery. Recovery manager responsible for atomicity and durability. If failure occurs between commit and database buffers being flushed to secondary storage then, to ensure durability, recovery manager has to redo (rollforward) transaction's updates. If transaction had not committed at failure time, recovery manager has to undo (rollback) any effects of that transaction for atomicity. Partial undo - only one transaction has to be undone. Global undo - all transactions have to be undone.

Durability

Transactions that are committed have effects that are permanently stored in the database and must not be lost due to failure. The DBMS recovery subsystem ensures durability.

Consistency

Transactions transform the database from one consistent state to another. The DBMS ensures consistency by enforcing constraints (integrity) in the schema. The programmer must ensure consistency as well, because the DBMS cannot detect certain errors.

Immediate Update

Updates are applied to database as they occur. Need to redo updates of committed transactions following a failure. May need to undo effects of transactions that had not committed at time of failure. Essential that log records are written before write to database. Write-ahead log protocol. If no "transaction commit" record in log, then that transaction was active at failure and must be undone. Undo operations are performed in reverse order in which they were written to log.
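
A simplified sketch of recovery under immediate update, assuming log records of the form `("start", T)`, `("update", T, item, before_image, after_image)`, and `("commit", T)`, and assuming strict schedules so that no transaction overwrites an uncommitted value. The function name `recover` and the record layout are illustrative only.

```python
def recover(db, log):
    """Bring the on-disk state 'db' (item -> value) to a consistent state."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Redo committed updates in the order they were written to the log.
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            db[rec[2]] = rec[4]        # apply the after-image

    # Undo uncommitted updates in reverse log order.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            db[rec[2]] = rec[3]        # restore the before-image
    return db


log = [
    ("start", "T1"), ("update", "T1", "x", 1, 5), ("commit", "T1"),
    ("start", "T2"), ("update", "T2", "y", 10, 99),
]
# At the crash, T1's update had not reached disk but T2's had.
print(recover({"x": 1, "y": 99}, log))  # {'x': 5, 'y': 10}
```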

Deferred Update

Updates are not written to the database until after a transaction has reached its commit point. If transaction fails before commit, it will not have modified database and so no undoing of changes required. May be necessary to redo updates of committed transactions as their effect may not have reached database.

Main Memory

Volatile storage that usually does not survive system crashes. (Primary)

