INFS2608 Lecture 7 - Transaction Management

¡Supera tus tareas y exámenes ahora con Quizwiz!

What is Timestamping?

"A concurrency control protocol that orders transactions in such a way that older transactions, get priority in the event of conflict." - *Transactions are ordered globally* so that *older transactions get priority* in the event of conflict - Conflict is resolved by rolling back and *restarting other transactions* - Timestamping is an *alternative method to locks* - No locks, so *no deadlock*

What is a Timestamp?

"A unique identifier created by DBMS that indicates relative starting time of a transaction" Time stamp generated by: - using system clock - incrementing a logical counter (++ for new transactions)

What is a transaction?

"Action, or series of actions, carried out by user or application, which reads or updates contents of database." - *Logical unit of work* on the DB - Transforms DB *from one consistent state to another*, although consistency may be *violated during transaction.* - *Application program* is a series of transactions with *non-DB processing taken place in-between*

What is Deadlock?

"An impasse that may result when two (or more) transactions are each waiting for locks held by the other to be released." aka Neither transaction can continue, because each is waiting for a lock it cannot obtain until the other completes.

What is a checkpoint?

"Point of synchronization between database and log file. All buffers are force-written to secondary storage." - created containing identifiers of all active transactions. - when failure occurs, redo all transactions that committed since the checkpoint and undo all transactions active at time of crash.

What is Database Recovery

"Process of restoring database to a correct state in the event of a failure"

What is Concurrency Control?

"The process of managing simultaneous operations on the database without having them interfere with one another." - Prevents interference when *two or more users* are accessing database simultaneously and *at least one* is writing (updating) data (not only reading) -Although two transactions may be correct in themselves, *interleaving* of operations (based on *parallel execution* of applications) may produce an incorrect result.

What is Two-Phase Locking (2PL)?

"Transaction follows the two-phase locking protocol if *all locking operations precede the first unlock operation in the transaction*" - Two phases for transaction: 1. Growing phase: 加🔐,不能开锁 2. Shrinking phase: 开锁,不能加🔐 - A transaction must acquire a lock on an item before operating on it - Once transaction releases a lock, it can never acquire any new locks It can be proven that if every transaction in a schedule follows the two-phase locking protocol, then the schedule is guaranteed to be conflict serializable (Eswaran et al. 1976). <-- idk who this faggòt is and i doubt he'll ask us this shÍt Go to lecture slide for worked examples!

What happens if a transaction fails?

- *Atomicity for failed transaction* requires effects of transaction to be done - *Durability of other, completed transactions* requires that once transaction commits, its changes cannot be undone These two effects may lead to a *nonrecoverable schedule*, which should not be allowed

Recovery Facilities

- *Backup mechanism* to keep periodic backups of DB - *Logging facilities* to keep track of current state of transactions & DB changes - *Checkpoint facility* to enable updates to DB in progress to be made permanent - *Recovery manager* allows DBMS to restore DB to consistent state following a failure.

What is Thomas's Write Rule: Modified Write (x)

- *Modification* to basic timestamp ordering protocol. - Rejects *obsolete write operations* (rather than rolling back) - *Relaxes conflict serializability* - Provides *greater concurrency*

What is an objective of concurrency control?

- *Objective* of a concurrency control protocol is to schedule transactions in such a way as to *avoid any interference.*

The types of failure

- *System crashes*, resulting in loss of main memory. - *Media failures*, resulting in loss of parts of secondary storage. - *Application software errors.* - *Natural physical disasters.* - Carelessness and *unintentional destruction* of data or facilities. - Sabotage and *intentional destruction* of data or facilities.

Transactions and Recovery

- *Transactions are the basic units of recovery* - *Recovery manager* responsible for *atomicity* ("all or nothing") and *durability* (commit must be permanent). - If failure occurs *after commit* and *before secondary storage* (database buffers being flushed to secondary storage), then to ensure *durability* recovery manager has to *redo (rollforward)* transaction's updates. - If transaction had not committed at failure time, then recovery manager has to *undo (rollback)* any effects of that transaction for *atomicity.* § Partial undo: only one transaction has to be undone. § Global undo: all transactions have to be undone.

What does a transaction log file contain?

- *identifier* - *Type* (transaction start, insert, update, delete, abort, commit). - Identifier of *data item* affected - *Before-image* of data item. - *After-image* of data item. - Log management information.

What is a recoverable schedule?

- A schedule in which for each pair of transactions T1 and T2: if T2 reads a data item previously written by T1, then the commit operation of T1 must precede the commit operation of T2."

What is conflict serializability?

- A schedule is called conflict serializable if it can be transformed into a serial schedule by swapping non-conflicting operations. - Key: Different serial executions (say: T1T2T3, T3T1T2) may have different outcomes, the serialized schedule needs to be *identical in outcome!* - Can be tested for using precedence graphs.

What is a Log File?

- Contains *information about all updates* to DB i.e (transaction & checkpoint records) - Additionally used for purposes other than recovery (auditing) - may be duplexed or triplexed (maintaining 2 or 3 separate copies). - Log file sometimes split into two separate random-access files (for speed). - Potential bottleneck; critical in determining overall performance

The Hierarchy of Granularity/ Lvls of Locking

- Could represent granularity of locks in a hierarchical structure. - Root node (level 0) represents entire database, level 1 represent files, etc. - When node is locked, all descendants are also locked. - DBMS should check hierarchical path before granting lock. Tradeoff: Tradeoff: - coarser, the *lower the degree of concurrency*; - finer, *more locking information* that is needed to be stored. - Best item size depends on the types of transactions.

What is deadlock prevention?

- DBMS *looks ahead to see if transaction would cause deadlock* and never allows deadlock to occur. - Could order transactions to prevent deadlocks using transactions timestamps through either Wait-Die or Wound-Wait.

What is Deadlock Detection & Recovery?

- DBMS allows deadlocks to occur BUT *recognises deadlocks & breaks them* - Usually handled by construction of *wait-for graph* showing transaction dependencies: § Node for each transaction § Edges denote dependencies - *Deadlock exists if and only if wait-for graph contains cycle.* - Wait-for graph is created at regular intervals.

Timestamp Ordering Protocol

- Guarantees that transactions are *conflict serializable*. - Results are *equivalent* to a *serial schedule* in which the transactions are executed in chronological order.

In serializability, when is the order of execution important?

- If one transaction writes a data item and another reads or writes same data item, then the order of execution is important. - If two transactions only read a data item or two transactions either read or write separate data items --> not important

Basic rules for locking?

- If transaction has (owns) shared lock --> R - If transaction has exclusive lock --> R & W - Reads cannot conflict, so more than one transaction can hold *shared locks simultaneously* on same item - *Writes can conflict*, so *exclusive lock* gives transaction *exclusive access* - Some systems allow transactions to change lock type ie upgrade downgrade

How does one prevent Cascading Rollback?

- Leave release of all locks until end of transaction. - This is called rigorous 2PL. (All I can think of rn is Mr House rolling down a hill smfh)

What is the Inconsistent Analysis Problem?

- Occurs when *one transaction reads values but second transaction updates them* during execution of first transaction. - Does not require both operations to write - Again, a "dirty read" (or "fuzzy/unrepeatable read").

What is the Uncommitted Dependency Problem?

- Occurs when one transaction can see intermediate results of another transaction before it has committed. - A "dirty read" of "dirty data".

How do we handle deadlock (incl. 3 techniques)?

- Only 1 way to break deadlock: *abort one or more of the transactions.* - In theory, deadlock should be transparent (invisible) to user, so *DBMS should restart transaction(s).* - In practice, DBMS cannot just restart aborted transaction since it is *unaware of transaction logic.* We have 3 general techniques: a. timeouts b. deadlock prevention c. deadlock detection and recovery

Expain Wound-Wait

- Only a younger transaction can wait for an older transaction. - If younger transaction waits for an older transaction: OK - If older transaction waits for an younger transaction, then the younger transaction is aborted (wounded) and restarted with a new timestamp.

Explain Wait-Die

- Only an older transaction can wait for younger (newer) transaction. - If older transaction waits for an younger transaction: OK - If younger transaction waits for an older transaction, then the younger transaction is aborted (dies) and restarted with some delay and the same timestamp (eventually becoming the oldest transaction).

What is (View) Serializability?

- Running transactions serially (one after the other) limits the degree of concurrency or parallelism in system. - *Serializability* identifies those *parallel executions* of transactions *guaranteed to ensure consistency.*

What is a Nonserial Schedule?

- Schedule when operations from set of concurrent transactions are interleaved.

What is reoverability?

- Serializability identifies schedules that maintain database consistency, assuming no transaction fails. - Recoverability - deals with failed transactions in a schedule

Granularity of Data Items

- Size of data items chosen as *unit of protection* by concurrency control protocol. - Ranging from course to fine: 1. The entire database. 2. A file. 3. A page (or area or database space). 4. A record. 5. A field value of a record.

What is the Lost Update Problem?

- Successfully completed update is overridden by another user. - Hence, the update is lost. i.e one transaction (A) updates a particular variable while another (B) that began very shortly afterward didn't see this update before updating the same value itself. The result of the first transaction (A) is then "lost," as it is simply overwritten by the second transaction (b).

What was wrong in Schlagcuck's Incorrect Locking Schedule example?

- The transactions release locks too soon --> loss of isolation & atomicity || transactions interfered with one another! - To guarantee serializability, we need *an additional protocol* regarding the positioning of lock & unlock in every transaction --> 2PL

What is a timeout?

- Transaction that requests lock will only wait for a *system-defined period of time* - If lock isn't granted in this period, lock request *"times out"* - If it times out, *DBMS assumes transaction may be deadlocked* (even tho it might not be) --> aborts and restarts transaction

What is Locking?

- Transactions use locks to *deny access to other transactions* --> prevents incorrect updates. - *Most widely used* serializability approach - Can be in two forms: Ø *Shared lock (read) lock*: prevents other transactions from *updating* data item Ø *Exclusive lock (write) lock*: prevents other transactions from *reading and updating* data item.

What are the 2 types of storage?

- Volatile (main memory): doesn't survive loss of power (shit like RAM) - Nonvolatile (stable storage medium): loss of power won't lose data (shit like Hard Drives, magnetic tape)

What is a Cascading Rollback?

- situation in which a single transaction leads to a series of rollbacks is called cascading rollback. - *undesirable*, because they potentially lead to the *undoing of a significant amount of work.*

What are the objectives of serializability?

- to find nonserial schedules that allow transactions to execute concurrently without interfering with one another. (such a schedule is called serializable) - basically, to find nonserial schedules to combine into one giant block that can be seen as a serial schedule

Properties of Transactions (ACID!)

1) *Atomicity*: 'All or nothing' property. 2) *Consistency*: Must transform database from one consistent state to another. 3) *Isolation*: Partial effects of incomplete transactions should not be visible to other transactions. 4) *Durability*: Effects of a committed transaction are permanent and must not be lost because of later failure.

What are the 4 key concepts of concurrency control?

1. (View) Serializability (VS) 2. Conflict Serializability (CS) 3. Two-phase locking (2PL) 4. Timestamping (TS)

What are the 2 types of timestamps?

1. *Transaction timestamps*: - *Timestamp of transactions* itself (generated at its *start*) 2. *Data item timestamps*: - *Read-timestamp*: timestamp of last transaction (not sys time) to read item. - *Write-timestamp*: timestamp of *last transaction* to write item.

What are two problems that can still occur within 2PL?

1. Cascading rollbacks 2. Deadlocks

What are the 2 main concurrency control techniques (for conflict serializability)?

1. Locking 2. Timestamping Note: - They're *conservative/pessimistic methods*: they delate transactions in case theres conflict. - *Optimistic methods* assume conflict is rare and only check for conflicts at commit

What two outcomes can a transaction have?

1. Success: transaction *commits* and DB reaches a new consistent state. 2. Failure: transaction *aborts*, and database must be restored to consistent state before it started (*rolled back* or *undone*). - Committed transactions can't be aborted - Aborted transactions can be reattempted once roll back is done

In the "Deadlock Detection" technique, how do we recover?

3 issues we need to dig into 1. Choose a *deadlock victim* by: - aborting transaction that just started - aborting transaction that affected least data items. 2. How far should we *roll a transition back*? - complete vs partial rollback 3. How can we *avoid starvation* - Concurrency control protocol can use a counter (variable?) and use it for decision making

What is a serial schedule?

A schedule where operations of each transaction are executed consecutively without any interleaved operations from other transactions. Key note: No guarantee that results of all serial executions of a given set of transactions will be identical (example: execute transaction "calculate interest" first or transaction "make deposit" first? - different order results in different values)

Why do we need Concurrency Control?

Because of 3 potential problems caused by concurrency: 1) Lost update problem 2) Uncommitted dependency problem 3) Inconsistent analysis problem

Solution to the Lost Update Problem?

Don't let other transactions update the same variable at same time; They should wait so that the value they use reflects the correct value.

Recovery techniques

If DB is *inconsistent*: (bad) - undo changes that made it inconsistent - restore DB using before & after images from log file (no need for backup) - may need to redo some transactions to ensure updates reach secondary storage If DB is *damaged*: (worse) - Need to restore last backup copy - Reapply updates of committed transactions using log file

Solution to the Uncommitted Dependency Problem?

If two transactions both read the same variable, prevent the latest one from reading the variable until after the earlier one commits or aborts.

Solution to Inconsistent Analysis Problem?

If we have any number of transactions in which one is updating while the others are reading, force the reading ones to wait until the updating one has committed/completed.

What is a schedule?

Sequence of reads/writes by set of concurrent transactions.

What is starvation?

When the same transaction is always chosen as the victim, and hence never gets executed.


Conjuntos de estudio relacionados

Principles of Marketing Practice

View Set

Chapter 22: Nursing Management: Patients With Oral and Esophageal Disorders and Patients Receiving Gastrointestinal Intubation, Enteral, and Parenteral Nutrition

View Set

Chapter 23: New Deal America, 1929-1939

View Set

Chapter 18 Autonomic Nervous System

View Set

Patho Test 4 Endocrine Disorders

View Set