Database Design Quiz 4

Lakukan tugas rumah & ujian kamu dengan baik sekarang menggunakan Quizwiz!

Hierarchy of Granularity

- Could represent granularity of locks in a hierarchical structure - Root node represents entire database, level 1s represent files etc. - When node is locked all its descendants are also locked - DBMS should check hierarchical path before granting lock

Main Recovery Techniques

- Deferred Update - Immediate Update - Shadow Paging

Locking - Basic Rules

- If transaction has shared lock on item, can read but not update item - If transaction has exclusive lock on item, can both read and update item - Reads can't conflict, so more than one transaction can hold shared locks simultaneously on same time - Exclusive lock gives transaction exclusive access to that item - Some systems allow transaction to upgrade read lock to an exclusive lock, or downgrade exclusive lock to a shared lock

Ordering of Read/Writes in Serializability

- If two transactions only read a data item, they do not conflict and order is not important - If two transactions either read or write separate data items, they do not conflict and order is not important - If one transaction writes a data item and another reads or writes same data item, order of execution is important

Potential Problems caused by Concurrency

- Lost update problem = successfully completed update is overridden by another user - Uncommitted dependency problem = occurs when one transaction can see intermediate results of another transaction before it has committed, a dirty read - Inconsistent analysis problem = occurs when transaction reads several values but second transaction updates some of them during execution of first

Isolation Levels

- SERIALIZABLE (obtains locks on sets of accessed objects and holds them until the end) - REPEATABLE READ (same locks as for serializable transaction but doesn't lock sets of objects at higher level) - READ COMMITTED (obtains X-locks before writing and holds them until the end, obtains S-locks before reading but releases them immediately after reading) - READ UNCOMMITTED (does not obtain S-locks for reading, not allowed to perform any writes, doesn't request any locks ever)

Types of Failures

- System crashes (resulting in loss of main memory) - Media failures (resulting in loss of parts of secondary storage) - Application software errors - Natural physical disasters - Carelessness or unintentional destruction of data or facilities - Sabotage

Transaction Records Contain

- Transaction identifier - Type of log record (transaction start, insert, update, delete, abort, commit) - Identifier of data item affected by database action (insert, delete, and update operations) - Before-image of data item - After-image of data item - Log management information

Need for Recovery Control

- Two types of storage (volatile/main memory and nonvolatile) - Volatile storage does not survive system crashes - Stable storage represents information that has been replicated in several nonvolatile storage media with independent failure modes

Two Basic Concurrency Control Techniques

1. Locking 2. Timestamping Both are conservative approaches: delay transactions in case they conflict with other transactions, optimistic methods assume conflict is rare and only check for conflicts at commit

Two Outcomes of a Transaction

1. Success - transaction commits and database reaches a new consistent state, committed transaction cannot be aborted 2. Failure - transaction aborts, and database must be restored to consistent state before it started, called rolled back or undone, aborted transaction that is rolled back can be restarted later

Three Techniques for Handling Deadlock

1. Timeouts - transaction that requests lock will only wait for a system-defined period of time, if lock has not been granted within this period, lock request times out, in this case DBMS assumes transaction may be deadlocked even though it may not be and it aborts and automatically restarts the transaction 2. Deadlock prevention - DBMS looks ahead to see if transaction would cause deadlock and never allows deadlock to occur, could order transactions using transaction timestamps (wait-die = only an older transaction can wait for younger one, otherwise transaction is aborted(dies) and restarted with same timestamp, wound-wait = only a younger transaction can wait for an older one, if older transaction requests lock held by younger one, younger one is aborted(wounded)) 3. Deadlock detection and recovery - DBMS allows deadlock to occur but recognizes it and breaks it, usually handled by construction of wait-for graph showing transaction dependencies (create a node for each transaction, create edge T1-> T2 if T1 waiting to lock item locked by T2), deadlock exists if and only if WFG contains cycle, WFG is created at regular intervals Several issue with recovery from Deadlock Detection: - choice of deadlock victim - how far to roll a transaction back - avoiding starvation

Recoverable Schedule

A schedule where, for each pair of transactions T1 and T2, if T2 reads a data item previously written by T1, then the commit operation of T1 precedes the commit operation of T2

Timestamp

A unique identifies created by DBMS that indicates relative starting time of a transaction, can be generated by using system clock at time transaction started, or by incrementing a logical counter every time a new transaction starts

Transaction

Action, or series of actions, carried out by the user or application, which reads or updates contents of database, logical unit of work on the database

Drawbacks of NoSQL

Administration - RDBMS administrator well defined role, NoSQL's goal: no administrator necessary however NoSQL still requires effort to maintain Lack of Expertise - Whole workforce of trained and seasoned RDBMS developers, still recruiting developers to the NoSQL camp Analytics and Business Intelligence - RDBMS designed to address this niche, NoSQL designed to meet the needs of a Web 2.0 application not designed for ad hoc query of the data, tools are being developed to address this need

INNODB and Transactions

All user activity occurs inside a transaction, if autocommit mode is enabled each SQL statement forms a single transaction on its own, perform a multiple-statement transaction by starting it with an explicit START TRANSACTION, auto commit mode is disabled within a session with SET auto commit = 0, the session will have a transaction open until it is explicitly closed

Deadlock

An impasse that may result when two (or more) transactions are each waiting for locks held by the other to be released, only one way to break deadlock: abort one or more of the transactions, Deadlock should be transparent to user, so DBMS should restart transaction, however in practice DBMS can't restart aborted transaction since it is unaware of transaction logic even if it was aware of the transaction history (unless there is no user input in the transaction or the input is not a function of the database state)

RDB ACID to NoSQL BASE

Atomicity Consistency Isolation Durability Basically Available (CP) Soft-state (state of system may change over time) Eventually consistent (asynchronous propagation)

Four Basic Properties that Define a Transaction

Atomicity - 'All or nothing' property, either all actions in a transaction are done or none Consistency - Must transform database from one consistent state to another Isolation - Partial effects of incomplete transactions should not be visible to other transactions Durability - Effects of a committed transaction are permanent and must not be lost because of later failure, if a transaction is committed, changes persist within the database

Changing Face of Data

Big Data can be understood through the four V's of: Volume - enormous amounts of structured and unstructured data Variety - multiple data types including documents, images, videos, and time series Velocity - flow of data is continuous and increasing Veracity - data contains biases, mistakes, noise, and abnormalities

Log File

Contains information about all updates to database (ex. transaction and checkpoint records), often used for other purposes (ex. auditing), may be duplexed or triplexed, sometimes split into two separate random-access files, potential bottleneck critical in determining overall performance

Intention Lock

Could be use to lock all ancestors of a locked node, can be read or write, applied top-down, released bottom-up

Precendence Graph

Create: - a node for each transaction - a directed edge T1 -> T2 if T2 reads the value of an item written by T1 - a directed edge T1 -> T2 if T2 writes a value into an item after it has been read by T1 If precedence graph contains cycle schedule is not conflict serializable

Recovery Facilities

DBMS should proves following facilities to assist with recovery: - Backup mechanism (makes periodic backup copies of database) - Logging facilities (keep track of current state of transactions and database changes) - Checkpoint facility (enables updates to database in progress to be made permanent) - Recovery manager (allows DBMS to restore database to consistent state following a failure)

Controlling Locking Overhead

Declaring transaction as "READ ONLY" increases concurrency, isolation level -trade off concurrency against exposure of transaction to other transaction's uncommitted changes (degrees of serializabilty)

Sharding of Data

Distributes s single logical database system across a cluster of machines, uses range-based partitioning to distribute documents based on a specific shard key, automatically balances the data associated with each shard, can be turned on and off per collection (table)

MongoDM

Document-oriented NoSQL database, hash-based, schema-less database, no data definition language, you can store hashed with any keys and values that you choose, keys are a basic data type but in reality stored as strings, document identifiers will be created for each document, field name reserved by system, application tracks the schema and mapping, uses BSON format

Levels of Locking

Each transaction starts from the root of the hierarchy, to get S or IS lock on a node, must hold IS or IX on parent node, to get X or IX on a node, must hold IX on parent node, must release locks in bottom-up order, equivalent to directly setting locks at the leaf levels

Benefits of NoSQL

Elastic Scaling - RDBMS scale up, bigger load, bigger server, NoSQL scale out distributing data across multiple hosts seamlessly DBA Specialists - RDBMS require highly trained expert to monitor DB, NoSQL require less management automatic repairs and simpler data models Big Data - Huge increase in data, RDBMS capacity and constraints of data volumes at its limits, NoSQL design for big data (volume, variety, velocity, veracity) Flexible data models - change management to schema for RDBMS have to be carefully managed, NoSQL databases more relaxed in structure of data, database schema changes do not have to be managed as one complicated change unit, application already written to address and amorphous schema Economics - RDBMS rely on expensive proprietary servers to manage data, NoSQL clusters of cheap commodity servers to mange the data and transaction volumes, cost per gigabyte or transaction/second for NoSQL can be lower than cost for an RDBMS

Recovery Manager

Ensures the ACID principles of atomicity and durability, keeps actions of committed transactions and discards actions of uncommitted transactions

Buffer Pool Management

FORCE - every write to disk? (poor performance, many writes clustered on same page, guarantees the persistence of the data) STEAL - allow dirty pages to be written to disk? (if so reading data from uncommitted transactions violates atomicity, if no poor performance)

Objective of Serializability

Find nonsocial schedules that allow transactions to execute concurrently without interfering with one another, want to find nonsocial schedules that are equivalent to some serial schedule, such a schedule is serializable

Serializability

Identifies those executions of transaction guaranteed to ensure consistency, objective of a concurrency control protocol is to schedule transactions in such a way as to avoid any interference, could run transactions serially, but this limits degree of concurrency or parallelism in system, no guarantee that results of all serial executions of a given set of transactions will be identical

Recovery Techniques

If database has been damaged (need to restore last backup copy of database and reapply updates of committed transactions using log file) If database is only inconsistent (need to undo changes that caused inconsistency, may also need to redo some transactions to ensure updates reach secondary storage, do not need backup, but can restore database using before and after images in the log file)

Cascading Rollback

If every transaction in a schedule follows 2PL, schedule is serializable, however problems can occur with interpretation of when locks can be released, happens when various transactions are dependent on another, to precent this with 2PL leave release of all locks until end of transaction

CAP theorem for NoSQL

If you can't limit the number of faults and requests can be directed to any server and you insist on serving every request you receive then you can't possibly be consistent, you must always give something up (consistency, availability, or tolerance to failure and reconfiguration) Given: many nodes, nodes contain replicas of partitions of the data CONSISTENCY - all replicas contain the same version of data, client always has the same view of the data no matter what node AVAILABILITY - system remains operational on failing nodes, all clients can always read and write PARTITION TOLERANCE - multiple entry points, system remains operational on system split (communication malfunction), system works well across physical network partitions CAP THEOREM: SATISFYING ALL THREE AT THE SAME TIME IS IMPOSSIBLE

NoSQL Supported Model Types

Key-value - associated a data value with a specific key Document-oriented - associate a structured data value with a specific key. The structure is embedded in the object Graph database - consists of nodes and edges, typically the nodes represent entities and the edges represent relationships Columnar database - stores data by columns as oppose to rows, columns are grouped into families, typically a family corresponds to a real world object

Performance of Locking

Locks force transactions to wait (abort and restart due to deadlock wastes the work done by the aborted transaction, deadlocks are rare due to downgrades approach) Waiting for locks becomes bigger problem as more transactions execute concurrently (allowing more concurrent transactions initially increases throughput, but at some point leads to thrashing, need to limit max number of concurrent transaction to prevent thrashing, minimize lock contention be reducing the time a transaction holds locks and by avoiding hotspots)

How does NoSQL vary from RDBMS?

Looser schema definition, applications written to deal with specific documents/data, applications aware of the schema definition as opposed to the data, designed to handle distributed, large databases Trade offs: no strong support for ad hoc queries but designed for speed and growth of database, query language through API, relaxation of the ACID properties

Shadow Paging

Maintain two page tables during life of a transaction (current page and shadow page table) When transaction starts, two pages are the same Shadow page table is never changed thereafter and is used to restore database in event of failure During transaction current page table records all update to database When transaction completes, current page table becomes shadow page table

Write-Ahead Logging Protocol

Must force the log record to permanent storage before the corresponding data page gets written to disk, must write all log records for a transaction before commit #1 guarantees Atomicity #2 guarantees Durability

Complications from NO FORCE and STEAL

NO FORCE - what if system crashes before a modified page can be written to disk? write as little as possible to a convenient place at commit time to support redoing the data update STEAL - current updated data can be flushed to disk but still locked by a transaction T1, if T1 aborts need to undo the data update done by T1

Conflicting Serializable Schedules

Orders any conflicting operations in same way as some serial execution, under constrained write rule (transaction updates data item based on its old value, which is first read), use precedence graph to test for serializability

Checkpoint

Point of synchronization between database and log file, all buffers are force-written to secondary storage, created containing identifiers of all active transactions When failure occurs, redo all transactions that committed since the checkpoint and undo all transactions active at time of crash

Concurrency Control

Process of managing simultaneous operations on the database without having them interfere with one another, prevents interference when two or more users are accessing database simultaneously and at least one is updating data, although two transactions may be correct in themselves, interleaving of operations may produce an incorrect result

Database Recovery

Process of restoring database to a correct state in the event of a failure

Read/Write Timestamp

Read-timestamp = timestamp of last transaction to read item Write-timestamp = timestamp of last transaction to write item Read/write proceeds only if last update on that data item was carried out by an older transaction, otherwise transaction requesting read/write is restarted and given a new timestamp

Replica Sets

Redundancy and Failover, zero downtime for upgrades and maintenance, master-slave replication (strong, delayed consistency), geospatial features

Nonserial Schedule

Schedule where operations from set of concurrent transactions are interleaved

Serial Schedule

Schedule where operations of each transaction are executed consecutively without any interleaved operations from other transactions

Schedule

Sequence of read/writes by set of concurrent transactions

Recoverability

Serializability identifies schedules that maintain database consistency assuming no transaction fails, could also examine recoverability of transactions within schedule, if transaction fails, atomicity requires effects of transaction to be undone, durability states that once transaction commits its changes can't be undone (without running another, compensating, transaction)

Application Program

Series of transactions with non-database processing in between, transforms database from one consistent state to another, although consistency may be violated during transaction

Granularity of Data Items

Size of data items chosen as unit of protection by concurrency control protocol, ranging from coarse to fine: - The entire database - A file - A page (or area or database spaced) - A record - A field value of a record Tradeoff: - coarser, the lower the degree of concurrency - finer, more locking information that is needed to be stored best item size depends on the types of transactions

Two-Phase Locking (2PL)

Transaction follows 2PL protocol if all locking operations precede first unlock operation in the transaction Growing Phase - acquires all locks but can't release any locks Shrinking Phase - releases locks but can't acquire any new locks

Locking

Transaction uses locking to deny access to other transactions and so prevent incorrect updates, most widely used approach to ensure serializability, generally a transaction must claim a shared (read) or exclusive (write) lock on a data item before read or write, lock prevents another transaction from modifying item or even reading it, in the case of a write lock

Timestamping

Transactions ordered globally so that older transactions, those with smaller timestamps, get priority in the event of conflict, conflict is resolved by rolling back and restarting transaction, no locks so no deadlock

Transactions and Recovery

Transactions represent basic unit of recovery, recovery manager responsible for atomicity and durability If failure occurs between commit and database buffers being flushed to secondary storage then to ensure durability recovery manager has to redo (roll forward) transaction's update If transaction had not committed at failure time, recovery manager has to undo (rollback) any effects of that transaction for atomicity Partial undo - only one transaction has to be undone Global undo - all transaction have to be undone

Immediate Update

Updates are applied to database as they occur, need to redo updates of committed transactions following a failure, may need to undo effects of transactions that had not committed at time of failure, essential that log records are written before write to database, write-ahead lot protocol, if no "transaction commit" record in log, then that transaction was active at failure and must be undone, undo operations are performed in reverse order in which they were written to log

Deferred Update

Updates are not written to the database until after a transaction has reached its commit point, if transaction fails before commit, it will not have modified database and so no undoing of changes required, may be necessary to redo updates of committed transaction as their effect may not have reached database


Set pelajaran terkait

Security Awareness - Ch4 Unit 4 Quiz

View Set

CHAPTER 1: Completeing the App, Underwriting and Delivering the policy

View Set

Chapter 55: Caring for Clients with Disorders of the Male Reproductive System

View Set

Accounting Chapter 3: Connect Multiple Choice

View Set