Advanced Topic and Database

Parsing Phase

The DBMS parses the SQL query and chooses an access plan. The access plan is the series of steps the DBMS will use to execute the query.

Horizontal Advantages

○ data are stored close to where they are used, resulting in efficiency and better performance
○ only relevant data are available, so security is better
○ it is easy to perform unions across horizontal fragments, so querying is easy

Remote transactions

Like a remote request, accesses data on a single remote site, but the transaction may consist of more than one SQL statement

ACID test (four properties)

• Atomicity: all operations of the transaction must be completed
• Consistency: the result of the transaction must leave the database in a consistent state
• Isolation: data used during a transaction must be isolated from other transactions that require the same data; no other transaction can use the data until the first transaction is completed
• Durability: once the transaction is completed (committed), the changes are permanent, even in the case of system failure

Lock Types

Binary locks and shared/exclusive locks

Factors in Database Performance

• CPU (speed and number of processors)
• RAM (size)
• hard disk (size and speed)
• network (capacity; tuned for performance)
• operating system (tuned for performance)
• application (tuned for performance)

SQL transaction management

• COMMIT: this statement signals the end of a transaction and causes all changes to be permanently recorded in the database
• ROLLBACK: this statement causes all changes made since the last COMMIT statement to be aborted, so the database is rolled back to a previous consistent state
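
The effect of COMMIT and ROLLBACK can be sketched with Python's built-in sqlite3 module. The account table and its values are hypothetical, chosen only for illustration; other DBMSs expose the same two statements through their own drivers.

```python
import sqlite3

# Hypothetical in-memory table used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()  # COMMIT: the two rows are now permanently recorded

# A transfer that is abandoned halfway through.
conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
conn.rollback()  # ROLLBACK: undo every change made since the last COMMIT

balances = conn.execute("SELECT balance FROM account ORDER BY id").fetchall()
print(balances)  # [(100,), (50,)]
```

The half-finished update disappears, so the database is back in the consistent state left by the last COMMIT.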

Write-through update

The database is immediately updated by each operation, even before the commit point is reached
Recovery steps:
• identify the latest checkpoint (the first step is always the same)
• for transactions started and committed before the last checkpoint, do nothing
• for transactions committed after the last checkpoint, use the after values in the transaction log to redo the transaction
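
The recovery steps on this card can be sketched as a small redo pass over a transaction log. The log record layout and the function name below are assumptions made for illustration, not the format of any real DBMS.

```python
# Hypothetical log records: (txn_id, kind, key, before_value, after_value),
# where kind is "write", "commit" or "checkpoint".
log = [
    ("T1", "write", "x", 1, 2),
    ("T1", "commit", None, None, None),
    (None, "checkpoint", None, None, None),
    ("T2", "write", "y", 5, 7),
    ("T2", "commit", None, None, None),
]

def redo_after_checkpoint(log, db):
    # Step 1: identify the latest checkpoint.
    last_cp = max(i for i, rec in enumerate(log) if rec[1] == "checkpoint")
    # Transactions whose COMMIT appears after that checkpoint.
    committed_after = {rec[0] for rec in log[last_cp + 1:] if rec[1] == "commit"}
    # Redo their writes in log order using the after values.
    # Transactions committed before the checkpoint are left alone.
    for txn, kind, key, _before, after in log:
        if kind == "write" and txn in committed_after:
            db[key] = after
    return db

# Assumed state found on disk after a crash: T2's write to y did not survive.
recovered = redo_after_checkpoint(log, {"x": 2, "y": 5})
print(recovered)  # {'x': 2, 'y': 7}
```

T1 committed before the checkpoint, so nothing is done for it; T2 committed afterwards, so its after value is reapplied.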

Deadlock handling methods

Deadlock detection, deadlock prevention/avoidance

Transaction Recovery

Deferred update, write-through update

Horizontal Disadvantages

○ accessing data across fragments results in inconsistent access speed
○ with no data replication, there is increased vulnerability, especially if proper backups are not performed at each site

Transaction logs

Keeps track of all changes in the database

Concurrency control solutions

Locks and time stamping

Execution and Fetching Phases

• locks are acquired, if necessary
• data is retrieved from the data files and placed in the data cache
• rows that match the conditions are retrieved, sorted, grouped, aggregated, and returned to the client

Deferred update

The physical database is not updated immediately. During the transaction only the transaction log is updated; the physical database is updated after the commit point.

3 stages in Query Processing

Parsing Phase, Execution Phase, Fetching Phase

When to Use Indexes

• primary and foreign keys are usually indexed automatically by the DBMS
• columns with low data sparsity (very few distinct values) are usually not indexed

Time Stamping

• requires that each transaction be assigned a time stamp
• time stamps are guaranteed to be unique and to always increase in value

Remote requests

Single statement that accesses data on only a single site

SQL performance Tuning Hints

• use indexes judiciously
• conditional criteria:
  ○ use literals when possible
  ○ use equality comparisons instead of inequality comparisons
  ○ avoid the use of NOT when possible (EX: write emp_sex = 'F' rather than a condition that uses NOT)

I/O Activity

Working with the data in the cache

Deadlock prevention/avoidance

• abort a transaction if it requests a lock that could create a deadlock
• require that locks be acquired in a specified order; no data can be altered until all locks are obtained
• method for deadlock avoidance: time stamping

Fully Distributed Database

both processing and data are distributed

Shared locks

can be used by all transactions that want read-only access

Transaction

consists of one or more database requests that, together, represent a logical unit of work to perform some task. A transaction must be completed in its entirety or completely aborted.

Two-phase locking

describes a method for acquiring and releasing locks in two phases: a growing phase and a shrinking phase

Binary Locks

has only two states - locked or unlocked

Consistent state

• a database is in a consistent state if all data integrity constraints are satisfied
• data is accurate and unambiguous
• a transaction moves a database from one consistent state to another

Serializability

if two transactions are being executed at the same time, the result must be the same as if they were executed one after another

A database request

is represented by a single SQL statement

Buffer Cache

is shared memory that stores the most recently accessed data blocks in RAM; it holds data after it has been read or before it is written

Growing Phase

• locks are acquired, but none are released
• once all locks are acquired, the transaction is at its locked point
• no data are affected until the transaction is at its locked point

Shrinking Phase

• locks are released, and no new locks can be acquired
• two-phase locking as a whole guarantees serializability
• it does not prevent deadlock
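
The growing/shrinking rule can be sketched as a small bookkeeping class. The class name and its methods are hypothetical, and the sketch tracks only one transaction's own locks; it does not model conflicts with other transactions.

```python
class TwoPhaseTransaction:
    """Hypothetical helper that enforces the two-phase rule for one transaction."""

    def __init__(self):
        self.held = set()
        self.shrinking = False  # becomes True after the first release

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("no new locks can be acquired in the shrinking phase")
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True  # the first release ends the growing phase
        self.held.discard(item)

txn = TwoPhaseTransaction()
txn.lock("A")
txn.lock("B")      # growing phase: locks acquired, none released
txn.unlock("A")    # shrinking phase begins
try:
    txn.lock("C")  # violates two-phase locking
    violated = False
except RuntimeError:
    violated = True
print(violated)  # True
```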

Concurrency Control errors

Lost update, uncommitted data, inconsistent retrievals

Deadlocks

may occur if two transactions are acquiring locks and each needs a record locked by the other

Exclusive Locks

must be used by a transaction that wants to write to an object; with an exclusive lock, no other transaction can access the object
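
The compatibility rule between shared and exclusive locks can be sketched as a single check. The function name and the "S"/"X" mode labels are assumptions chosen for illustration.

```python
def can_grant(requested, held_modes):
    """Return True if a lock in mode `requested` ("S" or "X") can be granted
    while other transactions hold locks in `held_modes` on the same object."""
    if not held_modes:
        return True  # nothing is held, so any lock can be granted
    # Only shared locks coexist; an exclusive lock excludes everything else.
    return requested == "S" and all(mode == "S" for mode in held_modes)

print(can_grant("S", ["S", "S"]))  # True: many readers may share the object
print(can_grant("X", ["S"]))       # False: a writer must wait for the readers
print(can_grant("S", ["X"]))       # False: nobody may read while a writer holds it
```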

Locks

one of the most frequently used methods for handling some of the problems arising from concurrent users
Lock granularity is the level of locking. It may be at the:
• database level
• table level
• page level
• row level
• field level

Optimizer Choices (2 choices)

rule based or cost based

SQL Cache

shared memory that stores the most recently executed SQL statements or procedures, including triggers and functions

Deadlock detection

test for deadlocks periodically and abort one of the transactions
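
One common way to test for deadlocks is to look for a cycle in a wait-for graph. The card does not name a technique, so the graph representation and function name below are assumptions; this is a minimal sketch, not how any particular DBMS implements detection.

```python
def find_deadlock(waits_for):
    """waits_for maps a transaction to the transactions it is waiting on.
    Return the transactions that form a wait cycle, or None if there is none."""
    def visit(txn, path):
        if txn in path:
            return path[path.index(txn):]  # reached a transaction already on the path
        for blocker in waits_for.get(txn, ()):
            cycle = visit(blocker, path + [txn])
            if cycle:
                return cycle
        return None

    for start in waits_for:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None

# T1 waits for T2 and T2 waits for T1; T3 merely waits for T1.
cycle = find_deadlock({"T1": ["T2"], "T2": ["T1"], "T3": ["T1"]})
print(cycle)  # ['T1', 'T2']
```

A periodic check would run this and then abort one member of the returned cycle so the others can proceed.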

Distributed processing occurs when

the logical processing for a database is shared between two or more physically separate computers connected through a network

With wait/die

• the older transaction waits, so there cannot be a wait cycle
• prevents starvation (starvation occurs when a transaction waits forever)
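
The wait/die decision can be sketched as a comparison of time stamps. The function name is hypothetical, and the sketch assumes that a smaller time stamp means an older transaction. The "die" branch is the standard other half of the scheme and is not stated on the card.

```python
def wait_die(requester_ts, holder_ts):
    """Decide what a transaction does when the lock it wants is already held.
    Assumes a smaller time stamp means an older transaction."""
    if requester_ts < holder_ts:
        return "wait"  # the older requester waits for the younger holder
    # The younger requester is rolled back and restarted later with its
    # original time stamp, so it eventually becomes the oldest and cannot starve.
    return "die"

print(wait_die(1, 2))  # wait
print(wait_die(2, 1))  # die
```

Because only older transactions ever wait for younger ones, waits always point in one direction and cannot form a cycle.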

Inconsistent Retrievals

the problem of inconsistent retrievals occurs when a transaction is processing a collection of data that another transaction is updating
EX: the first transaction accesses some of the data before the second transaction has completed the update and accesses other parts of the data after the second transaction has completed the update

Uncommitted data

the problem of uncommitted data occurs when two transactions, T1 and T2, are executing concurrently and one uses data that the other has written but later rolls back
EX: T1 writes data to the database, T2 uses that data, then T1 rolls back

Rule based (talked about in class)

this approach uses preset fixed-cost values for each SQL operation (full table scan, table access by row ID, sorting, etc.)

lost update

updates may be lost if two transactions are updating the same data element and one transaction overwrites the result of the other transaction
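
A lost update can be reproduced by interleaving two read-modify-write sequences by hand. The balance and the amounts are hypothetical values chosen for illustration.

```python
balance = 100            # shared data element

t1_read = balance        # T1 reads 100
t2_read = balance        # T2 reads 100 before T1 has written
balance = t1_read + 50   # T1 writes 150
balance = t2_read - 30   # T2 writes 70, overwriting T1's result

serial_result = 100 + 50 - 30  # what running T1 and then T2 would give

print(balance)        # 70
print(serial_result)  # 120
```

T1's deposit is lost because T2 based its write on a value read before T1's update.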

Cost Based

uses more sophisticated algorithms based on statistics about the objects being accessed; candidate access plans are compared by their estimated I/O cost and resulting rows

Indexes

used to speed up searching, sorting, aggregating, and joining data
EX: without an index on state, a query that filters on state performs a full table scan
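
The difference between a full table scan and an index lookup can be observed with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. The customer table, the index name, and the query are hypothetical, and the exact wording of the plan text varies between SQLite versions, so no expected output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, state TEXT)")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description of a step.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)

query = "SELECT name FROM customer WHERE state = 'FL'"

before = plan(query)  # no index on state yet: the plan reports a scan of the table
conn.execute("CREATE INDEX idx_customer_state ON customer (state)")
after = plan(query)   # the plan now reports a search using the new index

print(before)
print(after)
```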

Fragmentation transparency

user/application does not need to know that the database is fragmented

Distributed transactions

• a transaction can reference multiple database sites
• however, each individual SQL statement in the transaction can reference only one database site

Distributed request

• a transaction can reference multiple database sites
• individual SQL statements in the transaction may reference multiple database sites
• fully supports location transparency

Distributed Concurrency Control

• a transaction may access data at multiple sites
• a final commit cannot be executed until all sites have committed their part of the transaction
• a DDBMS always has a well-defined commit protocol to ensure concurrency control
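
The idea that no final commit happens until every site is ready can be sketched as a vote followed by a single decision. This is modeled loosely on two-phase commit, which the card does not name; the function and site names are hypothetical, and failure handling and logging are left out.

```python
def commit_everywhere(sites):
    """sites maps a site name to a function that returns True when that
    site is able to commit its part of the transaction."""
    # Phase 1: ask every site whether it can commit.
    votes = {name: is_ready() for name, is_ready in sites.items()}
    # Phase 2: one decision for all sites, and only if every vote was yes.
    decision = "commit" if all(votes.values()) else "abort"
    return decision, votes

all_ready = commit_everywhere({"site_a": lambda: True, "site_b": lambda: True})
one_fails = commit_everywhere({"site_a": lambda: True, "site_b": lambda: False})
print(all_ready[0])  # commit
print(one_fails[0])  # abort
```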

Distributed Database types

• centralized database: one database that might be accessed from remote locations
• distributed database: a single logical database that is spread physically across computers in multiple locations that are connected by a data communications link
Note that distributed processing is not the same as a distributed database

Advantages of distributed database

• data can be located closer to its point of use; this results in
  ○ lower communication costs
  ○ faster data access for end users when using the locally stored data
• data processing is faster because data is processed by multiple processors at multiple sites
• growth is facilitated because new sites can be added easily and with no disruption to other sites
• operating costs may be reduced because it is generally more cost-effective to add workstations to a network than it is to update a mainframe system
• reliability is increased because the system can continue to function even when a component fails
  ○ the data is distributed at multiple sites, so if one component fails, it does not result in all of the data being unavailable

asynchronous updates

• data update propagation is delayed
• some data inconsistency is tolerated
• lower data integrity
• less overhead and therefore faster response time

synchronous updates

• data updates are immediately applied to all copies throughout the network
• copies of the data are always identical
• good for data integrity
• high overhead because of the time required to check that an update is accurately propagated throughout the network
• slow response time because of high overhead
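
The contrast between synchronous and asynchronous updates can be sketched with a toy replicated value. The class and method names are hypothetical, and network delays and failures are not modeled.

```python
class ReplicatedValue:
    """Hypothetical stand-in for one data item copied to several sites."""

    def __init__(self, n_copies):
        self.copies = [0] * n_copies
        self.pending = None  # update waiting to be propagated

    def write_sync(self, value):
        # Synchronous: every copy is updated before the write returns.
        self.copies = [value] * len(self.copies)

    def write_async(self, value):
        # Asynchronous: update the local copy now, propagate later.
        self.copies[0] = value
        self.pending = value

    def propagate(self):
        if self.pending is not None:
            self.copies = [self.pending] * len(self.copies)
            self.pending = None

item = ReplicatedValue(3)
item.write_async(7)
stale = list(item.copies)      # copies temporarily disagree
item.propagate()
converged = list(item.copies)  # copies agree again after propagation

print(stale)      # [7, 0, 0]
print(converged)  # [7, 7, 7]
```

The asynchronous write returns sooner but leaves a window in which sites see different values; the synchronous write pays that cost up front.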

Basic DBMS Processes

• listener (listens for client requests)
• user (a process is created for each user)
• scheduler (schedules concurrent transactions)
• lock manager (manages all locks on database objects)
• optimizer (optimizes SQL queries)

Services provided by a DDBMS

• maintain a distributed data dictionary
• determine locations from which to retrieve data and process query components
• translate between nodes with different local DBMSs and data models

Query Optimization in a DDBMS - 3 step Process

• query decomposition: simplify and rewrite into a structured, relational algebra form
• data localization: fragment the query so that each fragment references data at a single site
• global optimization: determine
  ○ the order in which to execute the query fragments
  ○ where to move data between sites
  ○ where each part of the query will be executed

Performance transparency

• query optimization is more complicated in a distributed database system
• the DDBMS must determine an access plan
  ○ in the case of replicated data, an important component of the access plan is to determine which data will be retrieved from which site

Types of Transactions

• remote requests (level 1)
• remote transactions (level 2)
• distributed transactions (level 3)
• distributed requests (level 4)

Replication transparency

• replicated data is transparent to the user
• both fragmentation and replication transparency ensure that, to the user, the database looks like a single, logical database

Steps in Query Processing

• the user (client) generates a query
• the query is sent to the DBMS (the server)
• the DBMS executes the query
• the DBMS sends the results back to the user (client)

Location transparency

• user/application does not need to know where data resides
• queries are constructed as though all the data is local
• except for response time, the user should see no difference in querying local or remote data

Mixed

○ a subset of rows may be selected to reside at a site
○ however, only selected columns will reside at the site
○ a second site might have the same set of rows, but different columns (or different rows and columns)

Data Replication Disadvantages

○ additional requirements for storage space
○ additional time for update operations
○ complexity and cost of updating
○ integrity issues if replicated data is not updated simultaneously

Distributing the data

○ data replication
  • a separate copy of all or part of the database is stored at two or more sites
  • mutual consistency rule: all copies of data fragments must be identical
○ horizontal fragmentation
○ vertical fragmentation
○ mixed fragmentation

Vertical

○ different columns of a table are at different sites
○ advantages and disadvantages are the same as for horizontal fragmentation, except that combining data across fragments is more difficult because it requires joins (instead of unions)
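
Recombining vertical fragments with a join can be sketched with two dictionaries keyed by a shared identifier. The site names, columns, and values are hypothetical, and the sketch assumes every fragment carries the key so that rows can be matched.

```python
# Hypothetical vertical fragments of a customer table, keyed by customer id.
site_a = {1: {"name": "Ann"}, 2: {"name": "Bo"}}       # name column
site_b = {1: {"balance": 100}, 2: {"balance": 50}}     # balance column

# Join the fragments on the shared key to rebuild whole rows.
customers = {
    key: {**site_a[key], **site_b[key]}
    for key in site_a.keys() & site_b.keys()
}

print(customers[1])  # {'name': 'Ann', 'balance': 100}
```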

Horizontal

○ different rows of a table are at different sites ○ the rows at the different sites all contain the same columns
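
Recombining horizontal fragments with a union can be sketched with two lists of rows. The site names and row values are hypothetical; every row has the same columns, so the fragments can simply be concatenated.

```python
# Hypothetical horizontal fragments: different rows, same columns.
site_a = [{"id": 1, "name": "Ann", "state": "FL"}]
site_b = [
    {"id": 2, "name": "Bo", "state": "GA"},
    {"id": 3, "name": "Cy", "state": "GA"},
]

# The whole table is the union of the fragments.
customers = site_a + site_b

print(len(customers))  # 3
```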

Heterogeneous

○ may use a different DBMS at each node
○ are much more complex to manage
○ are common in large corporations because of pre-existing department-level databases, already set up using different DBMSs, and the reluctance of individuals to change to a new system

Data Replication Advantages

○ reliability: if one site fails, data is available at another site
○ fast response: each site with a local copy of data will have quick response to select queries
○ de-couples nodes: transactions proceed even if some nodes are down
○ reduces network traffic at prime time if updates can be delayed

Disadvantages of distributed database

○ software cost and complexity
○ technical difficulty
○ lack of standards
○ increased storage requirements
○ training costs may be higher because of the added complexity of a distributed system
○ infrastructure costs are generally higher because of duplicated infrastructure at multiple sites

Homogeneous

○ use the same DBMS at each node
○ are simpler to manage than heterogeneous distributed databases
○ it is generally difficult to force a homogeneous environment in a large organization

Transparency Features

○ with transaction transparency, either all the actions of a transaction are committed or none are committed
○ the DDBMS at each site has a transaction manager that
  • logs before and after images of each transaction
  • ensures that all update operations are synchronized to maintain data integrity

