Database Management Systems Final
Why are Armstrong axioms important?
Important inference laws used to manipulate FDs. They work on top of FDs. Note that FDs may cross relation borders within a database. To determine candidate keys for a specific relation, all the functional dependencies concerning the attributes in the schema of this relation should be considered.
How to deal with insertion, deletion, and update on referencing relation?
Insert: if we want to insert a tuple into a table that references another table, we must first ensure that there is a tuple in the table that is being referenced (customer if inserting into depositor, for example). If there is, then we can just insert the record. If there is not, we can either reject the insert or insert a tuple into the referenced location and then try the original insert again. Delete: no need to worry, just let it go ahead. Update: similar to insert. Make sure that the referenced material is already in the first table, then add it. If it is not, then we can either reject the request or we can add it to the referenced table then try the update again.
How to deal with insertion, deletion, and update on referenced relation?
Insert: no need to worry, just let it go ahead. Delete: if we want to delete, then we have to make sure that there is no tuple in the relation referencing this tuple. If there is, then we must delete that one before we can move ahead with deleting the one in the referenced relation. If there is not, then it is okay to just go ahead and delete it. Update: similar to delete, if we update the tuple then we must make sure that there is no record with previous value of the record in the referencing table. If there is, then we must reject or update the referencing table before it can be changed.
What are integrity constraints?
Integrity constraints help ensure accuracy and consistency of the data in the database. Primary key: uniquely identifies the characteristics of each row. Cannot be null, implies that each one of something in this attribute is something that points to a unique record. Domain constraints: allows us to test the values inserted into the database and to test the queries to make sure comparisons are made appropriately. Referential integrity: specified between two relations and is used to maintain consistency among tuples in the two relations. Assertions: a predicate expressing a condition that we want the database to always satisfy.
What is the motivation behind object-based databases?
Loosely speaking, an object corresponds to an entity in the ERD model. OODB extends advanced features of ERD such as generalization/specialization. The object-orientated paradigm is based on encapsulating code and data related to an object into a single unit. Is able to deal with complex data types through non-first normal form relations.
How to determine if a decomposition is lossless-join decomposition?
Lossy decomposition would produce spurious tuples and cannot be used. Testing the condition for lossless-join decomposition: a decomposition of R into R1 and R2 is lossless join IFF at least one of the following dependencies is in F+: R1 intersect R2 -> R1 R1 intersect R2 -> R2
In ERD context, why is mapping cardinality important?
Mapping cardinality expresses the number of entities to which another entity can be associated with via a relationship set.
How do object-based databases differ from relational databases?
OODBs have a need for complex data types while traditional DBs have used simple data types in the past. Obviously very different because queries are done on based on objects, rather than tuples.
What is the degree of relationships? (Cardinality, ERD)
One-to-one: a student is associated with at most one department via student_dept One-to-many: an instructor is associated with several (including 0) students via advisor. Many-to-one: an instructor is associated with at most one student via advisor. Many-to-many: an instructor is associated with several (possibly 0) students via advisor.
What are the basic steps of query processing?
Parsing and Translation - translate the query into its internal form. Optimization - a relational algebra expression may have many equivalent expressions, annotated expression specifying detailed evaluation strategy is called an evaluation-plan. Evaluation - query-exectuion engine takes a query evaluation-plan, executes the plan, and returns the answers to the query.
What are the three levels of abstraction? Why are they important?
Physical - describes how a record is stored. Logical - describes how data is stored in database and relationships among the data. View - application programs hide details of data types. Views can also hide information for security purposes. They are important because they allow the data to be stored in such a way that it is obstructed from view in many different ways for people who should not have access to the data. It also specifies how the data is stored at both the physical and logical levels.
Provide a brief discussion on database storage hierarchy.
Primary Storage: cache, main memory Flash Memory Secondary Storage: magnetic disks Tertiary storage: optical disk, tape. Hierarchy: cache main memory flash memory magnetic disk optical disk magnetic tape
How does primary index differ from secondary index?
Primary index: the index whose search key specifies the sequential order of the file. Clustering index. Secondary index: the index whose search key specifies an order different from the sequential order of the file. A secondary index is any index that is not the primary index. It can still be unique. Secondary index must be dense.
What are the main issues addressed under DB implementation?
Query processing, transaction processing, and database system architecture.
Give an example to show insertion, deletion, and update anomalies of a bad design.
R=(SID, courseID, section, instructorID, ...) F={courseID, section -> instructorID} relation(SID, courseID, section, instructorID) Insertion: each time a student registers for a course with a particular section, the instructorID will be repeated. This kind of redundancy does not only waste memory, but may also cause inconsistencies. Deletion: if so far only one student has registered for a course with a particular section, and that student has decided to change to some other section, then this only tuple should be removed (along with the instructorID information). Update: suppose a student wants to change to another section; then what should be changed is not the section number alone, but the instructorID as well. Otherwise, inconsistency may occur.
Why is RDB design theory needed?
RDB design theory is needed because without it we would have one very large relation in a database that wouldn't be of much use to anyone. With RDB design we are able to determine the number of relations needed, and which attributes should go with each relation so that we can get the most efficient database possible.
Compare sequential-index vs. B+ tree indexing.
Sequential index: ordered sequential file with a primary index. B+ tree: obvious The main disadvantage of the index-sequential file organization is that performance degrades as the file grows, both for index lookups and for sequential scans through the data.
What is the process of the lossless-join decomp into BCNF?
So long as there is an FD that violates BCNF, split that FD into a new relation and remove the RHS of that FD from the original relation. Repeat until all relations are no longer in violation of BCNF.
How does sparse index differ from dense index?
Sparse Index: contains index records for only some search-key values; applicable when records are sequentially ordered on search-key. Dense Index: contains index records for every search-key value in the file. A sparse index can contain only meaningful values.
How does speedup differ from scaleup?
Speedup is the amount of time that is reduced on the execution of a command when system resources are increased but the command size remains the same. small system elapsed time ----- large system elapsed time Scaleup is the amount of time difference during execution when both the program size and the system size are increased. small system small problem elapsed time ------- large system large problem elapsed time
When doing the BCNF decomposition (lossless-join) algorithm, which side should you remove from the original R when you find a FD that is not in BCNF?
The right-hand side.
What are transformation rules and how to use them?
Two relational algebra expressions are said to be equivalent if on every legal database instance the two expressions generate the same set of tuples. An equivalence rule says that expressions of two forms are equivalent.
What is a B+ tree and why are B+ trees important?
A B+ tree is a balanced tree in which all the leaves are at the same level. The branch factor is usually a relatively large number, making only a few disk accesses necessary. Search operation can be carried out in a straightforward manner. Insert and delete are somewhat complex, because the balanced condition may be violated and need to be restored. B+ trees are designed to overcome the performance degrading problem of index-sequential files.
What is a data model?
A data model is a collection of conceptual tools for describing: data, data relationships, data semantics, data constraints.
How does parallel database system differ from distributed database system?
A parallel database system consists of multiple processors and multiple disks connected by a fast interconnect network. A distributed database system consists of multiple sites/nodes that all contain different sets of data but are set up in such a way that they appear as one single database system.
Why is indexing important?
A query may reference only a small proportion of the records in a file. To reduce overhead in searching for these records we can construct indicies for the files on which the database is stored. An index on a file is designed to speed up operations that are not efficiently supported by the same basic organizations of records in that file.
What are the conditions of 3NF?
A relation schema R is in 3NF with a respect to a set F of FDs if for ALL FDs in F+ of the form a->b, where a is in R and b is in R, at least one of the following (a), (b), or (c) holds: * a -> b is the trivial FD (b is in a) * a is a superkey for schema R * b is a prime attribute of R -- namely b is a member of any candidate key in R. In condition C, each attribute in B may be in a different candidate key. For example if we have C->DE, C is not a superkey but D is in a candidate key BD and E is in another candidate key EF, then this FD satisfies the conditions. Simplified test: need to check only FDs in F, need not check all FDs in F+.
What are the conditions of BCNF?
A relation schema R is in BCNF with a respect to a set F of FDs if for ALL FDs in F+ of the form a->b, where a is in R and b is in R, at least one of the following (a) or (b) holds: * a -> b is the trivial FD (b is in a) * a is a superkey for schema R.
What is a schedule?
A schedule is a list of actions ordered by time performed by a set of transactions that are executed together in the system.
What is a transaction?
A transaction is a unit of program execution that accesses and possibly updates various data items. A transaction must see a consistent database and during the transaction execution the database may be inconsistent.
What are the transaction states?
Active: the initial state -- the transaction stays in this state while executing. Partially committed: after the final statement has been executed. Failed: after the discovery that normal execution can no longer proceed. Aborted: after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted: restarted the transaction or kill the transaction. Committed: after successful completion.
What is the meaning of dependency-preserving decomposition?
All FDs of the original schema R are either preserved or can be recovered after decomposition.
What is the meaning of lossless-join decomposition?
All attributes of the original schema R must appear in the decomposition. (R1, R2): R = R1 U R2
Why is query optimization important?
Allows the DBMS to run the queries as fast as it possibly can.
What are the ACID properties?
Atomicity - either all operations of the transaction are properly reflected in the database or none are. Consistency - execution of a transaction in isolation preserves the consistency of the database. Isolation - although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions. Durability - after a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.
Why is decomposition important?
Because decomposition is just replacing one relation with a set of smaller relations, it helps us break up the larger schemas into smaller schemas that can be used to eliminate redundancy and reduce table sizes.
How does BCNF compare to 3NF?
Comparing with BCNF, 3NF is more relaxed by adding a third condition to choose from. Any relation in BCNF is also in 3NF.
What is the process of decomposing into 3NF?
Compute Fc and for each FD in Fc, create a new relation. If none of these relations contain a candidate key of the original relation, then make an additional relation containing the original candidate key and just the trivial FDs.
Why is DBMS needed? List three most important reasons.
DBMS contains information about a particular enterprise and also includes a set of programs to access the data in an environment that is both convenient and efficient to use. The three most important reasons for DBMS are that it provides data redundancy and addresses possible inconsistencies, it provides atomicity for updates, and it provides concurrent access to multiple users.
What is DTD and how is it different from XML schema?
DTD is a document type definition, and it is used to put contraints on the structure of XML data, but does not constrain data types.
Why should we study XML in the database context?
Data interchange is critical in today's networked world, each application area has its own set of standards for representing information. XML has become the basis for all new generation data interchange formats. Earlier generation forwards were based on plain text with line headers indicating the meaning of fields. XML allows new tags to be defined as required and has a wide variety of tools available for parsing, browsing, and querying XML documents.
Understand the basics of disk mechanisms.
Data is stored in blocks, surface of platter is divided into circular tracks and tracks are made of blocks, set of all tracks with the same diameter is called a cylinder. Cylinder I consists of the Ith track of all the platters. Each track is divided into fixed size arcs called sectors, sector is smallest unit of data that can be read or written. cylinder #, track #, block #
How to determine if a decomposition is dependency-preserving?
Dependency preservation decompositions requires to check F+ = (F1 U F2 U ... U Fn)+ If any FD in F in the original relation R, if it is either explicitly stated in some Ri or it can be derived from FDs in Fi's, then the decomposition is dependency preserving.
What is the meaning of serializability?
Each transaction preserves database consistency, thus serial execution of a set of transactions preserves database consistency.
What are the two majors issues for transaction processing?
Failures of various kinds, such as hardware failures or system crashes Concurrent execution of multiple transactions.