Databases Final

Réussis tes devoirs et examens dès maintenant avec Quizwiz!

Describe the suggested method of eliminating the use of a multi-valued attribute to represent the employee's three favorite colors.

use a weak relation to implement 1-M relationship

1. Describe the purpose of the Transaction Log. 2. How does the use of a transaction log increase DBMS performance? 3. What are the N operations recorded in the Transaction Log as described in the slides? 4. What is a Transaction Log Redo and when is it performed?

-

1. What is a serial schedule of multiple transactions? 2. What problems are we trying to prevent when we serially execute the transactions in a schedule i.e. serial schedules? 3. Why are serial schedules considered to be an undesirable approach to scheduling the execution of multiple transactions?

-

1. What is the execution state of T2 during the period of time between lock(x) and read_item(X)? 2. What operation causes the execution of T2 to change state?

-

A. Identify each of the following transactions as representing a Lost Update, Dirty Read, or Incorrect Summary. There is one of each. B. Describe the Unrepeatable Read problem. C. Describe Phantom Reads within a transaction.

-

Are each of the following schedules serializable? 1. R1(X); R2(Y); R1(Y); W1(X); W1(Y); C1; W2(Y); C2; 2. R1(Y); R2(X); W1(Y); A1, W2(X); C2; 3. R1(Y); R1(X); W1(X); R2(X); W1(Y); C1; W2(X);C2; Hint: You will need to build a Precedence Graph for each schedule.

-

Are the following schedules recoverable? 1. R1(X); W1(X); C1; R2(X); W2(X); C2; 2. R1(X); R2(X); W1(X); R1(Y); W2(X); C2; W1(Y); C1; 3. R1(X); W1(X); R2(X); R1(Y); A1; W2(X); C2;

-

What are the three principle properties of transaction described in the slides?

-

Describe the meaning of a Functional Dependency between attributes in a single relation A1 → A2. Use an example such as the Employee relation's SSN (Social Security Number) and the remaining attributes in the relation. Provide an example of how functional dependencies are not symmetric i.e. if A1 → A2 it is not necessarily true that A2 → A1.

--

Why is the execution of serial schedules assumed to always result in a correct database state?

--

Describe the result set of these statements:

---

Rewrite the following query

---

What is the tri-state boolean logic employed by DBMS? How does UNKNOWN contribute to result set? Why is TRUE AND UNKNOWN = UNKNOWN, but FALSE AND UNKNOWN = FALSE? Why is TRUE OR UNKNOWN = TRUE, but FALSE OR UNKNOWN = UNKNOWN?

-Tri-state maintains values: TRUE, FALSE, UNKNOWN (NULL) -UNKNOWN = neither true or false, gets treated as a false by DBMS. unknown until known. -FALSE AND UNKNOWN is false regardless of second attribute. TRUE AND UNKNOWN will stay UNKNOWN until second attribute is known. -TRUE OR UNKNOWN is true regardless of second attribute. FALSE OR UNKNOWN is unknown until second attribute is known.

1.Describe the problems that occur when two problem domain entities are combined into a single relation. For example, trying to maintain individual Employee and Department when both entities are combined into a single relation EMP_DEPT. 2. How do we decompose this relation to solve this problem?

-difficult to maintain single instance -difficult to create instance when no association to 2nd entity (system loses knowledge of entities) -wasted storage (nullable attributes) -duplication of information -updates multiple rows 2. make into two relations and use foreign key for 1-M relationship or join table for n-m relationship

Describe the structure of a multi-level index.

-single level index with layers to serve to index to index, tree of disk blocks index range -first level: ordered, leaves, record/block addresses -second level: anchor records of first -third level: anchor records of second

1) Describe how grouping and aggregation functions are combined to create summary reports. 2) Present a query which lists every project number and the average hours worked per week.

1) Summary reports include aggregation (summation, average, minimum, maximum) of information within groups of tuples (department, categories, dates, etc). ex: average employee salary grouped by project 1-group tuples in the result set by specific attribute value using GROUP BY clause 2- aggregate attribute values exclusively within each group (GA) 2) select pno, avg(hours) from works_on GROUP BY pno;

1) Describe the meaning and purpose of Triggers in a database design 2) 3)

1) a mechanism that executes a function (stored procedure) on the database server itself. attached to specific tables, execution is triggered under conditions that are set when created (CREATE TRIGGER)

Context of inner joins: 1) What are Join conditions? 2) What are Filter conditions? 3) How do explicit joins differ from implicit joins in terms of these conditions?

1) define relationship between two tables involved in the join (tables and join conditions). define base set of tuples in the result set of tables 2) selects which tuples in the result set are contained in the result set of the query (like where clause filter) 3) explicit: maintain their join conditions in the FROM portion of select statement ex: SELECT * FROM T1 JOIN T2 ON <condition> implicit: joins mix join and filter conditions in the WHERE clause explicit is easier to understand because join and filter conditions are separate in SELECT statement and conditions are separated from where condition.

What are the DB Schema artifacts identified during the Analysis phase of the SDLC? (3)

1) entity class that need persistence are turned into tables with columns that are identified by the attributes (not all need to be persisted) 2) 1-N and N-M associations are identified by entity classes and translated into foreign key relations and join tables 3) need indexes, views, custom functions for certain operations PAI

1) Describe a (simple) nested query. 2) Describe a correlated nested query (correlated sub-query). 3) Describe how these two types of queries are evaluated.

1) independent sub-query (select) that provides a result set that is used by outer query to do filtering/comparison 2) sub-query (select) with one or more filter expressions that are bound at the outer query (ex: E.dno to EMPLOYEE) 3) for simple sub-query: evaluated once to provide result needed for outer to do filtering for correlated nested query: re-evaluated for each row identified by outer query

1) Describe the issue with inner-join that the use of outer-join solves. 2) Describe the left-outer join and the right outer-join.

1) inner join creates a new join where there is a match for tuples on both left and right tables. problem because all tuples must be included in either right or left. (employees not assigned projects) 2) left outer join produce result set of tuples from inner join and tuples from left table with no match from the right table. unmatched ones = NULL entries from right table. right is the same vice versa (tuples from right table have null entries from the left table)

1) What type of association cardinality between entity classes require a Join Table to implement in the schema? 2) What is the minimum number entities for join table? 3) maximum number to represent join table? 4) What are the sources of a join table's key attribute values?

1) join table is a mechanism used to implement N-M associations between two entities 2) at least two entities 3) between 3,4.. N-way associations between N entities are possible, if greater than 3, the user should refactor the design 4) references to the N entities in N-way association are required for join table, these are implemented as foreign key references *****ER DIAGRAM********

1) Simple vs. Composite Attribute? 2) WHERE clause, disadvantage of using complex attribute? 3) recommended alternative?

1) simple = atomic value (cannot be divided) ex: integer 2) can be divided, delimits parts, difficult to query for only one component 3) migrate composite attribute to a separate table (weak entity) where it is maintained by a separate attribute

There are two alternative methods of representing an employee's optional (single) phone number. The first is to use a nullable attribute and the second is to use a weak entity. 1. Given a requirement that a large number of the employees (> 80%) are assigned a phone number, which is the preferred method and why? 2. Given a requirement that only a small number of employees (< 10%) are assigned phone numbers, which is the preferred method and why? 3. How would the requirement of 'multiple phone numbers can be assigned to an Employee' affect your decision?

1) use a nullable attriubute, more efficient because no join is needed, the nullable attribute will not be wasted bc there is a lot of employees 2) use a weak entity, nullable would waste space, but the extra processing with a join could make nullable better 3) using 1-M would make us use a weak entity

Three methods of implementing a derived attribute?

1) use built-in function (aggregate functions, min, avg) 2) as the attribute of a view 3) functions written in DBMS procedural DML BAD

1) Describe what VIEW objects are 2) What is the reason for providing views in our schema? 3) Query-Based View vs. Materialized View? 4) Disadvantage of each view?

1) virtual table whose tuples or contents are generated by select statement in the VIEW's CREATE statement 2) provides method of encapsulating (hiding) commonly used or complex queries. provides facade to users to modify without effecting other clients. 3) Query-Based View: underlying select statement that is evaluated each time the view is queried. Materialized View: physical table whose tuples are provided by view's select statement (best for aggregating data based on time) 4) Query-Based View: select statement must be re-evaluated, processing gets intensive Materialized View: contents will become invalid (depending on how often the table changes), not synchronized with base table

1. Explain the difference between Binary and Read/Write locks. 2. What is meant by exclusive and shared locks?

1. binary locks: no access (read or write) when locked read/write lock: consider read and write (many can read at once) 2. read lock = shared (allows to read at once) write = exclusive (one at a time)

1. How does a B-Tree differ from a B+-Tree? 2. What type of select operation benefits most from a B+ tree? Briefly explain your answer.

1. B- tree maintains node pointers, key values, record (block) pointers. no difference between the internal and leaf nodes B*- tree maintains node pointers and key values in internal nodes. leaf nodes hold record (block) pointers - holds block pointers as linked list for index to access 2. select for consecutive indexed values (BETWEEN)

1. Define "Blocking Factor" of both Index and Data Files. 2. In terms of Blocking Factors, what is the advantage of using an index file to search for a specific record in a data file? Hint: The Blocking Factor of the index file vs. the Blocking Factor of the data file.

1. BF of index: index entries BF of data file: records 2. decrease number of blocks to be read/write (I/O operations), index BF is greater than data file because more entries will fit on block than records, easier to search for index with less reads

What are the four conditions that must exist for a deadlock between two transactions to form?

1. Mutual exclusion: only single transaction can hold resource (others blocked) 2. Hold & Wait: blocked while waiting 3. No Preemption: transaction cannot be forced to release what it holds 4. Circular Dependency: holds and requires resources needed by other transactions

Name and describe the four levels of Transaction Isolation provided to SQL clients. Describe the difference between Read Committed and Repeatable Read.

1. Read Uncommitted: transaction allowed to read values written by other transactions but not yet committed 2. Read Committed: transaction allowed to only read committed values 3. Repeatable Reads: once transaction starts, client will always read same committed value for some X, even if new X has been committed by other transactions 4. Serializable: transaction executes in isolation from other transactions (serial schedules) Read committed makes sure transaction only reads committed values, but the values can change if other commits during it's execution, Repeatable reads has the same committed value always

1. How does the Strict 2PL protocol differ from Basic 2PL protocol described above? 2. What advantage does schedules produced by Strict 2PL offer over schedules produced by basic 2PL?

1. Strict 2PL requires transaction to release its write lock only after it ends (commits or aborts), write-lock cannot be released before it's end 2. does not trigger cascade abort, second transaction only obtains locks on data that has been committed

1. Informally describe the conditions to be met for a relation to be in Third Normal Form. 2. Describe why the following relation PURCHASE is not in third normal form. 3. How would this relation suffer from Update Anomalies? PURCHASE (PURCHASE_ID, PURCHASE_DATE, PURCHASE_AMT, CUSTOMER_ID, CUSTOMER_CITY, CUSTOMER_ZIPCODE) Note the relation's primary key is underlined.

1. a relation should not have a non-key attribute functionally dependent on another non-key attribute (no transitive dependencies of non-key attribute on primary key) 2. CUSTOMER_CITY and CUSTOMER_ZIPCODE are functionally dependent on CUSTOMER_ID, which is not a primary key 3. if we change one, we need to update all based on CUSTOMER_ID. if we remove purchases, we remove entire customer information

What are the three conditions that identify a conflict between two operations in the same schedule?

1. belong to different transactions 2. access same item X 3. at least one operation is write_item(X)

If both an ordered data file and an index file can be search in Log2N data file bock reads 1. Why is it is more efficient to search for a value using the index? 2. What are some of the other advantages of utilizing an index?

1. blocking factor of index is higher, so it can locate the block with less reads, number of I/O operations is less with an index 2. data file can have many indexes that can be searched in this log2 time

1. Describe the primary difference between a dense and sparse index 2. What is the advantage of the sparse over the dense index? 3. Can we apply a sparse index to an unordered data file? Explain your answer.

1. dense: maintains index entry for every record, sparse: maintains index entry for every anchor record (first record of every block) 2. sparse require less entries, less blocks used, less read operations, faster search 3. no, depends on value that is ascending ,so index attribute value on block i are less than values on i+i unordered = places records randomly, so each index value would need a pointer to block aka dense index.

1. Describe "Blocking Factor". 2. What is the significance of a larger blocking factor?

1. describes the number of records that can be placed on a disk block. floor(B/R) where B and R are the block and record sizes 2. larger = more records, so more information is being retrieved for every READ BLOCK operation. we want to reduce the number of i/o operations, so by increasing information on the block, we reduce block reads and increase efficiency of search

Each of these questions refer to ordered data files. 1. Provide a short description of a data block's "Anchor Record". 2. What is the relationship between the anchor record and the other records on same block? 3. What is the relationship between the anchor records on block i and block i+1? 4. In terms of anchor record values, what block might the record containing a searched-for field value X be found?

1. first record of a data block, contains starting indexing field value of the record sequence in the block 2. anchor record's field value should be < or = all other field values of other records in the same block (when in ascending order) 3. the anchor record index field value of block i should be < the anchor record value of block i+1 4. when you are looking for a specific value (XS), the record will be on the block i where AR (i) <= X < AR (i+1)

1. Informally describe the conditions to be met for a relation to be in Second Normal Form. 2. Describe why the following CUSTOMER_PURCHASE relation is not in second normal form. 3.How would this relation suffer from Update Anomalies? CUSTOMER_PURCHASE (CUSTOMER_ID, PRODUCT_ID, PURCHASE_DATE, PURCHASE_AMT, PRODUCT_DESCRIPTION, CUSTOMER_NAME) Note the relation's primary key is underlined

1. for a relation where the primary key is a compound key (keys are defined in multiple attributes), all nonprime attributes (not a part of primary key) must be functionally dependent on ALL of the key's attributes ALL NONPRIME ATTRIBUTES DEPENDENT ON KEY'S ATTRIBUTES 2. mixes attributes from customer's purchase with attributes with attributes that are ONLY dependent on product ID and customer ID 3. 1- purchase_description and customer_name are only partially dependent on primary key - duplicated with those that share same, changing it will require updating all 2- customer is not in the database until they make a purchase 3- if we remove customer, all information will go away and out of entire system

How are classes from the Model category used? How are classes from Controller category used?

1. identify foreign keys 2. identify application-specific operations (joins, etc)

As described in the text, what are the two strategies employed by the DBMS query processor for searching for multiple attributes in the same table (file) when one or both attributes are indexed? For example, searching for customers with a given last-name and zip-code. Hint: Three strategies are given in the book, but the first two are identical.

1. if 1/2 of the attributes is indexed, search records for indexed value, and linearly search records for non-indexed value in the result set of indexes value 2. if both are indexed (separately), perform two searches an return intersection of both result sets

1. How does the Conservative 2PL protocol differ from Basic 2PL protocol described above? 2. What two advantages do schedules produced by Conservative 2PL offer above the schedules produced by over basic 2PL?

1. in Conservative, must lock before transaction starts and transaction only releases locks after commit/abort 2. will not trigger cascade abort, free of deadlocks

Consider the differences in search efficiency when indexing using a surrogate integer field vs using an email address hint: two differences.

1. integer uses one comparison operation vs. N comparison 2. index file entry blocking factor is greater for integer vs. string

1. Why must a secondary index be a dense index? 2. Why is a secondary index less efficient that a Primary Index on an ordered data file?

1. needs to be unordered by definition (index entry for every record) aka dense index 2. secondary is less efficient than primary, it needs N-1 times more entries

1. Informally describe the conditions to be met for a relation to be in First Normal Form. 2. The book describes three possible methods of bringing Department (Figure 15.9 B) into First Normal Form. We discussed the first solution in class. Describe the other two possible methods of normalization (changes to the Department schema) and the problems that result when each is applied.

1. no multi-valued attributes no nested relations -attribute can only have single atomic values 2. 1- move department location into Department relation using ID as the foreign key 2- expand the relation's key to include department number and location (creates a separate tuple for each location) - could have redundant values 3- if max location (3), add three nullable location attributes into the relation - could waste space

1. What guarantee does 2 Phase Locking make concerning the schedules produced when transactions follow the 2PL locking protocol? 2. Describe the three restrictions Basic Two Phase Locking places on how transactions are allowed to execute their lock / unlock operations.

1. only serializable schedules will be created by all transactions involved 2. transaction has 2 phases: expanding phase followed by shrinking phase.. during expanding, transaction acquire locks during shrinking, transaction release locks transaction cannot acquire new locks after it releases its first lock (expand-shrink-expand is forbidden)

The following questions refer to 'Primary Index on Ordered Data File's Unique Ordering Attributes'. 1. What are the qualities of the data file attributes that can be used to build a Primary Index as described in the slides? 2. Can a Primary Index as described in the slides, be applied to both ordered and unordered data files? Explain why. 3. Can we have two Primary Index on separate attributes in the same data file? Explain your answer.

1. unique ordering attribute of ordered file (ex: SSN) 2. No, only ordered. indexes anchor records (sparse) 3 No, ordered-attribute, can't have two that rely on the same one

8

8

A. Describe the two methods of inserting records into data files. B. What are the advantages and disadvantages of each?

A. Ordered (sequential) files: maintained by ordering field (table's primary key) Unordered (heap) files: maintained by order they were put in (new = at the end) B. Ordered (sequential) files: can be searched O(log2) using binary search, empty slots made, deleted slots filled in Unordered (heap) files: insertion is efficient, scans all file to find what they want (unless you use an index)

A. What are the two records structures discussed in the slides? Hint: Relates to how records lengths are defined. B. What is an advantage and disadvantage of each structure?

A. Fixed Length and Variable Length Records B. -Fixed Length: simple to calculate the starting byte of each record within the file and starting byte of field within records waste of space bc of max length -Variable Length Records: conserves disc space by using only what it needs, expensive to locate records

A. What are two methods of placing data records into disk blocks? B. What is an advantage and disadvantage of each method?

A. Spanned and Unspanned Records B. Spanned: deletes unused space, does not start at a fixed location Unspanned: easy to identify the location, makes unused space

cardinality of each of the rows? >O ----------------- A >| ------------------ B >|| ------------------ C -|-O --------------- D

A. zero or more B. one or more C. one and only one D. zero or one

Provided the names and brief description of their meaning from the acronym ACID.

A: Atomicity: atomic unit of processing, performed entirely or not at all C: Consistency: take database from one consistent state to another I: Isolation: updates not visible until committed D: Durability: once committed transaction changes database, change cannot be undone or lost

Using a block size of 4096 (1012), repeat the calculation given in the slides "Efficiency of Indexes for Search Operations" to produce the number of block reads. Data File Blocking Factor Number of data file blocks Index File Blocking Factor Number of Index File Blocks 1. The number of block reads to search an un-ordered data file without an index. 2. The number of block reads to search using an ordered data file and no index. 3. The number of block reads using a single-level index against an un-ordered data file.

Data File Blocking Factor X = B/R Number of data file blocks r/X Index File Blocking Factor Y= B/RI Number of Index File Blocks r/Y

use... foreign key for? join table for?

FK = 1-M Join table = N-M

Describe the IN and EXIST clauses. Provide a simple example query for each.

IN Clause: identifies tuples (returns true) that have value in nested relation (nested SELECT) SELECT * FROM U WHERE U.a IN (SELECT V.b FROM V WHERE...) where U.a and V.b are type compatible EXIST clause: identifies tuples that have a value used to make a non-empty relation in a correlated nested query SELECT * FROM U WHERE EXISTS (SELECT * FROM V WHERE U.a = V.a)

Describe the four 'things' maintained by every Lock Table Entry maintained in the database's Lock Table.

ItemID: unique ID LOCK_TYPE: shared or exclusive OWNING_TRANSACTIONS: currently owns item FIFO: queue of transactions

Describe the general solution to normalizing (fixing) relations that are not in second or third normal forms? Hint: _______ through _______ function not covered by primary key?

Normalization through Decomposition -turned into 2 or more relations, each have their own primary key and in 2nd and 3rd normal form

Describe what triggers a Cascade Abort on an uncommitted transaction

T1 triggered by the abort of a second transaction that X depends upon (ex: T1 has used/read a value that second transaction has written, and then aborts it) the schedule S: r1(X); w1(X); r2(X); w2(X); a1 transaction's abort will cause cascade abort for other transaction because it reading x makes it be dependent on w1(X)

In terms of blocking factors, how do indexes promote more efficient searches of the data file?

index-file entry is smaller tan data-file record. each index holds ordering field's value and file record's location (block number with target record) in the UNORDERED data file smaller index = larger blocking factor bc more records per block. finding location of target record in unordered file has less block reads than searching directly in file

Describe how locking accomplishes the goal of preserving the database's correct state when executing multiple concurrently executing transactions.

locking stops execution for conflicting transactions that are already running (current transaction owns and holds onto conflicting item) conflicting transactions will stay blocked until no longer conflicting this happens when lock is released on conflicting item during commit or roll-back

Describe the meaning of multi-valued attribute. How do we convert multi-valued attribute to a weak entity? add more

maintains multiple values for the same attribute each value represented as a separate row in weak entity need: 1- foreign key 2- attribute to hold one choice

In terms of increasing DBMS performance what are the caching techniques described in this section attempting to minimize?

performance overhead to execute block I/O operations that read or write block of data to and from the drive

Integer is a good example of...?

simple attribute

"thing" for entity has what three characteristics?

symbol: has a name intension: represents something in problem domain extension: identify unique instance


Ensembles d'études connexes

3106: EXAM #1 MULTIPLE CHOICE (SPRING 2021)

View Set

(2) Advantages and Disadvantages of Sole Proprietorship

View Set

The four stages of general anesthesia and related nursing intervention

View Set

Reproductive- Lippincott Prostate Cancer

View Set

AP Gov Unit 1+5 Review- Previous Test Questions

View Set

Chapter 15: Assessing Head and Neck

View Set