Database Test 2

Pataasin ang iyong marka sa homework at exams ngayon gamit ang Quizwiz!

Describe the general solution to normalizing (fixing) relations that are not in second or third normal forms? Hint: _______ through _______

"Normalization through Decomposition" In much the same manner as correcting an entity class that lacks cohesion from OOAD, a relation that contains functional dependencies not covered by the relation's primary key can be decomposed into 2 or more relations, each of which have their own primary keys and are in second and third normal form.

Describe the DAO design pattern.

'Data Access Object' is a design pattern that describes a method of encapsulating in a single service class the responsibilities of persisting an entity class. Note: The major responsibility of a DAO is to encapsulate and isolate changes to the DB schema from the rest of the application's design.

Describe the meaning and purpose of ORM.

'Object-Relational Mapping' describes the process mapping application object instances to / from database tables. That is, the transformation of an object's state (i.e. its attribute values) into a row in a database table designed to maintaining the object's class. Also, the transformation of a table row back into an object instance with the same persisted state

When designing an ORM mechanism how do we map the application's classes to a database schema? Hint: There are three steps.

1 Create a table for every class to be persisted whose attributes (columns) types match the attributes types of class. 2 Map each object instance to a table row. 3 We map associations in the OO design to foreign key relationships in the schema between tables.

As described in the text, what are the two strategies employed by the DBMS query processor for searching for multiple attributes in the same table (file) when one or both attributes are indexed? For example, searching for customers with a given last-name and zip-code. Hint: Three strategies are given in the book, but the first two are identical.

1. If only one of the two attributes is indexed, search for the records that match the indexed attribute value, and linearly search the identified records for the non-indexed attribute value. For example, if only zip-code is indexed, 1) search for the given zip-code and 2) linearly search the zip-code result set for records with the given last-name. 2. If both attributes are individually indexed (both zip-code and last-name have separate indexes), then perform two searches and return the intersection of both result sets.

When using JDBC, what are the seven workflow steps involved with programmatically executing a select (query) statement and returning the result set?

1. Load the DBMS vendor-specific JDBC Driver into the application (handled automatically by the DataSource). 2. Open a Connection to the DBMS (ID / Password / URL). 3. Build a Statement containing the SQL to be executed. 4. Execute the Statement though the Connection. 5. Retrieve a ResultSet containing the query results. 6. Iterate through, and process each row (tuple) in the ResultSet. 7. Close the Statement and then the Connection.

What are the four conditions that must exist for a deadlock between two transactions to form?

1. Mutual Exclusion: Only a single transaction can hold a resource while other requesting transactions are blocked while waiting. 2. Hold & Wait: A transaction will hold owned resources while also blocked waiting for other requested resources to be allocated. 3. No Preemption: A transaction cannot be forced to release the resources it holds. 4. Circular Dependency: There exist a set of blocked transactions where each transaction : 1) Holds a resources needed by another transaction in the set. 2) Requires a resource held by another transaction in the set.

What guarantee does 2 Phase Locking make concerning the schedules produced when transactions follow the 2PL locking protocol?

2PL guarantees that only serializable schedules will be generated when 2PL is applied by all of the transactions involved in the schedule.

Describe the four 'things' maintained by every Lock Table Entry maintained in the database's Lock Table.

<ItemID, LOCK_TYPE, OwningTranscation(s)> and the Blocked Transactions FIFO. ItemID: The unique ID of the item locked by the transaction e.g. row, table, etc. LOCK_TYPE: The lock's type (Shared or Exclusive). OWNING_TRANSACTIONS: The transaction that currently own the item. Exclusive locks are restricted to a single transaction. Shared locks can be shared by multiple transactions i.e. an item can be read-locked by multiple transactions. A FIFO queue of transaction that are blocked waiting to gain ownership of the item allowing the transaction to continue its execution.

There are two alternative methods of representing an employee's optional (single) phone number. The first is to use a nullable attribute and the second is to use a weak entity. How would the requirement of 'multiple phone numbers can be assigned to an Employee' affect your decision?

A 1-M relationship between employee and phone numbers would require the use of a weak entity.

What is the relationship between an application entity class to be persisted and a DAO class?

A DAO class should be defined for every application entity class to be persisted. For example, the company schema would result in a CustomerDAO, AddressDAO, ProjectDAO, etc.

The following questions refer to 'Primary Index on Unique Ordering Attributes'. What are the qualities of the data file attributes that can be used to build a Primary Index as described in the slides?

A Primary Index is defined on a unique ordering attribute of an ordered data file e.g. SSN in the example Employee table.

What is meant by exclusive and shared locks?

A Read Lock is shared because it allows multiple concurrent accesses to a database item (shared access) by multiple transactions performing only read operations (no writes). A Write Lock is exclusive because it permits only a single transaction to access the item for an exclusive write operation.

Describe the primary difference between a dense and sparse index.

A dense index maintains an index entry for every record in the data file. A sparse entry maintains an index entry for every anchor record (every block) in the data file

Describe the issue with representing and assigning identity to instances of entities in a DBMS.

A entity maintains its own independent identity among all other objects in the system. This means that a entity maintains (is identified by) a primary key in the table defined to maintain the entity type. When inserting a new entity type into its table, it is advisable to allow the DBMS to auto-generate the new row's primary key. This can be accomplished using an auto-increment column (MySQL & SQL Server) or a Sequence (Oracle).

Describe the meaning of a Functional Dependency between attributes in a single relation A1 → A2. Use an example such as the Employee relation's SSN (Social Security Number) and the remaining attributes in the relation. Provide an example of how functional dependencies are not symmetric i.e. if A1 → A2 it is not necessarily true that A2 → A1.

A functional dependency A1 → A2 means that for every unique value assigned to attribute A1, the same value must be found assigned to A2. That is, A1 consistently identifies A2. For example, every tuple that has a given social security number the same first name, last name, etc. must be present in the tuple. In other words, the same SSN (A1) must not result in different employees / names (A2). If a SSN identifies a customer's (fname, lname), it is not the case that unique (fname, lname) identifies the same SSN. Naturally, other examples are possible.

What is the significance of a larger blocking factor?

A larger blocking factor (i.e. 4096/2028 = 2 vs 4096/1024 = 4, etc.) implies that more records and so more information is being retrieved with every disk READ BLOCK operation. We want to reduce the number of I/O operations. By increasing the amount of information maintained in each block, we reduce the number of block reads and increase the efficiency of record search and other DBMS operations.

Describe the structure of a multi-level index.

A multi-level index is a single-level index with additional layers that serve to index the index i.e. a multi-level index has first, second, etc. levels. Like the single-level index, the first level index entries are physically ordered according to the indexing attribute values of the data records being indexed. The multi-level index is structured as tree of disk blocks where each layer (level) in the tree indexes the complete range of data record index values. The first level (leaves of the index tree) reference the record address (block address) in the data file. The second level entries index the anchor records (first) entries of the first level index entry blocks. The third level entries index the anchor record entries of the second level index entry blocks. Etc.

Describe Query-Based Views and Materialized Views

A query-based view is implemented as an underlying select statement which is evaluated each time the view is queried. A Materialized View is implemented as a physical table whose tuples are provided by the view's select statement. The materialized view must be periodically updated so that its contents remain synchronized with the underlying base tables. A materialized view is most useful when aggregating data based on time. For example, aggregating data from the previous day, week, month, etc. This is data that will not change after the close of its period (day, week, etc).

Why must a secondary index be a dense index?

A secondary index is created against the data file's non-ordering attributes i.e. not the physical ordering of the data file blocks. So unlike a sparse index, a secondary index requires an index entry for every record in the data file. This is the definition of a dense index.

Why is a secondary index less efficient that a Primary Index on an ordered data file?

A secondary index is less efficient because the index requires N-1 times more index entries than a primary index where N is the data file's blocking factor (number of data records that fit into a block.

What type of select operation benefits most from a B+ tree? Briefly explain your answer.

A select operation that is identifies a range of consecutive indexed values. For example a BETWEEN operation identifying a range of records between two indexed attribute values e.g. a date. This is because the leaf nodes of the B+ tree maintain pointers to the next index block and moving from one block to the next in an ascending or descending order is an O(1) operation.

Can we apply a sparse index to an unordered data file? Explain your answer.

A sparse index cannot be utilized for an unordered data file. A sparse index relies on the fact that the indexed attribute value is ascending across data file blocks. That is, that the indexed attribute values on index block i are less than the values on index block i+1 (assuming ascending order). An unordered data file places records randomly across its blocks, and so each unique index value (data record) requires a pointer to the record's block i.e. a dense index.

Describe the restrictions Basic Two Phase Locking places on how transactions are allowed to execute their lock / unlock operations.

A transaction has two phases: Expanding followed by Shrinking. During the expanding phase, transactions acquire locks. During the shrinking phase, transactions release locks. A transaction cannot acquire new locks after it releases its first lock i.e. expand-shrink-expand is forbidden.

Describe the meaning and purpose of Triggers in a database design.

A trigger is a mechanism that executes a function (stored procedure) on the database server itself. Triggers are attached to specific tables and their execution is 'triggered' under conditions that are described when the Trigger is created (CREATE TRIGGER ...).

What is the reason for providing views in our schema?

A view provides a method of encapsulating (hiding) commonly used or complex queries from the database users. Second, like interfaces from OO languages, a view provides a façade to the database users whose implementation can be modified without the effecting the clients that make use of the view. That is, the select statement and / or underlying tables can be modified and no changes will be required of the clients so long as the view's attributes remain unchanged.

A. Describe the two methods of inserting records into data files. B. What are the advantages and disadvantages of each?

A. Ordered (Sequential) Files: Records are physically maintained in an order based on an Ordering Field (Usually the table's primary key). Unordered (Heap) Files: Records are physically maintained in the order they were inserted into the database i.e. new records are appended to the end of file. B. Ordered Files: Advantage: The data can be searched in Log2 time using a binary search. Disadvantage: An empty slot must be made in the data when inserting a new record into the 'middle' of the file. Likewise, empty record slots must be 'filled in' with upstream records to fill the hole created by deleted records. Unordered Files: Advantage: New records are appended to the end of the file so insertion is very efficient. Disadvantage: Searching the file for a specific record requires a scan of the entire file's contents (unless an index is applied).

A. What are two methods of placing records into disk blocks? B. What is an advantage and disadvantage of each method?

A. Spanned and Unspanned Records. B. Spanned: Advantage: Eliminates unused space in each disk block. Disadvantage: Records do not start at fixed location in each block. Unspanned: Advantage: Easy to identify the location of a specific record in the file. Disadvantage: Produces unused space in each disk block.

In terms of blocking factors, how do indexes promote more efficient searches of the data file?

An index-file entry is typically much smaller than the data-file record. Each index file entry maintains only the ordering field's value and the location of the field's record (typically the block number containing the target record) in the unordered data file. The smaller index entry means a larger blocking factor (B/R or more records per block) than the data file records so an I/O operation retrieving index file entries contains more information per block read. Identifying the location of a target record in an unordered file will require fewer block reads than searching for the record directly in the data file.

Describe the issue with the inner-join that the use of outer-join solves.

An inner join between two relations will produce a new (join) relation that contains tuples where there is a match between tuples from both the left and right hand tables. But consider the case where there are tuples in left table without a matching tuple in the right (or vice versa). The problem with inner joins arises when a query must include all the tuples in either the left or right hand tables e.g. a report on employee-project participation when some employees are not assigned to any projects. The left outer join will produce a result set with the tuples from the inner join operation and the tuples from the left table that have NO MATCH with the right table. These unmatched left table tuples will contain null entries for the columns provided by the right table. The opposite is true of the right outer join; all the tuples from the right table with null entries for the left table columns.

Describe what is wrong with this update operation? Hint: There is nothing wrong with the SQL. The problem is what is being updated. UPDATE customer SET id = ?, first_name = ?, last_name = ?, gender = ?, dob = ?, email = ? WHERE id = ?;

An update should NEVER modify a table's primary key. The key is set when the row is inserted i.e. when the entity (customer, product, etc.) is added to the system. For example, using an auto-increment column.

What method does the JDBC DataSource API provide to its clients to obtain an open Connection?

DataSource.getConnection() returns an open connection to the DBMS.

Describe the statements and conditions where a trigger can be attached in the MySQL DBMS.

In MySQL (and most other DBMS) a Trigger is attached to a table and is executed when tuples are inserted, updated, or deleted from the table. The trigger can be specified to execute before or after the insert, update, or delete operation. The trigger can optionally include conditional statements that determine when should execute.

In terms of Blocking Factors, what is the advantage of using an index file to search for a specific record in a data file? Hint: The Blocking Factor of the index file vs. the Blocking Factor of the data file.

Increasing the performance of most database operations depends on decreasing the number of blocks that must be read or written from/to the disk. The blocking factor of an index file is much greater (better) than the BF of the data file because more index entries will fit on a block than records. When searching for a target record, we can locate the data file block containing the record with fewer block reads when we search with the index rather than retrieving and searching the data file records directly. That is, the I/O efficiency of the index file is much higher than the data file.

Describe a correlated nested query (correlated subquery)

It is a subquery (select) with one or more filter expressions that are bound to the outer query. (in the slide "correlated nested queries", the subquery filter E.dno is bound to the EMPLOYEE row in the outer query.)

Describe what VIEW objects are

It is a virtual table whose contents / tuples are generated by a select statement that is embedded in the VIEW's CREATE statement.

Describe a simple nested query

It is an independent subquery (select) that provide a result set that is used by the outer query to perform some filtering or comparison.

What two advantages do schedules produced by Conservative 2PL offer above the schedules produced by over basic 2PL?

Like Strict 2PL, schedules generated by Conservative 2PL will not trigger cascade aborts. Second, Conservative 2PL schedules are free of deadlocks.

Describe how locking accomplishes the goal of preserving the database's correct state when executing multiple concurrently executing transactions.

Locking blocks (suspends) the execution of transactions with one or more operations that are in conflict with owning transactions that are already running i.e. the transaction that currently is accessing (owns) the conflicting item. Conflicting transactions remain blocked until the conflicting operation is no longer in conflict with previously running operations in the schedule. This typically occurs when the owing transaction commits or rolls-back.

At a minimum, what services (operations) should a DAO provide to its clients? What are the additional operations described in the slides?

Minimum: has to provide CRUD operations The DAO also provides query methods that implement specialized selection and aggregation operations that target the DAO's entity class as required by the application i.e. summary reports, etc.

In terms of increasing DBMS performance what are the various techniques described in this section attempting to minimize?

Most of the performance overhead in maintaining a database is the time needed to execute the block I/O operations that read or write blocks of data to / from the drive.

The following questions refer to 'Primary Index on Unique Ordering Attributes'. Can a Primary Index as described in the slides, be applied to both ordered and unordered data files?

No. Because a Primary Index indexes the anchor records of an ordered data file's blocks, and not individual data records (i.e. it is sparse), a Primary Index can only be applied to ordered data files.

The following questions refer to 'Primary Index on Unique Ordering Attributes'. Can we have two Primary Index on separate attributes in the same data file? Explain your answer.

No. Primary Indexes required an ordered data file i.e. a data file that is physically ordered by its ordering-attribute(s). So we cannot have two indexes that rely on an ordering of different attributes on the same file i.e. we cannot have two physical orderings of records in a single file.

What are the disadvantages of query-based views and Materialized views?

Query-based view disadvantage: the view's select statement must be evaluated each time the view is queried. This may be processing intensive. Materialized view disadvantage: is that its contents will become invalid- i.e. no longer synchronized with the underlying base tables. How quickly the materialized view becomes out of date depends on how often the base tables change.

Informally describe the conditions to be met for a relation to be in First Normal Form.

Relations should have not multivalued attributes or nested relations. Alternatively, attribute are permitted to maintain only single atomic (indivisible) values.

Describe the result set of the statement: Select * from employee where name like 'Smi%'

Returns a result set containing zero or more rows of the employees whose lname attribute starts with "smi"

Describe the result set of the statement: Select count(*) from employee where lname like 'Smi%'

Returns a result set with a single row/single attribute containing the number of rows as an integer from employee that matched the filter criteria.

Describe how simple nested queries and correlated nested queries are evaluated.

Simple subquery: the subquery can be evaluated once to provide the result needed by the outer query to perform its filtering. Correlated nested query: the subquery must be re-evaluated for each of the rows identified by the outer query (i.e. the average department salary must be identified for each employee in the outer query.)

Which type of SQL statement (select, insert, update, delete) is executed in each method (update vs query)?

Statement.executeQuery: Select Statement.executeUpdate: All the other SQL statement (update, delete, insert, create, etc.)

How does the Strict 2PL protocol differ from Basic 2PL protocol?

Strict 2PL requires that a transaction release its write locks only after the transaction ends (i.e. commits or aborts) i.e. write-locks cannot be released before the transaction's end.

What advantage does schedules produced by Strict 2PL offer over schedules produced by basic 2PL?

Strict 2PL schedules will not trigger cascade aborts. This is because a second transaction can only obtain locks on data that has been committed. Put another way, the second transaction cannot obtain a lock on data that has been modified by the first transaction, but not committed or aborted triggering the cascade abort.

What is the difference between a strong entity and weak entity?

Strong Entity maintains a primary key. Weak Entity does NOT maintain a primary key but only a foreign key reference to a strong entity. Example: Customer Address and CreditCard might be implemented as weak entities without primary keys but maintaining a FK relation to the owning Customer

Describe how grouping and aggregation functions are combined to create summary reports.

Summary reports usually include aggregation (summation, average, min and max, etc.) of information within groups of tuples (department, categories, dates, etc.). For example, the average employee salary grouped by project. This indicates the need to: 1) group tuples in the result set by a specific attribute value and then 2) aggregate attribute values exclusively within each group (as opposed to across the entire result set).

Each of these questions refer to ordered data files. What is the relationship between the anchor records on block i and block i+1?

The Anchor Record (AR) index field value of block i should be < the Anchor Record value of block i+1.

How does a B-Tree differ from a B+-Tree?

The B-Tree maintains both node pointers, and <key-values & record (block) pointers> in the tree nodes. There is no difference between internal and leaf nodes. The B+-Tree maintains node pointers and key values in the internal nodes. Leaf nodes are reserved for <key, record-pointers> The B+-Tree maintains the leaf node blocks as a linked list so that indexed records can be transversed (visited) in index order in linear time.

How does the Conservative 2PL protocol differ from Basic 2PL protocol described above?

The Conservative 2PL protocol dictates that 1) A transaction must lock all of the items it will access before the transaction starts. 2) A transaction releases its locks only after the transaction commits or aborts.

How does a boolean expression that evaluates to UNKNOWN in a where clause contribute to the result set?

The DBMS wont get the results of a boolean expression with a value of UNKNOWN UNTIL it knows the result of that expression. The DBMS treats a boolean expression that evaluates to unknown as a FALSE!!!

Why is the DataSource sometimes called a Connection pool? What is the state of the Connections maintained by the DataSource?

The DataSource maintains a pool of up to N Connections that are in an open state. i.e. the connections are actively connected to the DBMS.

Describe the IN and EXISTS clauses. Provide a simple example query for each.

The IN clause identifies tuples (returns true) containing a value found in a nested relation (usually a nested SELECT statement). SELECT * FROM U WHERE U.a IN (SELECT V.b FROM V WHERE ...) Where U.a and V.b are type compatible. The EXISTS clause identifies tuples containing a value that is used to produce a non-empty relation in a correlated nested query. SELECT * FROM U WHERE EXISTS (SELECT * from V where U.a = V.a)

Each of these questions refer to ordered data files. What is the relationship between the anchor record and the other records on same block?

The anchor record's index field value should be ≤ all other field values of other records on the same data block (assuming ascending order).

Describe why the following relation PURCHASE is not in third normal form. PURCHASE (PURCHASE_ID, PURCHASE_DATE, PURCHASE_AMT, CUSTOMER_ID, CUSTOMER_CITY, CUSTOMER_ZIPCODE)

The attributes CUSTOMER_CITY and CUSTOMER_ZIPCODE are functionally dependent on CUSTOMER_ID which is not a primary key.

When designing the schema with a 1-N foreign-key relationship, which side of the relationship (owning or owned) maintains the foreign key attribute?

The owned (many) side of the relationship maintains the foreign key attribute referencing the owning entity primary key e.g. the Purchase table maintains the FK into the Customer table.

When designing / coding a 1-N relationship in an object model, which side of the relationship (owning or owned) maintains the collection of object instances e.g. an ArrayList?

The owning (single instance) side of the relationship maintains the collection of many owned objects identified by the relationship.

What is the advantage of the sparse over the dense index?

The sparse index requires fewer index entries to cover every record in the file and so fewer disk block are needed to maintain a sparse index. During a search, fewer disk read operations are needed to identify the data file block containing the target record resulting in faster and more efficient searches.

Inner Join Question: "What are the Join Conditions?"

They define the relationship between the two tables that are being joined. They define the base set of tuples in the result set generated from the joined tables.

Inner Join Question: "What are the Filter conditions?"

They select which tuples in the result set defined by the join clauses are contained in the query's result set....kinda like how the 'where' clause filters tuples from a select on a single relation.

Describe the suggested method of eliminating the use of a multi-valued attribute to represent the employee's three favorite colors. EMPLOYEE Ename | Ssn | Bdate | Color_1 | Color_2 | Color_3

This is another example of the use of a weak relation to implement a 1-M relationship between Employee and zero or more color choices. Note for the exam you may be asked to describe the new weak relation's table.

Each of these questions refer to ordered data files. Provide a short description of a data block's "Anchor Record".

This is the first record of a data block. The record contains the starting indexing field value of the sequence of records stored on the data file block.

Describe why the following CUSTOMER_PURCHASE relation is not in second normal form. CUSTOMER_PURCHASE (CUSTOMER_ID, PRODUCT_ID, PURCHASE_DATE, PURCHASE_AMT, PRODUCT_DESCRIPTION, CUSTOMER_NAME)

This relation mixes attributes from a customer's purchase (CUSTOMER_ID, PRODUCT_ID) → (PURCHASE_DATE, & PURCHASE_AMT) with attributes that are dependent on only the Product ID → PRODUCT_DESCRIPTION and Customer ID → CUSTOMER_NAME.

What is the tri-state boolean logic employed by DBMS?

Tri state logic maintains that value is either: TRUE, FALSE, or UNKNOWN (null-neither true nor false).

Describe why: True AND Unknown = Unknown, but False AND Unknown = false

True AND unknown remains unknown until that second unknown is evaluated. False AND unknown is false regardless of the value of its second attribute.

Describe why: True OR Unknown = TRUE, but False OR Unknown = UNKNOWN

True Or Unknown is TRUE regardless of second value. False OR Unknown remains unknown until the value of that second unknown is evaluated.

When designing a DAO for a entity, describe the issue with creating vs updating the instance wrt the instance's identity. Hint: The difference in managing the entity's ID within an insert vs. an update operation.

When inserting a new instance of an entity into the DBMS, the entity's primary key field should initially be NULL. A null ID indicates that this is a new entity (customer, product, etc.) and it is being persisted for the first time. The implementation of the DAO's insert operation should include retrieving the auto-generated primary key created when the new row was inserted. Using the appropriate JDBC method, the DAO operation should retrieve the object's new key and set the id attribute on the entity instance before returning to the calling client.

What is the issue encountered when retrieving a strong entity that maintains several associations in the application's object model (design)?

When retrieving a strong entity with many nested relationships, the issue is how deeply do we transverse the object network when retrieving the target object? Example: when retrieving a Customer instance do we also retrieve and return all objects reachable from the customer like the customer's addresses, credit cards, purchased items, item descriptions, etc.?

Each of these questions refer to ordered data files. In terms of anchor record values, what block might the record containing a searched-for field value VAL be found?

When searching for a specific index field value (VAL), the record will be located on block i where ARi ≤ VAL < ARi+1

How is a Connection returned to the DataSource?

When the client closes a connection (Connection.close()), the DataSource marks the connection as 'returned to the pool' and available to other DS clients. The DataSource does NOT close the connection between the client process and the DBMS.

What are the benefits of, and method of executing multiple SQL statement in a JDBC batch?

You can queue up a number of SQL statements and execute them in a single operation against the DBMS. Faster than one statement at a time. To do this, use addBatch() and executeBatch(). addBatch() adds the given SQL statement to a list (batch) of statements to be executed in a single DBMS operation. executeBatch() executes all of these statements in a single DBMS operation.

Describe the differences between the methods Statement.executeQuery() and Statement.executeUpdate().

executeQuery returns a JDBC ResultSet containing the ACTUAL rows executeUpdate returns an integer with the NUMBER of rows affected

Present a query which provides the average employee hours worked per project. Use the explicit join notation.

select pno, avg(hours) from employee join works_on on ssn = essn group by pno;

Describe why the use of triggers should be avoided.

1) Behaviors execute automatically (asynchronously) and without apparent reason from the user's perspective. 2) They are difficult to debug. This is especially true when dependencies between multiple triggers (e.g. Trigger A causes B to fire, causing C to fire, etc.) producing a cascade of database modifications and / or exceptions that are difficult to follow. 3) They execute on the DBMS which makes them difficult to debug or to even know they exist.

Describe the problems that occur when two problem domain entities are combined into a single relation. 1) For example, trying to maintain individual Employee and Department when both entities are combined into a single relation EMP_DEPT. 2) How do we decompose this relation to solve this problem?

1) It is difficult to maintain single instances of entities (employees or departments) when both are combined into a single relation. For example, if we want to create a new department, we need to pair with an employee when one may not exist. 2) Likewise, it is difficult to create an instances of one entity when there is initially no association with the second entity. For example, creating a new employee that has not been assigned to a department. Also, if we move all employees out of a department, the system also loses all knowledge of the department. This is likely not desired. Also important: - There is significant wasted storage space and duplication of information. - We need to use nullable attributes to implement optional information which can result in an inefficient use of table space. If we want to update employee or department specific attributes, we must update multiple rows in the combined relation.

What are the three principle properties of transaction described in the slides?

1. From the POV of the database's state either a transaction's operations complete entirely or are never executed. 2. Multiple transactions can be executing concurrently against shared (common) DBMS data (rows, attributes, etc.). 3. Depending on it's "Isolation Level", a transaction can be isolated from changes made to data that is being shared with other concurrently executing transactions.

Consider the differences in search efficiency when indexing using a surrogate integer field vs using an email address hint: two differences.

1. The comparison of an integer value requires one comparison operation vs N comparisons needed to compare a string of length N. 2. The index file entry blocking factor will be greater for the integer index value vs a variable length string.

What are the three conditions that identify a conflict between two operations in the same schedule?

1. The operations belong to different transactions. 2. The operations access the same item X. 3. At least one of the operations is a write_item(X).

Rewrite the following query WITHOUT the ALL clause, and using the MAX() function. SELECT lname, fname, salary FROM EMPLOYEE WHERE salary > ALL (SELECT salary FROM EMPLOYEE WHERE dno=5)

Answer: SELECT lname, fname, salary FROM EMPLOYEE WHERE salary > (SELECT max(salary) FROM EMPLOYEE WHERE dno=5)

What are the restrictions on the types of SQL statements that can be executed in a JDBC batch?

Because a SQL Select statement returns a ResultSet, it cannot be executed in a batch.

How does the DataSource increase the performance of an application interacting with the DBMS?

Because the connections in the DS's pool are 'connected', the application avoids the processing delay of having to open & close a network connection to the DBMS for every JDBC operations.

Explain the difference between Binary and Read/Write locks.

Binary locks prevent any type of access of the locked item by other transactions (read or write). Read/Write locks consider the differences between read and write operations i.e. that multiple operations can have read locks on the same object because multiple operations may safely read the same item without conflicting.

Define "Blocking Factor" of both Index and Data Files.

Blocking Factor is (B/R) where B is the block size and R is the record size. Blocking Factor (BF) describes how many 'units' fit on a single drive block. The BF of an index file describes how many index entries will fit in a block. The BF of a data file describes how many records (rows) will fit on a block.

Describe "Blocking Factor".

Blocking factor describes the number of records that can be placed in a disk block. It is defined as floor(B/R) where B and R are the block and record sizes.

Describe the similarities between the methods Statement.executeQuery() and statement.executeUpdate().

Both methods accept, execute, and return the result generated by the execution of a string containing valid SQL against the database.

Inner Join Question: "How do explicit joins differ from implicit joins in terms of these conditions?"

Explicit joins maintain their join conditions in the FROM portion of the SELECT statement. Implicit joins mix both the join and filter conditions in the single WHERE clause. NOTES: Explicit joins are easier to understand because the join and filter conditions are shown to you in the select statement itself.

What are the two records structures discussed in the slides? And What is an advantage and disadvantage of each structure? Hint: Relates to how records lengths are defined.

Fixed Length and Variable Length Records. Fixed Length: Advantage: Simple to calculate the starting byte of each record within the file and starting byte of fields within records. Disadvantage: Wasteful of space when storing variable length data because the max data length must be allocated for every record e.g. Strings. Variable Length: Advantage: Conserves disk space by using just the space needed to hold variable length data. Disadvantage: Expensive to locate specific records within a file or specific fields within records. Files must be scanned to locate the Nth record.

There are two alternative methods of representing an employee's optional (single) phone number. The first is to use a nullable attribute and the second is to use a weak entity. --Given a requirement that a large number of the employees (> 80%) are assigned a phone number, which is the preferred method and why?

If a large percentage of EMPLOYEE tuples have a single office number, the use a NULLABLE attribute would seem appropriate. This is because the nullable attribute is more efficient (No Join Needed) and the large percentage of employees with numbers indicates that the nullable attribute will not be wasteful of table space.

There are two alternative methods of representing an employee's optional (single) phone number. The first is to use a nullable attribute and the second is to use a weak entity. Given a requirement that only a small number of employees (< 10%) are assigned phone numbers, which is the preferred method and why?

If a small percentage of EMPLOYEE tuples will have a single office number, the use of a nullable attribute would waste table space and the WEAK ENTITY is the best choice. However, the additional processing of performing a Join with the weak relation may still make the nullable attribute more attractive.

How would this relation suffer from Update Anomalies? PURCHASE (PURCHASE_ID, PURCHASE_DATE, PURCHASE_AMT, CUSTOMER_ID, CUSTOMER_CITY, CUSTOMER_ZIPCODE)

If the customer were to change their address (city or zip-code), we would need to update all of the customer's purchases based on the customer_id attribute. And again, when we remove purchases, we are removing customer information.

Both an ordered data file and an index file can be search in Log2N data file bock reads. Why is it is more efficient to search for a value using the index?

The blocking factor of the index file is (usually) much greater than the data file and the index can be used to locate the target data file block in fewer disk block reads.

How would this relation suffer from Update Anomalies? CUSTOMER_PURCHASE (CUSTOMER_ID, PRODUCT_ID, PURCHASE_DATE, PURCHASE_AMT, PRODUCT_DESCRIPTION, CUSTOMER_NAME)

The first problem is that because PURCHASE_DESCRIPTION and CUSTOMER_NAME are partially dependent on the relation's PK, these attributes will be duplicated in every tuple that shares the same product or customer. Changing the value of a Customer's name or Product's description will require updating every tuple that references the entity being modified. A second problem is that that until a customer makes a purchase, they are not represented in the database (assuming no other table maintains customer information). A third problem is that if we eliminate a product from the system (for whatever reason) we will remove all its purchases and the customer information associated with the purchase. This could result in removing customers entirely from the system.

Informally describe the conditions to be met for a relation to be in Second Normal Form.

The informal description of 2NF is: For relations where the primary key is a compound key (i.e. keys defined from multiple attributes), all nonprime attribute (attributes not part of the primary key) must be functionally dependent on all the key's attributes.

Informally describe the conditions to be met for a relation to be in Third Normal Form.

The informal description of 3NF is: A relation should not have a non-key attribute functionally dependent on another non-key attribute. That is, there should be no transitive dependencies of a non-key attribute on the primary key.


Kaugnay na mga set ng pag-aaral

FRENCH 7 - DIRECT OBJECT PRONOUNS #2

View Set

Kentucky Life Only (Focused Concepts)

View Set

Better Chinese Lesson 14 What is he wearing today? 他今天穿什麼?

View Set

Basic Spanish Premium (Unidad 19)

View Set

ch 26 health promotion and pregnancy

View Set

Chapter 10: Fundamentals of Law for Health - HIPAA Security Rule Notes

View Set

CMS 2 Assignment 4: Job Evaluation - Measuring Content, Value, and External Market Links

View Set