Database Chapter 4
4.2 When you receive a set of tables, what steps should you take to assess their structure and content?
1. Count rows COUNT (*) and examine columns SELECT (*) 2. Examine data values and interview users to determine (Multivalued dependencies, Functional dependencies, Candidate Keys, Primary Keys, Foreign Keys) 3. Assess the validity of assumed referential integrity constraints.
4.6 Describe two advantages of normalized tables
1. Eliminates modification anomalies 2. Reduces data duplication
4.9 Describe two disadvantages of normalized tables
1. Longer SQL queries and slower application processing 2. Programmers are required to write more complex SQL 3. To recover the original data you must write subqueries and joins to connect data stored in separate tables.
4.22 List four common design problems when creating databases from existing data.
1. The multivalue, multicolumn problem 2. Inconsistent Values 3. Missing Values 4. General-purpose remarks column
4.33 What are three interpretations of null values? Use an example in your answer that is different from the one in the book.
1. The value is inappropriate. 2. The value is appropriate but unknown. 3. The value is appropriate and known, but no one has entered it into the database. An example is a value of a death date that is prior to a birth date. You cannot have a death before you are born.
4.30 Describe two way to identify inconsistent values. Are these techniques certain to find all inconsistent values? What other step can be taken?
1. You can check for referential integrity, which will show the values that don't belong. 2. You can use the GROUP BY on the suspected column. The inconsistent values will stand out. If the list resulting from the select is too long, groups can be selected that have just one or two elements using HAVING.
4.31 What is a null value?
A null value is a value that has never been provided.
4.38 Why should one be wary of general-purpose remarks columns?
All solutions require patience and hours of labor. Don't take such jobs on a fixed-price basis!
4.21 Why are customized duplicated tables not used for updatable databases?
Because every time you create a duplicate table you would run into data integrity issues.
4.8 If data duplication is only reduced, how can we say that the possibility of data inconsistencies has been eliminated?
Data duplication in foreign keys will not cause inconsistencies because referential integrity constraints prohibit them.
4.25 Show how to represent the relation in your answer to Review Question 4.23 with two tables.
EMPLOYEE (EmployeeNumber, EmployeeLastName, EmployeeFirstName, Email...etc) BANKACCOUNTS (EmployeeNumber, BankAccount1, BankAccount2, BankAccount3...etc)
4.5 Summarize how database design principles differ with regards to the design of updatable databases and the design of read-only databases.
For updatable databases modification anomalies and inconsistent data are the primary concerns. Normalization can eliminate these anomalies, reduce the data duplication, and eliminate inconsistencies. Updatable tables should normally be placed in BCNF. For Read-only databases normalization is less desired. If the input data is normalized, it frequently needs to be denormalized by joining it together and storing the joined result.
4.10 Suppose you are given the table: EMPLOYEE_DEPARTMENT (EmployeeNumber-underlined, EmployeeLastName, EmployeeFirstName, Email, DepartmentName, BudgetCode) and you wish to transform this table into the two tables: DEPARTMENT (DepartmentName-underlined, BudgetCode) EMPLOYEE (EmployeeNumber-underlined, EmployeeLastName, EmployeeFirstName, Email, DepartmentName-italicized) Write the SQL statements needed for filling the EMPLOYEE and DEPARTMENT tables with data from EMPLOYEE_DEPARTMENT
INSERT INTO DEPARTMENT SELECT DepartmentName, BudgetCode FROM EMPLOYEE_DEPARTMENT INSERT INTO EMPLOYEE SELECT EmployeeNumber, EmployeeLastName, EmployeeFirstName, Email, DepartmentName FROM EMPLOYEE_DEPARTMENT
4.19 Suppose you are given the DEPARTMENT and EMPLOYEE tables in Review Question 4.10 and asked to denormalize them into the EMPLOYEE_DEPARTMENT relation. Show the design of the EMPLOYEE_DEPARTMENT relation. Write an SQL statement to fill this table with data.
INSERT INTO EMPLOYEE_DEPARTMENT SELECT DEPARTMENT.DepartmentName, BudgetCode, EmployeeNumber, EmployeeLastName, EmployeeFirstName, Email FROM DEPARTMENT, EMPLOYEE WHERE DEPARTMENT.DepartmentName = EMPLOYEE.DepartmentName
4.36 Give an example in which the general-purpose remarks column makes it difficult to obtain values for a foreign key.
If the foreign key is AirplaneModelID and it is in the general-purpose remarks column it may be coded as "Customer wants to purchase a Piper Seneca II" or "Possible buyer for a Turbo Seneca" You have the Airplane Model ID in the remarks but it also contains extra language. If you don't know the exact language you will have a tough time finding the exact foreign key.
4.24 Explain the problems caused by the multivalue, multicolumn table in your example in Review Question 4.23.
If there are only three columns for bank account info and a person has four bank accounts then you have a problem. Also if you are querying the data and are looking for a particular bank account you would have to query each individual column, which is arduous.
4.12 Describe a situation, other than the one for ZIP codes, in which one would choose not to place tables into BCNF. Justify your decision not to use BCNF.
If you had a table that contained social security numbers you would not need to change it into BCNF because they never change.
4.23 Give an example of a multivalue, multicolumn table other than the one discussed in this chapter.
If you have a table where you keep track of someone's bank accounts or credit card numbers. This would require multiple columns or values and could change over time, which would cause issues.
4.27 Explain the following statement: "The multivalue, multicolumn problem is just another form of multivalued dependency." Show how this is so.
Instead of storing multiple rows in a table for each item, multiple named columns are created in the table. The underlying problem is the same, however.
4.32 How does a null value differ from a blank value?
It is not the same as a blank value because a blank value is a value that is known to be blank. A null value is not known to be anything.
4.18 What is denormalization?
Joining the data prior to storage. Typically used in a read-only database.
4.13 According to this text, under what situations should you choose not to remove multi-valued dependencies from a relation?
Never. Always remove multi-valued dependencies from your relations!
4.16 How does the fact that a read-only database is never updated influence the reasons for normalization?
No modification anomalies can occur, therefore it is seldom an advantage to normalize the database.
4.17 For read-only databases, how persuasive is the argument that normalization reduces file space?
Not very persuasive. File space today is nearly free, so you don't need to reduce file space requirements.
4.37 Give an example in which the general-purpose remarks column causes difficulties when multiple values are stored in the same column. How is this problem solved?
One user may code names of the spouse of the contact in the general-purpose remarks column, another may put in the type of purchase they made, or the customer preference of model. You can also use this column for different purposes at different times! The way to solve this is to identify all the different purposes of the remarks column, create new columns for each of those purposes, and then extract the data and store it into the new columns as appropriate. However, this solution can seldom be automated.
4.28 Explain ways in which inconsistent values arise.
One way is when different users have coded the same entries differently. This is hard to detect and will create inconsistent and erroneous information. An example is a SKU_Description of Corn, Large Can where another user may have used a SKU_Description of Large Can Corn. Same SKU, but very hard to reconcile. Another related problem occurs when entries are misspelled, which will cause entries to appear as two separate products.
4.20 Summarize the reasons for creating customized duplicated tables
Organizations create several customized versions of tables for use by different applications, especially if the values of some of the columns in the tables are large and would slow down processing. There is no danger of data integrity problems and storage space is not an issue. By creating customized duplications of the tables it increases efficiency of database use.
4.15 Describe the three uses for a read-only database
Read-only databases aer used in business intelligence systems for querying, reporting and data mining applications.
4.29 Why are inconsistent values in foreign key particularly troublesome?
Relationships will be missing or wrong when foreign key data are coded inconsistently.
4.34 Show SQL for determining the number of null values in the column EmployeeFirstName of the table EMPLOYEE.
SELECT COUNT (*) as EmployeeFirstNameNullCount FROM EMPLOYEE WHERE EmployeeFirstName IS NULL;
4.3 Show SQL statements to count the number of rows and to list the top 15 rows of the RETAIL_ORDER table.
SELECT COUNT(*) FROM RETAIL_ORDER; SELECT TOP 15 * FROM RETAIL_ORDER;
4.4 Suppose you receive the following two tables: DEPARTMENT (Department Name-underlined, BudgetCode) EMPLOYEE (EmployeeNumber-underlined, EmployeeLastName, EmployeeFirstName, Email, DepartmentName) and you conclude that EMPLOYEE.DepartmentName is a foreign key to DEPARTMENT.DepartmentName. Show SQL for determining whether the following referential integrity constraint has been enforced: DepartmentName in EMPLOYEE must exist in DepartmentName in DEPARTMENT
SELECT DepartmentName FROM EMPLOYEE WHERE DepartmentName NOT IN (SELECT DepartmentName FROM EMPLOYEE, DEPARTMENT WHERE EMPLOYEE.DepartmentName = DEPARTMENT.DepartmentName);
4.35 Describe the general-purpose remarks column problem.
The general-purpose remarks column problem is common, serious, and very difficult to solve. Columns with names such as Remarks, Comments, and Notes often contain important data that are stored in an inconsistent, verbal, and verbose manner.
4.26 Show how the tables in your answer to Review Question 4.25 solve the problems you identified in Review Question 4.24.
Using the second structure and employee can have an unlimited amount of bank accounts. Also when you are looking for a particular bank account, you can just query the employee and all their bank accounts will show up.
4.7 Why do we say that data duplication is only reduced? Why is it not eliminated?
We cannot eliminate all duplicated data because we must duplicate data in foreign keys.
4.14 Compare the difficulty of writing subqueries and joins with the difficulty of dealing with anomalies caused by multivalued dependencies.
Writing subqueries and joins is nothing compared to the complexities of dealing with the anomalies caused by multivalued dependencies.
4.1 Summarize the premise of this chapter
You have received, from some source, one or more tables of data that are to be stored in a new database. The question is should that data be stored as is, or should it be transformed in some way before it is stored? Normalization theory plays an important role.
4.11 Summarize the reasons explained in this chapter for not placing ZIP code values into BCNF.
Zip codes almost never change, so technically anomalies could exist if the table was not in BCNF but the likelihood is really low, so it's better to have the advantages of a non-normalized table.