DB Chap 4
How does a null value differ from a blank value?
- A blank value is a value that is known to be blank.
Describe three uses for a read-only database.
- Business Intelligence systems for querying - Reporting - Data mining applications
Why do we say that data duplication is only reduced? Why is it not eliminated?
- Cannot eliminate all duplicated data because we must duplicate data in foreign keys.
Summarize the reasons for creating customized duplicated tables.
- Cost of storage is miniscule and no danger of data integrity problems in a read only database - Create several versions of the table for use by different applications
When you receive a set of tables, what steps should you take to assess their structure and content?
- Count rows and examine columns - Examine data values and interview users to determine: ○ Multivalued dependencies ○ Functional dependencies ○ Candidate keys ○ Primary keys ○ Foreign keys -Assess validity of assumed referential integrity constraints.
Describe a situation, other than the one for ZIP codes, in which one would choose not to place tables into BCNF. Justify your decision not to use BCNF.
- Data is never modified and if data inconsistencies will be easily corrected via the normal operation of business activity
Why are customized duplicated tables not used for updatable databases?
- Duplicated data would risk severe integrity problems
Describe two advantages of normalized tables.
- Eliminate modification anomalies - Reduce duplicated data - Eliminate data integrity problems - Save file space
Give an example of a multivalue, multicolumn table other than one discussed in this chapter.
- Employee -> Degree = BS
Give an example in which the general-purpose remarks column causes difficulties when multiple values are stored in the same column. How is this problem solved?
- Identify all of the different purposes of the remarks column, create new columns for each of those purposes, and then extract the data and store it into the new columns as appropriate.
How does the fact that a read-only database is never updated influence the reasons for normalization?
- If a database is never updated, then no modification anomalies can occur. - The reason to normalize a read only database is to reduce data duplication. -Data might be normalized to speed up processing, but this may take more time to join then searching in small tables.
Explain the following statement: "The multivalue, multicolumn problem is just another form of multivalued dependency." Show how this is so.
- Instead of displaying multiple rows for each entry, multiple named columns are created in the table.
Describe two disadvantages of normalized tables.
- More complicated SQL required for multi-table subqueries and joins - Extra work for DBMS can mean slower applications
Why are inconsistent values in foreign keys particularly troublesome?
- Relationships will be missing or wrong when foreign key data are coded inconsistently or misspelled.
List four common design problems when creating databases from existing data.
- The Multivalue, multicolumn problem - Inconsistent Values - Missing Values - General-purpose remarks column
Summarize the reasons explained in this chapter for not placing ZIP code values into BCNF.
- Zips hardly change -Inconsistent zip values if someone enters wrong city, state, zip.
If data duplication is only reduced, how can we say that the possibility of data inconsistencies has been eliminated?
Data duplication in foreign keys will not cause inconsistencies because referential integrity constraints prohibit them.
Give an example in which the general-purpose remarks column makes it difficult to obtain values for a foreign key.
Data for foreign key may be stored in Remarks column. Users will use remarks column for different purposes.
What is denormalization?
Join the data prior to storage. Maybe for a read only database.
What is a null value?
Missing value that has never been provided.
According to this text, under what situations should you choose not to remove multivalued dependencies from a relation?
Never
For read-only databases, how persuasive is the argument that normalization reduces file space?
Not persuasive. If data is normalized, then data from two or more tables may need to be read and the time required for the join may overwhelm the time savings of searching in small tables.
Show how to represent the relation in your answer to Review Question 4.23 with two tables.
SELECT * FROM EMPLOYEE WHERE EmployeeNumber IN (SELECT EmployeeNumber FROM DEGREE WHERE Degree);
Show SQL for determining the number of null values in the column EmployeeFirstName of the table EMPLOYEE.
SELECT COUNT (*) as QuantityNullCount FROM EMPLOYEE WHERE EmployeeFirstName is NULL;
Show SQL statements to count the number of rows and to list the top 15 rows of the RETAIL_ORDER table.
SELECT Count (*) FROM RETAIL_ORDER; SELECT TOP 15 * FROM RETAIL_ORDER;
4.4 Suppose you receive the following two tables: DEPARTMENT (DepartmentName, BudgetCode) EMPLOYEE (EmployeeNumber, EmployeeLastName, EmployeeFirstName, Email, DepartmentName) and you conclude that EMPLOYEE.DepartmentName is a foreign key to DEPARTMENT.DepartmentName. Show SQL for determining whether the following referential integrity constraint has been enforced: DepartmentName in EMPLOYEE must exist in DepartmentName in DEPARTMENT
SELECT DepartmentName FROM EMPLOYEE WHERE DepartmentName NOT IN (SELECT DepartmentName FROM EMPLOYEE, DEPARTMENT WHERE EMPLOYEE.DepartmentName = DEPARTMENT.DepartmentName);
Describe the general-purpose remarks column problem.
Very serious, common, and very difficult to solve. Columns with names such as Remarks, Comments, and Notes often contain important data that are stored in an inconsistent, verbal, and verbose manner.
Why should one be wary of general-purpose remarks columns?
Very serious, common, and very difficult to solve. Columns with names such as Remarks, Comments, and Notes often contain important data that are stored in an inconsistent, verbal, and verbose manner.
Compare the difficulty of writing subqueries and joins with the difficulty of dealing with anomalies caused by multivalued dependencies.
Writing subqueries and joins is nothing compared with the complexity of code that must be written to handle the anomalies due to multivalued dependencies.
Explain the problems caused by the multivalue, multicolumn table with degrees
You would have to request Degree 1, Degree 2, Degree 3 etc for each employee.
Describe two ways to identify inconsistent values. Are these techniques certain to find all inconsistent values? What other step can be taken?
1. Check for referential integrity 2. Use the GROUP BY on the suspected column. - Neither check is foolproof. Sometimes you just have to read the data. -Another step is to develop an error reporting and tracking system to ensure that inconsistencies that users do find are recorded and fixed.
What are three interpretations of null values? Use an example in your answer that is different from the one in this book.
1. Value is inappropriate 2. Value is appropriate, but unknown 3. Value is appropriate and known, but no one has entered it.
Explain ways in which inconsistent values arise.
Different users or different data sources may use slightly different forms of the same data value. Different codes, misspellings, etc.
Suppose you are given the table:EMPLOYEE_DEPARTMENT (EmployeeNumber, EmployeeLastName, EmployeeFirstName, Email, DepartmentName, BudgetCode) and you wish to transform this table into the two tables: DEPARTMENT (DepartmentName, BudgetCode) EMPLOYEE (EmployeeNumber, EmployeeLastName, EmployeeFirstName, Email, DepartmentName) Write the SQL statements needed for filling the EMPLOYEE and DEPARTMENT tables with data from EMPLOYEE_DEPARTMENT.
INSERT INTO DEPARTMENT SELECT DISTINCT DepartmentName, BudgetCode FROM EMPLOYEE_DEPARTMENT; INSERT INTO EMPLOYEE SELECT EmployeeNumber, EmployeeLastName, EmployeeFirstName, Email, DepartmentName FROM EMPLOYEE_DEPARTMENT;
Suppose you are given the DEPARTMENT and EMPLOYEE tables in Review Question 4.10 and asked to denormalize them into the EMPLOYEE_DEPARTMENT relation. Show the design of the EMPLOYEE_DEPARTMENT relation. Write an SQL statement to fill this table with data.
INSERT INTO EMPLOYEE_DEPARTMENTSELECT DEPARTMENT.DepartmentName, BudgetCode, EmployeeNumber, EmployeeLastName, EmployeeFirstname, Email FROM DEPARTMENT, EMPLOYEE WHERE DEPARTMENT.DepartmentName = EMPLOYEE.DepartmentName