ISMN 3830 EXAM 1
When you receive a set of tables, what steps should you take to assess their structure and content?
(1) Examine the table structure and contents (2) Examine data values and interview users to determine: (a) multivalued dependencies, (b) functional dependencies, (c) candidate keys, (d) primary keys, and (e) foreign keys, and (3) Assess the validity of any assumed referential integrity constraints.
Name three sources for databases.
(1) Existing data, (2) the development of new information systems, and (3) the redesign of existing databases.
How does a null value differ from a blank value?
A blank value is a value that is known to be blank, whereas the value of a null value is unknown.
What is a candidate key?
A candidate key is any attribute or set of attributes that uniquely determines the values of all the other attributes for a row in a relation.
Surrogate key
A column of artificial data added to a relation to serve as the primary key.
What is a composite key?
A composite key is a key that has two or more attributes or columns.
What is a composite determinant?
A determinant that has two or more attributes.
Under what conditions will a determinant be unique in a relation?
A determinant will be unique in a relation only if it determines every other column in the relation. Since a relation cannot have two rows with identical data, at least one column must have unique values. The determinant will therefore have to be unique and will function as the identifier of the rows.
Define the term domain and explain the significance of the domain integrity constraint to a relation.
A domain is a collection of like items (such as integers), and a domain integrity constraint enforces the fact that all column values come from the same domain.
Why do we say that data duplication is only reduced? Why is it not eliminated?
Data duplication is only reduced, not eliminated, in normalized tables because some data duplication is inherent in the use of foreign keys.
What is a foreign key? Explain the significance of the referential integrity constraint to a foreign key.
A foreign key is the attribute (or set of attributes) that is a primary key in one table that is then placed in a second table to form a relationship to the first table by storing linking values. The term foreign key refers to the attribute or set of attributes in the second table. The referential integrity constraint enforces the fact that every value of a foreign key must exist as a primary key value in the referenced table.
Summarize the reasons for creating customized duplicated tables.
Customized duplicate tables are used to create more efficient applications when variations of an application are needed for specialized purposes. Since read-only database tables can be very large, an application may be slowed down by dealing with unnecessary columns. Although this creates a lot of duplicate data, this is not a problem since the data is not updated by ongoing operations.
Why are customized duplicated tables not used for updateable databases?
Customized duplicated tables are not used for updateable, operational databases because the data duplication would make the databases subject to modification anomalies.
What is 5NF?
5NF, also known as Project-Join Normal Form (PJ/NF) is used when we need to correct for a condition where a table in 4NF can be split apart but not joined back together. It is rare, and generally a table in 4NF is in 5NF.
If data duplication is only reduced, how can we say that the possibility of data inconsistencies has been eliminated?
Data inconsistencies are initially eliminated by normalization. Data duplication in foreign keys cannot cause data inconsistencies because referential integrity constraints prohibit such inconsistencies as long as the constraints are enforced.
Show SQL statements to count the number of rows and to list the top 15 rows of the RETAIL_ORDER table.
The SQL statement is: SELECT TOP 15 * FROM RETAIL_ORDER;
What is denormalization?
Denormalization is the intentional design of a database in a non-fully normalized form. This means that the relations are not in BCNF form—multivalued dependencies, however, should always be put into their own relation.
Foreign key
The attribute (or set of attributes) that is a primary key in one table that is then placed in a second table to form a relationship to the first table by storing linking values. The term foreign key refers to the attribute or set of attributes in the second table.
Determinant
The attribute or set of attributes that functionally determine the value of another attribute or set of attributes. The "left-hand side" of a functional dependency. For example, in the functional dependency A B, A is the determinant.
What is the basic premise of chapter 3?
The basic premise is that when we receive one or more tables from a source to be included in a new database, we should consider normalizing the tables to remove insertion, deletion, and modification anomalies.
Primary key
The candidate key selected to be the "official" key of a relationship.
Give an example of a multivalue, multicolumn table other than one discussed in this chapter. Consider the following table from a real estate database: PROPERTY (PropertyID, Street, City, State, ZIP, FirstUnitNumber, SecondUnitNumber, ThirdUnitNumber, {Other Columns})
The column UnitNumber refers to an apartment or similar number.
What is the best test for determining whether a determinant is unique?
Since sample data is not conclusive, the best test for determining whether a determinant is unique is to logically determine the functional dependencies by examining the business structure and by asking users about the relationships between attributes.
Explain the role of referential integrity constraints in normalization.
Since the referential integrity constraint requires that a value of the foreign key exist as a primary key value, we will maintain consistency between the two relations. We are protected against update anomalies being created when we create the new relation and move the existing data into it, and against insertion anomalies being created when we add data to the new relation. We typically create a new referential integrity constraint after creating the new table and moving data to it.
rows,___,___
tuples, records
table,___,____
relation, file
Describe the characteristics of a table that make it a relation.
A. The rows must contain data about an entity B. The columns must contain data about the attributes of the entity instances C. All entries in a column must be the same data type D. Each column must have a unique name E. Each cell in the table must hold only a single value F. The order of the columns must be unimportant G. The order of the rows must be unimportant H. The data in each row must be unique - no two rows may contain identical data
Why is an un-normalized relation like a paragraph with multiple themes?
According to the rules of English composition, paragraphs should only have one theme. If you have written a paragraph with more than one theme, it should be broken into a separate paragraph for each of them. In an un-normalized relation, a theme is similar to a set of functional dependencies, each with its own determinant. If there are determinants that are not candidate keys, then there is more than one "theme" in the relation, and the extra "themes" should be put in their own relations ("paragraphs").
According to this text, under what situations should you choose not to remove multivalued dependencies from a relation?
According to this text, multivalued dependencies should always be removed from a relation and put into a separate relation.
Explain the difference between functional dependencies that arise from equations and those that do not.
All functional dependencies document the fact that the knowledge of the value of one or more attributes can be used to uniquely determine the value of one or more associated attributes. Functional dependencies may or may not be based on a quantitative basis. An example of a non-quantitative functional dependency is FullName HomeAddress. This type of functional dependency does not involve an equation. Equations, however, can also form the basis of a functional dependency, and can be translated into functional dependencies by placing the independent variables (those whose values are free to change) in the determinant attributes, and the dependent variable (the result of the calculation) as the functionally determined attribute. For example, the equation (TotalBeforeTax * TaxRate) = Tax can be expressed as the functional dependency (TotalBeforeTax, TaxRate) Tax.
Explain why duplicated data leads to data integrity problems.
All occurrences of duplicated data must be maintained - if updated, all occurrences must be updated, and if deleted, all occurrences must be deleted. If this is not done, data integrity problems occur. Unfortunately, duplicated data lends itself to update anomalies and therefore data integrity problems.
For read-only databases, how persuasive is the argument that normalization reduces file space?
Although a read-only database may require more file space because they are denormalized, this is not an issue today because file space is cheap and therefore the cost of storage is minimal.
Multivalued dependency
An anomaly that occurs in a relation when a determinant is matched to a set of values rather than a single value. For example, the multivalued dependency Person Sibling means that a person may have more than one sibling. This is not a problem if the multivalued dependency exists in a separate table, but can cause a problem if the multivalued dependency exists within a table with other attributes.
Candidate key
An attribute or set of attributes that uniquely determines the values of all the other attributes for a row in a relation.
What relations are in 1NF?
Any relation is, by definition, in 1NF. Any table that meets the conditions to be a relation as shown in Figure 3-4 is defined by E.F. Codd to be in 1NF.
Summarize the three categories of normalization theory.
As shown in Figure 3-12, the three categories of normalization theory are (1) anomalies associated with functional dependencies, (2) anomalies associated with multivalued dependencies, and (3) anomalies associated with data constraints and odd conditions.
columns, ___,___
attributes, fields
If A -> (B, C), then can we also say that A -> B?
Yes, A determines B and also A determines C.
Composite key
A key that contains two or more attributes.
What is a normal form?
A normal form is one category in a set of categories ("normal forms") used to describe relations according to the type of anomalies that can occur in the relations.
What are three interpretations of null values? Use an example in your answer.
A null value can represent one of three conditions: (1) the value is inappropriate, or (2) the value is appropriate but not known, or (3) the value is appropriate and known, but not entered. Consider the following table from a real estate database: PROPERTY (PropertyID, Street, UnitNumber, City, State, ZIP, . . . ) The column UnitNumber refers to an apartment or similar number. Condition (1): The PROPERTY does not have units (a house, for example), and the value is inappropriate. Condition (2): The PROPERTY has units, but we do not know what they are. Condition (3): The PROPERTY has units, we know what they are, but they have not been entered.
What is a null value?
A null value is a missing value—a value of an attribute that was never provided.
What is a referential integrity constraint? Define the term, and give an example of its use. Are null values allowed in foreign key columns with a referential integrity constraint? How does the referential integrity constraint contribute to database integrity?
A referential integrity constraint is a value constraint on a foreign key that states that no value can be placed in the foreign key unless it already exists as a primary key value in the linked table. Technically, null values are allowed in some cases, especially when using recursive relationships. The referential integrity constraint helps ensure database integrity by maintaining valid links between linked tables.
What conditions are required for a relation to be in 2NF?
A relation is in 2NF if and only if (1) it is in 1NF and (2) all non-key attributes are determined by the entire primary key.
What conditions are required for a relation to be in 3NF?
A relation is in 3NF if and only if (1) it is in 2NF and (2) there are no non-key attributes determined by another non-key attribute.
What conditions are required for a relation to be in BCNF?
A relation is in BCNF if (1) it is in 3NF and (2) every determinant is a candidate key.
Functional dependency
A relationship between attributes in a relation where the value (or values) of one (or more) attributes determines the value (or values) of another attribute (or set of attributes). For example, the functional dependency A B expresses the fact that if we know a value of A, we will always know (we can lookup) the corresponding value of B.
What is a surrogate key?
A surrogate key is a column of artificial data added to a relation to serve as the primary key.
When would you use a surrogate key?
A surrogate key is used when the non-surrogate primary key would be too large and unwieldy.
Where does the value of a surrogate key come from?
A surrogate key value is normally assigned by the DBMS when a new row is created
Relation
A table-like structure of rows and columns, where the rows store data about an entity and the columns store data about the attributes of that entity. Each column has a unique name, and all values in a column are for the same attribute. The cells in the table can only hold a single value. The order of the rows doesn't matter, and neither does the order of the columns. The data in each row as a whole must be unique.
Referential integrity constraint
A value constraint of a foreign key that states that no value can be placed in the foreign key unless it already exists as a primary key value in the linked table.
How do the conditions for DK/NF correspond to the conditions for BCNF?
For our purposes, they are the same: every determinant must be a functional dependency. If a table is in BCNF, we will consider it to be in DK/NF as well.
List four common design problems when creating databases from existing data.
Four common design problems when creating databases from existing data are (1) the multivalue, multicolumn problem, (2) the inconsistent values problem, (3) the missing values problem, and (4) the general-purpose remarks column problem.
If a relation is in BCNF, what can we say about it with regard to 2NF and 3NF?
If a relation is in BCNF, it is also in 2NF and 3NF.
Explain the following statement: "The only reason for having relations is to store instances of functional dependencies."
If functional dependencies were always based on equations, we could calculate the results of the equation based on the appropriate input values when we needed the result. But since functional dependencies store information that cannot be calculated, we have to have some place to store the related data for future use. Relations are used for that storage.
Explain the meaning of the expression: (FirstName, LastName) -> Phone
If you tell me a person's first name and last name, I can always, unambiguously tell you their phone number.
Intuitively, what is the meaning of the functional dependency: PartNumber -> PartWeight
If you tell me the PartNumber, I can always, unambiguously tell you a specific PartWeight.
How does a multivalued dependency differ from a functional dependency?
In a functional dependency, a determinant determines one value of each of the attributes associated with it. In a multivalued dependency, a determinant determines a set of values of one or more attributes.
Describe the general-purpose remarks column problem.
In the general-purpose remarks problem, columns with names like Comments, Notes, and Remarks may hold important data that are stored in an inconsistent, verbal and verbose manner.
Explain the following statement: "The multivalue, multicolumn problem is just another form of multivalued dependency." Show how this is so.
In the multivalue, multicolumn problem, multiple values of an attribute are placed in different columns in one row, while in a multivalued dependency, multiple values of an attribute are placed in one column in different rows. In both cases, we are dealing with multiple values of an attribute and trying to find a place to store those values.
Why are inconsistent values in foreign keys particularly troublesome?
Inconsistent values in foreign keys are particularly troublesome because relationships will be missing or wrong when the foreign key value does not match the intended primary key value.
Explain ways in which inconsistent values arise.
Inconsistent values occur (1) when different users code entries that should be identical in differing ways, and (2) when users misspell entries.
Summarize the reasons explained in this chapter for not placing ZIP code values into BCNF.
ZIP code values are not put in BCNF because (1) the ZIP code assigned to a city and state almost never changes, and (2) if there is a data inconsistency involving a ZIP code in a denormalized table, normal business practices will quickly detect it and it will be corrected.
f (A, B) -> C, then can we also say that A -> C?
No, A alone does not determine the value of C.
If it is true that: PartNumber -> PartWeight does that mean that PartNumber will be unique in a relation?
No, but it does mean that the same value of PartNumber will always determine the same value of PartWeight.
If a determinant is part of a candidate key, is that good enough for BCNF?
No, the determinant must be in exactly the same columns as the candidate key. To wit, "I swear to construct my tables so that all nonkey columns are dependent on the key, the whole key, and nothing but the key, so help me Codd."
Which normal forms are concerned with functional dependencies?
Normal forms 2NF through Boyce-Codd Normal Form (BCNF) are concerned with functional dependencies, particularly modification anomalies from functional dependencies.
Describe two advantages of normalized tables.
Normalized tables have the advantages of (1) eliminating modification anomalies, and (2) reducing data duplication.
Normal form
One category in a set of categories ("normal forms") used to describe relations according to the type of anomalies that can occur in the relations.
How does the fact that a read-only database is never updated influence the reasons for normalization?
The fact that read-only databases will never be updated through normal operations (although they may be updated via a new data extract from an operational database) means that normalization is seldom an advantage. Instead the tables should be denormalized to meet the needs of the intended use of the database.
What normal form is concerned with multivalued dependencies?
The normal form concerned with multivalued dependencies is 4NF.
What is the premise of Fagin's work on DK/NF?
The premise of Fagin's work on DK/NF is the question "What conditions need to exist for a relation to have no anomalies?"
Summarize the premise of chapter 4.
The premise of this chapter is that we have received—from some source—one or more tables of data that need to be stored in a new database.
The term domestic key is not used. If it were used, however, what do you think it would mean?
The primary key in the relation that the foreign key links to.
What is a primary key? Explain the significance of the entity integrity constraint to a primary key.
The primary key is the candidate key that has been selected from the set of all candidate keys to be the "official" key of a relationship. The entity integrity constraint enforces the fact that every primary key must have a value.
Describe three uses for a read-only database.
Three uses for a read-only database are (1) querying, (2) reporting, and (3) data mining.
Describe two disadvantages of normalized tables.
Two disadvantages of normalized tables are that (1) more complicated SQL may be required for multitable subqueries and joins, and that (2) the extra work the DBMS has to do to run these SQL statements may result in slower application performance.
Describe two ways to identify inconsistent values. Are these techniques certain to find all inconsistent values? What other step can be taken?
Two ways to identify inconsistent values are (1) to check referential integrity, and (2) to use the GROUP BY clause on a suspected column. However, neither of these techniques will find all inconsistent values. The final step is to read the data.
Why should one be wary of general-purpose remarks columns?
We should never use a general-purpose remarks column in our own designs because we know what kinds of problems they can cause. If we are working on an existing database, we should be wary of, and check for, general-purpose remarks columns because we will have to discover and sort out all the attributes and values that are hidden in those columns, create new columns for the attributes, and extract and store the data in those columns. This is a labor-intensive process and cannot be automated.
Summarize how database design principles differ with regards to the design of updateable databases and the design of read-only databases.
When designing updateable databases we need to be concerned about modification anomalies and inconsistent data. This means that normalization principles must be carefully considered, and that the tables generally be fully normalized although some denormalization may be justified. When designing read-only databases, we are working with tables that will never be updated through normal operations (although they may be updated via a new data extract from an operational database). Since these databases need to be optimized for querying, reporting and data mining operations, normalization is seldom an advantage. Instead the tables should be denormalized to meet the needs of the intended use of the database.
Give an example in which the general-purpose remarks column makes it difficult to obtain values for a foreign key. The general-purpose remarks column makes it difficult to obtain values of a foreign key when the values that should be stored in the foreign key itself are stored in the remarks column. For example, consider the two tables from a real estate database: CLIENT (ClientID, FirstName, LastName, . . . , PropertyTypeSought) PROPERTY_TYPE (PropertyType, TypeDefinition, . . . ) CLIENT.PropertyTypeSought should be a properly defined foreign key in order to create the relationship with PROPERTY_TYPE (linking to PROPERTY_TYPE.PropertyType). If, on the other hand, the CLIENT table is structured as: CLIENT (ClientID, FirstName, LastName, . . . , Remarks)
Where the Remarks include phrases like "Is looking for a 2-bedroom apartment," the foreign key value of "2-bedroom apartment" cannot be used as a linking value.
Explain the difference between a candidate key and a primary key.
While there can be several candidate keys, any of which could be the primary key, there is only one primary key actually used to be the "official" key of a relationship.
Compare the difficulty of writing subqueries and joins with the difficulty of dealing with anomalies caused by multivalued dependencies.
Writing subqueries and joins is easy compared to the complex code that must be written to deal with anomalies caused by multivalued dependencies.