Chapter 4
Why is normalization useful, given that EER conversion will typically lead to a normalized set of relations?
1) During logical database design. You should use normalization concepts to verify the quality of the relations that are obtained from mapping E-R diagrams. 2) When reverse-engineering older systems. Many of the tables and user views for older systems are redundant and subject to the anomalies we describe in this chapter.
summarize 6 important properties of relations
1) Each relation (or table) in a database has a unique name. 2) An entry at the intersection of each row and column is atomic (or single valued). There can be only one value associated with each attribute on a specific row of a table; no multivalued attributes are allowed in a relation. 3) Each row is unique; no two rows in a relation can be identical. 4) Each attribute (or column) within a table has a unique name. 5) The sequence of columns (left to right) is insignificant. 6) The sequence of rows (top to bottom) is insignificant.
describe 3 types of anomalies that can arise in a table and the negative consequences of each
1) Insertion - the user should be able to enter employee data without supplying course data 2) Deletion - loss of information 3) Modification - inconsistency
List three conditions that you can apply to determine whether a relation that is in first normal form is also in second normal form.
1) The primary key consists of only one attribute. By definition, there cannot be a partial dependency in such a relation. 2) No nonkey attributes exist in the relation (thus all of the attributes in the relation are components of the primary key). There are no functional dependencies in such a relation. 3) Every nonkey attribute is functionally dependent on the full set of primary key attributes
Explain three conditions that suggest a surrogate key should be created for the primary key of a relation.
1) There is a composite primary key, as in the case of the DEPENDENT relation shown previously with the four-component primary key 2)The natural primary key (i.e., the key used in the organization and recognized in conceptual data modeling as the identifier) is inefficient. 3) the natural primary key cannot, in fact, be guaranteed to be unique over time (e.g., there could be duplicates, such as with names or titles).
demonstrate each anomaly types with an example
1) insertion - Suppose that we need to add a new employee to EMPLOYEE2. The primary key for this relation is the combination of EmpID and CourseTitle (as noted earlier). Therefore, to insert a new row, the user must supply values for both EmpID and CourseTitle (because primary key values cannot be null or nonexistent). This is an anomaly because the user should be able to enter employee data without supplying course data. 2) deletion - Suppose that the data for employee number 140 are deleted from the table. This will result in losing the information that this employee completed a course (Tax Acc) on 12/8/2015. In fact, it results in losing the information that this course had an offering that completed on that date. 3) modification - Suppose that employee number 100 gets a salary increase. We must record the increase in each of the rows for that employee (two occurrences in Figure 4-2); otherwise, the data will be inconsistent.
functional dependency
A constraint between two attributes in which the value of one attribute is determined by the value of another attribute
recursive foreign key
A foreign key in a relation that references the primary key values of the same function
transitive dependency
A functional dependency between the primary key and one or more nonkey attributes that are dependent on the primary key via another nonkey attribute
partial functional dependency
A functional dependency in which one or more monkey attributes are functionally dependent on part (but not all) of the primary key
relation
A named, two-dimensional table of data. Consists of a set of named columns and an arbitrary number of unnamed rows. An attribute is a named column of a relation. Each row corresponds to a record that contains attribute values for a single entity.
enterprise key
A primary key whose value is unique across all relations Main motivation for using this is for database evolvability - merging new relations into a database after it is created
well-structured relation
A relation that contains minimal redundancy and allows users to insert, modify, and delete the rows in a table without errors or inconsistencies
Referential integrity constraint
A rule that states that either each foreign key value must match a primary key value in another relation or the foreign key value must be null.
surrogate primary key
A serial number or other system-assigned primary key for a relation
what is a well-structured relation? why are they important in logical database design?
A well-structured relation contains minimal redundancy and allows users to insert, modify, and delete the rows in a table without errors or inconsistencies. We need this for normalization. Without such, anomalies may occur when the user attempts to update data in a table.
Domain constraints
All values that appear in a column of a relation must be of the same domain (set of values that may be assigned to an attribute - name, meaning, data type, size)
primary key
An attribute or a combination of attributes that uniquely identifies each row in a relation
anomaly
An error or inconsistency that may result when a user attempts to update a table that contains redundant data. The three types of anomalies are insertion, deletion, and modification anomalies.
How do you represent an associative entity in a relational data model?
Basically same as doing a many to many. Create 3 relations: one for each of the two participating entity types and a third for the associate entity. If identifier not assigned to associative entity, the default PK for the associative relation is a composite key that consists of the two primary key attributes from the other two relations. If an identifier is assigned, the two primary keys of the entity types are instead included as foreign keys.
primary differences between conceptual and logical data models
Conceptual data modeling is about understanding the organization-getting the requirements right. (modeling the real-world domain) Logical database design is about creating stable database structures- correctly expressing the requirements in a technical language. (representing the data items within the domain in a way that can be implemented with a DBMS)
How do you represent an M:N ternary relationship in a relational data model?
Create a new associative relation. Pk consists of three PK attributes for the participating entity types. These attributes act as foreign keys that reference the individual PK's of the participating entity types. Any attributes of the associative entity type become attributes of the new relation.
1NF 2NF 3NF
First normal form - a relation that has a primary key and in which there are no repeating groups Second normal form - a relation in first normal form in which every monkey attribute is functionally dependent on the primary key Third normal form - A relation that is in second normal form and has no transitive dependencies.
Under what conditions must a foreign key not be null?
If there is a mandatory relationship. Can be null if optional.
Entity Integrity rule
No primary key attribute may be null
What is the relationship between the primary key of a relation and the functional dependencies among all attributes within that relation?
Normalized relations have as their primary key the determinant for each of the nonkeys, and within that relation there are no other functional dependencies.
How do you represent a 1:M unary relationship in a relational data model?
Same process. But a foreign key attribute is added to the same relation- to reference the primary key values in the same relation (recursive foreign key) If was M:N unary, create two relations to represent the entity type and an associate relation. PK consists of two attributes
What are the benefits of enforcing the integrity constraints as part of the database design and implementation process (instead of doing it in application design)?
So benefit if that the constraints or limitations that are defined here can be implemented all over the database consistently. If the constraints were implemented during the application design phase, the analysts must ensure that the limitations are applied in the application code. This is a risky process.
determinant
The attribute on the left side of the arrow in a functional dependency
Explain what can be done with primary keys to eliminate key ripple effects as a database evolves.
The only way to guarantee that each primary key of a relation is unique across the database is to create an enterprise key from the very beginning so primary keys never have to change. The enterprise ket has no business meaning.
normalization
The process of decomposing relations with anomalies to produce smaller, well-structured relations
describe the most common way in which relationships in an E-R diagram are expressed in a corresponding relational data model
The process of mapping such a relationship to relations requires two steps. First, two relations are created, one for each of the participating entity types. Second, the primary key of one of the relations is included as a foreign key in the other relation.
What are the benefits of the use of a surrogate key?
The solution will ensure unique identification for each dependent. It simplifies the key structures.
what is the primary purpose of normalization
To guarantee that all anomalies are removed. Ensure good quality of relations from E-R diagrams. We decide which attributes to group together. Goals: 1) Minimize data redundancy, thereby avoiding anomalies and conserving storage space. 2) Simplify the enforcement of referential integrity constraints. 3) Make it easier to maintain data (insert, update, and delete). 4) Provide a better design that is an improved representation of the real world and a stronger basis for future growth.
composite key
a primary key that consists of more than one attribute
normal form
a state of a relation that requires that certain rules regarding relationships between attributes are satisfied
Explain how each of the following types of integrity constraints is enforced in the SQL CREATE TABLE commands: a) entity integrity b) referential integrity
a) entity integrity - when creating a table, the primary key in the table must be assured and the attributes in the table should not be null. The values added in the attribute should be valid. NOT NULL and PRIMARY KEY in the SQL CREATE TABLE command enforces entity integrity. b) referential integrity - the values in each row of a table are unique. Two tables are related to each other with the help of a foreign key. FOREIGN KEY REFERENCES enforces the integrity constraints.
Describe how the following components of an E-R diagram are transformed into relations: a) regular entity type b) relationship (1:M) c) relationship (M:N) d) relationship (supertype/subtype) e) multivalued attribute f) weak entity g) composite attribute
a) regular entity type - name of relation same as entity type. Each simple attribute becomes an attribute of the relation. The identifier becomes the primary key of the relation. b) relationship (1:M) - create a relation for each of the two entity types. Include the PK attributes of the entity on the one-side of the relationship as a foreign key in the relation that is on the many-side relationship (primary key migrates to the many side) c) relationship (M:N) - suppose have many-to-many relationship between entity types A and B. create new relation C. Include as foreign key attributes in C the primary key for each of the two participating entity types. Together are PK. Any monkey attributes that as associated with the relationship are included within the relation C. d) relationship (supertype/subtype) - 1) Create a separate relation for the supertype and for each of its subtypes. 2)Assign to the relation created for the supertype the attributes that are common to all members of the supertype, including the primary key. 3) Assign to the relation for each subtype the primary key of the supertype and only those attributes that are unique to that subtype. 4) Assign one (or more) attributes of the supertype to function as the subtype discriminator. e) multivalued attribute - two new relations (rather than one) are created. The first contains all the attributes of the entity type except the multivalued attribute. The second contains two attributes that form the primary key from the first relation, which becomes a foreign key in the second relation. The second is the multivalued attribute. Name of second relation should capture the meaning of the multivalued attribute. f) weak entity - create a new relation and include all of the simple attributes. Then include the primary key of the identifying relation as a foreign key attribute in this relation. The primary key of the new relation is the combination of this PK of the identifying relation and the partial identifier of the weak entity type. OR create a new attribute to be used as a surrogate primary key. g)composite attribute - only the simple components of the composite attribute are included in the new relation as its attributes (i.e. the things inside the parentheses)
foreign key
an attribute in a relation that serves as the primary key of another relation in the same database
describe 2 properties that each candidate key must satisfy
candidate key - an attribute to set of attributes from which a primary key can be selected. 1) unique identification - For every row, the value of the key must uniquely identify that row. This property implies that each nonkey attribute is functionally dependent on that key. 2) non-redundancy - No attribute in the key can be deleted without destroying the property of unique identification.
Briefly describe four typical problems that often arise in merging relations and common techniques for addressing those problems
normalized relations may have been created from a number of separate EER diagrams. In result, some of the relations generated from these various processes may be redundant. We merge to remove redundancy. 1) synonyms - two (or more) attributes that have different names but the same meanings (ex. Employee ID and EmployeeNo) alias - an alternative name used for an attribute 2) homonyms - an attribute that may have more than one meaning (ex. account could mean checking account or savings account) 3) transitive dependencies - when two 3NF relations merged, they may have the same primary key. these can be merged together. (ex. Student1 has PK of StudentID and Student2 has PK of Student ID. Make it: Student(StudentID, ...) 4) supertype/subtype relationships - may appear they can be merged because they have the same PK. but, for example, Patient1 and Patient2 have Pk PatientID. But there are two different types of patients - resident patients and outpatients. so create supertype/subtype relationships for these entities. (make Patient(PatientID..), ResidentPatient(PatientID...), Outpatient(PatientID...)