Chapter 4: Normalization
Normal form
- A state of a relation that requires that certain rules regarding relationships between attributes (or functional dependencies) are satisfied
Non-redundancy
- No attribute I the key can be deleted without destroying the property of unique identification
Enterprise key
- a primary key whose value is unique across all relations 1. Makes a primary key more like what in object-oriented databases is called a object identifier 2. With this recommendation, the primary key of a relation becomes a value internal to the database system and has no business meaning
Homonym
- an attribute that may have more than one meaning
Candidate keys
- an attribute, or combination of attributes, that uniquely identifies a row in a relation
Unique identification
- for every row, the value of the key must uniquely identify that row. This property implies that each nonkey attribute is functionally dependent on that key
Normalization
- the process of decomposing relations with anomalies to produce smaller, well-structures relations (process of successively reducing relations with anomalies to produce smaller relations..
Synonyms
- two (or more) attributes that have different names but the same meaning (alias)
Transitive dependencies
- when two 3NF relations are merged to form a single relation, transitive dependencies may result
Two major benefits from using normalization
1. During logical database design (described in this chapter) You should use normalization concepts as a quality check for the relations that are obtained from mapping E-R diagrams 2. When reverse-engineering older systems 3. Transform, normalize, merge
Steps in Normalization
1. First normal form- any multivalued attributes(also called repeating groups) have been removed, so there is a single value(possible null) at intersection of row and column 2. Second normal form- any partial functional dependences have been removed (nonkey attributes are identified by only primary keys) 3. Third Normal Form- any transitive dependencies have been removed(nonkey attributes are identified by only primary keys) 4. Boyce-Codd normal form- any reaming anomalies that result from functional dependencies have been removed (because there was more than one posiable primary key for the same nonkeys 5. Fourth Normal form- any multivalued dependences have been removed 6. Fifth normal form- any remaining anomalies have been removed
Goals of normalization
1. Minimize data redundancy, thereby avoiding anomalies and conserving storage space 2. Simplify the enforcement of referential integrity constraints 3. Make it easier to maintain data (insert, update, and delete) 4. Provide a better design that is an improved representation of the real world and a stronger basis for future growth
Outcomes for each form
1. Remove M.V. A. 2. Remove P.F.D. 3. Remove transitive 4. Boyce-Codd 5. Remove M.V. D 6. Remove Anomalies 1NF-repeating groups, 2NF- functional dependencies, 3NF- transitive dependency
Determinant
: the attribute on the left side of the arrow in a functional dependency
Functional dependencies and Keys
a constraint between two attributes in which the value of one attribute is determined by the value of another attribute
Merging Relations view integration
a. Part of the logical design process, normalized relations may have been created from a number of separate EERDs and possibly other user views, there may be bottom-up or parallel database development activities for different areas of the organization as well as top-dawn ones b. The result is that some of the relations generated from these various processes may be redundant; that is, they may refer to the same entities. In such cases, we should merge those relations to remove the redundancy
Understanding how to merge relations is important for three reasons:
i. On large projects, the work of several sub-teams comes together during logical design, so there is often a need to merge relations ii. Integrating existing databases with new information requirements often leads to the need to integrate different views iii. New data requirements may arise during the life cycle, so there is a need to merge any new relations with what has already been developed
Selecting identifiers
they do not change values over time and must be unique and known, not nonitelliegent, and use a single attribute surrogate for composite identifier