NORMALIZATION
Unnormalized tables may suffer from...
- Less efficient updates because of larger tables - more cumbersome (larger and difficult to use) indexing
Business Rules (data modeling checklist)
- Properly document and verify all business rules with the end users. - Ensure that all business rules are written precisely, clearly, and simply. The business rules must help identify entities, attributes, relationships, and constraints. - Identify the source of all business rules, and ensure that each business rule is justified, dated, and signed off by an approving authority.
Data modeling: ENTITIES
- each entity should represent a single subject - each entity should represent a set of distinguishable entity instances. - All entries should be in 3NF or higher. Any entities below 3NF should be justified. - the granularity of the entity instance should be clearly defined. - the PK should be clearly defined and support the selected data granularity.
Data modeling: ATTRIBUTES
- should be simple and single-valued (atomic data) - should document default values, constraints, synonyms, and aliases - derived attributes should be clearly identified and include source(s) - should not be redundant unless this is required for transaction accuracy, performance, or maintaining a history - nonkey attributes must be fully dependent on the PK attribute
Data modeling: ER MODEL
- should be validated against expected processes: insets, updates, and deletions - should evaluate where, when, and how to maintain a history - should not contain redundant relationships except as required (see attributes) - should minimize data redundancy to ensure single-place updates - should conform to the minimal data rule: All that is needed is there, and all that is there is needed.
Data modeling: RELATIONSHIP
- should clearly identity relationship participants - should clearly define participation, connectivity, and document cardinality
Second normal form (2NF)
1NF, no partial dependencies Partial: existing only in part; incomplete. In the case of 2 or more primary keys, partial dependency might occur. This means a column (not a primary key) will only depend on one of (or part of) the primary keys (prime attribute). subject_id + student_id are prime attributes and the crn will only depend on the subject_id, not the student_id.
Third normal form (3NF)
2NF, no transitive dependencies Transitive is when a non-prime will depend on a non-prime that will depend on a prime (or full primary key). number of classes -> crn -> subject_id
Fourth normal form (4NF)
3NF, no independent multivalued dependencies. multi-valued dependency: A full constraint between two sets of attributes in a relation. **meaning you have to have 3 columns, at least one has to be the prime attribute. A, B, C if A is a prime attribute and B and C are dependent of A, but have no relation (independent) functional dependency: the multi-valued dependency requires that certain tuples be present in a relation.
Database Normalization
Evaluating and correcting table structures. - To minimize data redundancies - To reduce data anomalies - To assign attributes to tables based on determination
Naming Conventions:
All names should be limited in length (database dependent size)
Boyce-Codd normal form (BCNF)
Every determinant is a candidate key (special case of 3NF). determinant: a factor that decisively affects the nature or outcome of something. Candidate Key: a column, or set of columns, in a table that can uniquely identify any database record without referring to any other data. - each table may have one or more candidate keys - one candidate key is unique, the Primary Key. When a key is composed of more than one column, it is known as a composite key. A -> B B is dependent on A. B cannot exist without A. class -> major. this can't work because it should be the other way around. the major is not dependent on the class. major -> class. class should be dependent on the major.
Anomalies
Something that deviates from what is standard, normal, or expected. ANYTHING NOT NORMAL FORM.
First normal form (1NF)
Table format, no repeating groups, primary key (PK) identified. Cannot have 2 sets of data in one column. Not all the time will there will only be ONE primary key. Sometimes there are multiple.
Normalization Objectives
To ensure that each table conforms to the concept of well-formed relations. - Each table represents a single object. - No data item will be unnecessarily stores in more than one table. i.e. "one fact in one place" - All nonprime attributes in a table are dependent on the primary key. - Each table is void of insertion, update, and deletion anomalies.
Business Data Model (BDM)
a conceptual data model that specifies the third-normal-form They check boxes when you put in attributes (PK, NN, AI)
Relationship names:
should be active or passive verbs that clearly indicate the nature of the relationship.
What does Normalization result in?
the number of tables with corresponding increase in the number of foreign key relationships between tables known as joins.
Entity names
- should be nouns that are familiar to business and should be short and meaningful - should document abbreviations, synonyms, and aliases for each entity - should be unique within the model - for composite entities, may include a combination of abbreviated names of the entities linked through the composite entity.
Attribute names:
- should be unique within the entity - should use the entity abbreviation as a prefix - should be descriptive of the characteristic - should use suffixes such as _ID, _NUM, or _CODE for the PK attribute - should not be reserved work - should not contain spaces or special characters such as @, !, &
Maintaining data redundancies must be...
(a database design may decide to keep certain redundancies) - consciously designed and not by accident - justifiable
What are characteristics of well-formed relations?
- Each table represents a single object. - No data item will be unnecessarily stores in more than one table. i.e. "one fact in one place" - All nonprime attributes in a table are dependent on the primary key. - Each table is void of insertion, update, and deletion anomalies.
Normalization Process
- Ensures that all tables are in at least 3NF - Higher forms are not likely to be encountered in business environment - Works one relation at a time. - May need to decompose tables into sub-tables - Starts by: - Identifying the dependencies and relations between attributes. - Progressively breaking the relation into new set of relations.
Normalization includes normal forms...
- First normal form (1NF) - Second normal form (2NF) - Third normal form (3NF) - Boyce-Codd normal form (BCNF) - Fourth normal form (4NF) THE HIGHER NORMAL FORMS THE BETTER.