DBMS Chapter 3 Data Normalization

Pataasin ang iyong marka sa homework at exams ngayon gamit ang Quizwiz!

Database Normalization Rules

1. Each cell in a table contains atomic (single-valued) data. 2. Each non-key column depends on all of the primary key columns (not just some of the columns). 3. Each non-key column depends on nothing outside of the key columns.

One-to-Many Relationships

...

hidden dependency

A hidden dependency arises when there is a functional rule that is not part of the primary key.

Keys

A key K (sometimes called candidate key) is a set of attributes (1) With FD K→ U where U is all other attributes in the relation (2) If K' is a subset of K, then there is no FD K' → U A set of key attributes functionally determines all other attributes in the relation, and it is the smallest set of attributes that will do so (there is no smaller subset of K that determines the columns.)

Definition: Keys

A key is a set of attributes K such that, where U is the set of all attributes in the relation, 1. There is a functional dependency K → U. 2. If K' is a subset of K, then there is no FD K' → U. That is, a set of key attributes K functionally determines all other attributes in the relation, and it is the smallest set of attributes that will do so (there is no smaller subset of K that determines the other attributes). Primary keys are important in relational databases because they are used to identify rows of data. Sometimes multiple attribute sets could be used to form dif- ferent keys, so they are sometimes referred to as candidate keys.

Definition: Multi-Valued Dependency (MVD)

A multi-valued dependency (MVD) exists when there are at least three attributes in a relation (A, B, and C; which could be sets of attributes), and one attribute A determines the other two (B and C), but the two dependencies are independent of each other. That is, A → B and A → C, but B and C are not functionally dependent on each other. For example, employees can have many specialties and be assigned many tools, but tools and specialties are not directly related to each other.

Definition: Boyce-Codd Normal Form (BCNF)

A relation is in Boyce-Codd normal form if and only if it is in third normal form and every determinant is a candidate key. That is, if there is an FD X → Y, then X must be the primary key (or equivalent to the primary key). In simpler terms: there cannot be a hidden dependency, where hidden means it is not part of the primary key. As shown in the example in Figure 3.5A, consider the situation where employees can have many specialties, there are many employees for each specialty, and an employee can have many managers, but each manager is manager for only one specialty. This functional dependency (MangerID → Specialty) is not a key with- in the relation EmpSpecMgr(EID, Specialty, ManagerID), so the relation is not in BCNF. It has to be decomposed to create new relations ManagerSpecialty(Manag erID, Specialty), and EmployeeManager(EmployeeID, ManagerID) that explicitly have each functional dependency as keys.

First Normal Form

A relation is in first normal form (1NF) if and only if all attributes are atomic. Example: Customer(CID, Name: First + Last, Phones, Address)

Definition: First Normal Form (1NF)

A relation is in first normal form if and only if all of its attributes are atomic.

Definition: Fourth Normal Form (4NF)

A relation is in fourth normal form if and only if it is in Boyce-Codd normal form and there are no multi-valued dependencies. That is, all attributes of the relation are functionally dependent on A. In the multi-valued dependency example for employee specialties and tools, the relation EmpSpecTools(EID, Specialty, ToolID) is not in fourth normal form, because of the two functional dependencies: EID → Specialty; and EID → Tool- ID. Solving the problem results in two simpler relations: EmployeeSpecialty(EID, Specialty) and EmployeeTools(EID, ToolID).

Definition: Second Normal Form (2NF)

A relation is in second normal form if it is in first normal form and each nonkey attribute is fully functionally dependent on the primary key. That is, K → Ai for each nonkey attribute Ai. Consequently, there is no subset K' such that K' → Ai for any attribute. This definition corresponds closely to the simpler version presented in the chapter that each nonkey column depends on the entire key, not just a portion of the key. Figure 3.3A shows an example of a relation that is not in second normal form. The formal definition of third normal form is a little harder to comprehend be- cause it relies on a new concept: transitive dependency.

Definition: Third Normal Form (3NF)

A relation is in third normal form if and only if it is in second normal form and no nonkey attributes are transitively dependent on the primary key. That is, given second normal form: K → Ai for each attribute Ai, there is no subset of attributes X such that K → X → Ai. In simpler terms, each non-key attribute depends on the entire key (K), and not on some intermediate attribute (X). Figure 3.4A shows a common business example of a relation that is not in third normal form, because customer attributes depend transitively on the CustomerID. As discussed in Chapter 3, Boyce-Codd normal form is a little harder to follow. It represents the same basic issue: removing a hidden dependency as seen by the formal definition.

How does a database record constraints?

A second type of constraint is to choose data from a set of predefined options. Fo

Relation

A set of attributes with data that changes over time. Often denoted R (Table)

Definition: Atomic Attributes

Atomic attributes are single valued, which means they cannot be composite, multi-valued, or nested relations. Essentially, a 1NF relation is a table with simple cells under each attribute col- umn. You are not allowed to play tricks and try to squeeze extra data, other rela- tionships, or multiple columns into one column. Figure 3.2A provides an example of a table that is not in first normal form because it has two attributes that are not atomic.

Data Volume and Usage

Business rules • Three year retention. • 1000 customers. • Average 10 orders per customer per year. • Average 5 items per order.

Attribute

Characteristic with a real-world domain. Subsets of attributes are multiple columns, often denoted X or Y. (column)

Schema

Collection of tables and constraints/relationships (Data Dictionary)

Functional Dependency (details)

Derives from a real-world relationship/constraint. Denoted X→ Y for sets of attributes X and Y Holds when any rows of data that have identical values for X attributes also have identical values for their Y attributes: If t1[X] = t2[X], then t1[Y] = t2[Y] X is also known as a determinant if X is non-trivial (not a subset of Y).

Definition: Transitive Dependency

Given functional dependencies X → Y and Y → Z, the transitive dependency X → Z must also hold. The concept of transitivity should be familiar from basic algebra. The fact that it holds true arises from the set-theory foundations. To understand the definition, remember that functional dependency represents business semantic relationships. Consider the relationship between OrderID, CustomerID, and customer Name at- tributes. The business rule that there is only one customer per order translates to a functional dependency OrderID → CustomerID. Once you know the OrderID value you always know the CustomerID value. Likewise, the key relationship be- tween CustomerID and other attributes such as Name means there is a functional dependency CustomerID → Name. Applying transitivity, once you know the Or- derID value, you can obtain the CustomerID value, and in turn learn the value of the customer Name.

Transitive Dependency

Given functional dependencies: X → Y and Y → Z, the transitive dependency X → Z must also hold

repeating groups

If a group of items repeats, it should be split into a new table.

Dependency

If one attribute always identifies a specific value for another attribute, the second attribute is said to depend on the first.

second normal form (2NF)

K → Ai for each non-key attribute Ai That is, there is no subset K' such that K' → Ai A relation is in second normal form (2NF) if it is in 1NF and each nonkey attribute is fully functionally dependent on the primary key. Example: OrderProduct(OrderID, ProductID, Quantity, Description)

Checking Your Work (Quality Control)

Look for one-to-many relationships. Many side should be keyed (underlined). e.g., VideosRented(TransID, VideoID, . . .). Check each column and ask if it should be 1 : 1 or 1: M. If add a key, renormalize. Verify no repeating sections (1NF) Check 3NF Check each column and ask: Does it depend on the whole key and nothing but the key? Verify that the tables can be reconnected (joined) to form the original tables (draw lines). Each table represents one object. Enter sample data--look for replication.

Many-to-Many Relationships

Many-to-many relationships occur when each record in TableA may have 0, 1 or many linked records in TableB and vice-versa.

cascading delete,

Most relational databases also support cascading delete, which uses the same concepts. If a user deletes a row in the Customer table, you also need to delete the related entries in the Sale table. Then you need to delete the corresponding rows in the SaleItems table. If you build the relationships and specify cascade on delete, the database will automatically delete the related rows when a user deletes an entry in the Customer table. These actions maintain the consistency of the database by ensuring that links between the tables always refer to legitimate rows.

Why Normalization?

Need standardized data definition Advantages of DBMS require careful design Define data correctly and the rest is much easier It especially makes it easier to expand database later Method applies to most models and most DBMS Similar to Entity-Relationship Similar to Objects (without inheritance and methods) Goal: Define tables carefully Save space Minimize redundancy Protect data

pseudo column

On some DBMSs, you can create a pseudo column to define a computed value. This value is not actually stored, but recomputed as needed.

insertion anomaly, deletion anomaly

Several other problems arise because of this weak design. What do you know about products that have not been sold yet? Conversely, what if you delete old data, such as all of last year's sales? As you delete sales, you also delete item and customer data. Suddenly, you notice that you deleted half of the customer base. Technically, these problems are known as an insertion anomaly and a deletion anomaly; that is, when the data is not stored in a proper format, you encounter difficulties as you try to add or delete data. These problems arise because you tried to store all the data in one table.

Data Rules and Integrity

Simple business rules Limits on data ranges Choosing from a set Referential Integrity Foreign key values in one table must exist in the master table.

3NF Rules/Procedure

Split out repeating sections Be sure to include a key from the parent section in the new piece so the two parts can be recombined. Verify that the keys are correct Is each row uniquely identified by the primary key? Are one-to-many and many-to-many relationships correct? Check "many" for keyed columns and "one" for non-key columns. Make sure that each non-key column depends on the whole key and nothing but the key. No hidden dependencies.

Multi-Valued Dependency

That is, A →B and A → C but B and C have no FDs A multi-valued dependency (MVD) exists when there are at least three attributes in a relation (A, B, and C; and they could be sets), and one attribute (A) determines the other two (B and C) but the other two are independent of each other.

Third Normal Form

That is, K → Ai for each attribute, (2NF) and There is no subset of attributes X such that K → X → Ai A relation is in third normal form if and only if it is in 2NF and no nonkey attributes are transitively dependent on the primary key. Example: Order(OrderID, OrderDate, CustomerID, Name, Phone)

Referential integrity

The CustomerID in the Sale table is a foreign key in that table, and the constraint you need is known as referential integrity. Referential integrity exists when a value for a foreign key can be entered only if the corresponding value al- ready exists in the originating table.

Tuple

The data values returned for specific attribute sets are often denoted as t[X] (Row of data)

No Hidden Dependencies

The simple normalization rules: Remove repeating sections Each non-key column must depend on the whole key and nothing but the key. There must be no hidden dependencies. Solution: Split the table. Make sure you can rejoin the two pieces to recreate the original data relationships. For some hidden dependencies within keys, double-check the business assumption to be sure that it is realistic. Sometimes you are better off with a more flexible assumptio

data integrity

These data integrity constraints are easy to assign in most databases. Typically, you can go to the table definitions and add the simple constraints along with a message. The advantage of storing these constraints with the tables is that the DBMS en- forces the conditions for every operation on the table, regardless of the source or method of data entry. No programming is necessary, and the constraint is stored in one location. If you need to change the condition, it is readily accessible (to authorized users).

surrogate keys

When creating primary keys in the database design it may be inefficient (more indexes) to use the composite attributes specified in the E-R model. In this case a *meaningless* but unique identifier can be used to represent each row, a surrogate. Often these values are immutable and some SQL implementations allow DEFAULT AUTOINCREMENT to dynamically create a new numeric key value based on the last. These help indexing and also allow the previous composite columns to be changed without having to deal with primary key concerns such as *updating indexes* and *referential constraints*.

Definition: Functional Dependency and Determinant

Where X and Y are subsets of attributes, a functional dependency is denoted as X → Y, read as (X implies Y or X determines Y) and holds when any rows of data that have identical values for the X attributes always have identical values for the Y attributes. That is, for tuples t1 and t2 of R, if t1[X] = t2[X], then t1[Y] = t2[Y]. In an FD, X is also known as a determinant, because given the dependency, once you are given the values for the X attributes, it determines the resulting values for the Y attributes.

Functional dependency

X → Y (Business rule dependency)

N-ary Associations

an association between n distinct types of things

Atomic

attributes are single valued, and cannot be composite, multivalued or nested relations.

Data Dictionary

n the process of collecting data and creating normalized tables, be sure to keep a data dictionary to record the data domains and various assumptions you make. A data dictionary or data repository consists of metadata, which is data that describes the data stored in the database. It typically lists all of the tables, columns, data domains, and assumptions. It is possible to store this data in a notebook, but it is easier to organize if it is stored on a computer. Some designers create a separate database to track the underlying project data. Specialized computer tools known as computer- aided software engineering (CASE) tools help with software design. One of their strengths is the ability to create, store, and search a comprehensive data dictionary.

Notation

primary - underline foreign - italic tableName(prim, att,att,att)


Kaugnay na mga set ng pag-aaral

Computer Maintenance I Unit 12 Study Guide

View Set

Algebra: Lap 9: Graphing and Analyzing Trig Functions

View Set

Medical Word Suffix (can be classified as noun, adj, ect) MT Chap 1 & 2

View Set

EX4: ch 19 HIV/AIDS NCLEX Practice Questions

View Set

America is regarded as the world's sole superpower after the Soviet Union

View Set

STUKENT NEW MEDIA (DIGITAL) MARKETING CHAPTER QUIZZES

View Set

Course: Math 7B (2018) Unit: 1. PROBABILITY AND GRAPHING Assignment: 4. Sample Space

View Set