question 3 exam Functional Dependencies
What are our goals for relational database design?
- BCNF. - Lossless-join decomposition. - Dependency preservation. If this is not achievable, we either: - Choose BCNF and accept lack of dependency preservation, or - Choose 3NF.
What are the 7 types of normal forms?
- First Normal Form (1NF). - Second Normal Form (2NF). - Third Normal Form (3NF). - Boyce-Codd Normal form (BCNF). - Fourth Normal Form (4NF). - Fifth Normal Form (5NF). - Sixth Normal Form (6NF). Typically, 3NF (or BCNF) is sufficient.
When decomposing how important are Lossless Joins and Dependency Preservation?
- Lossless Joins are critical. - Dependency preservation is desirable but sometimes sacrificed.
What must a good DB design do?
- Make it easier for users to retrieve information. - Avoid storing redundant information (avoiding redundancy fundamentally depends on the notions of functional dependencies).
In general, given R and set of FDs F. R1, R2 is a lossless decomposition if and only if...?
- R1 ∩ R2 -> (R1 - R2) (the common attributes functionally determine the attributes in R1 but not in R2), or - R1 ∩ R2 -> (R2 - R1) is in F+. F+ is the closure of F: the set of all FDs implied by F.
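A minimal sketch of this test in Python, assuming every attribute is a single letter and every FD is a `(lhs, rhs)` pair of strings (e.g. `("AB", "C")` for AB -> C). The helper names `closure` and `is_lossless_binary` are hypothetical. Instead of building all of F+, it computes the attribute closure of R1 ∩ R2 and checks whether that closure covers R1 - R2 or R2 - R1, which answers the same question.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (single-letter attributes)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD if its LHS is already in the closure and it adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Lossless iff (R1 ∩ R2)+ contains R1 - R2 or R2 - R1."""
    common_plus = closure(set(r1) & set(r2), fds)
    return set(r1) - set(r2) <= common_plus or set(r2) - set(r1) <= common_plus


# R(A, B, C) with F = {A -> B, B -> C}
F = [("A", "B"), ("B", "C")]
print(is_lossless_binary("AB", "BC", F))  # True: B -> C, and C = BC - AB
print(is_lossless_binary("AB", "AC", F))  # True: A -> B, and B = AB - AC
print(is_lossless_binary("AC", "BC", F))  # False: C+ = {C}
```

The last split loses information because the shared attribute C determines nothing else, so joining AC and BC can produce spurious tuples.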
What is an insertion anomaly of 1NF (using the surgery example)?
A new patient is to be added but has not yet undergone surgery, so there is no surgeon #. However, since surgeon # is part of the key, we cannot insert the patient.
What is Normalisation Theory?
A way to ensure the above "good DB design". It is based on the observation that relations with certain properties are more effective in inserting, updating and deleting data than other sets of relations containing the same data.
What makes an attribute prime?
An attribute is prime if it is part of some candidate key (all other attributes are non-prime).
How do we get 1NF?
Attributes must have atomic values (i.e., no multi-valued attributes or composite attributes). 1NF simplifies attributes.
standard form
Ax + By = C, where A, B, and C are integers
How do we avoid the aforementioned anomalies?
Bad relational schema will have to be decomposed into smaller relation schema. Given this decomposition, we have to be able to join these relations to collect the original information. If the decomposition is done poorly, the joining of decomposed relations can create problems in the form of spurious tuples.
What is guideline 4 for good R-DB design?
Design relations to avoid spurious tuples. When relations are to be joined, they should use keys as join attributes.
How are Functional Dependencies of use here?
FDs are used to specify formal measures of the 'goodness' of relational designs. They are used to define keys, which in turn are used to define 'normal forms' for relations. (X -> Y states that X determines a unique value of Y.)
Self-Determination
For all A, A->A
Full Functional Dependency Notation
Go over this
Transitivity
If A->B and B->C, then A->C
Composition
If A->B and C->D, then AC->BD
Determinant
If A->B, A is called the
Consequent
If A->B, B is called the
Augmentation
If A->B, then AC->BC
Decomposition
If A->BC, then A->B and A->C
Reflexivity
If B is a subset of A, then A->B
What is a delete anomaly of 2NF (using the surgery example)?
If a patient receives some other drug because of a penicillin rash, and the new drug and its side effects are entered, we lose the information that penicillin causes a rash.
What is a super key?
If a set of attributes X in relation R functionally determines all the other attributes of R, then X is a super key of R.
What is an update anomaly of 2NF (using the surgery example)?
If drug side effects change, we have to update multiple occurrences of side effects.
What is an overview of BCNF?
A relation is in BCNF if every determinant (of a non-trivial FD) is a superkey.
What is a minimal key?
If no attribute can be removed from a superkey while keeping it a superkey, then it is a minimal key. All minimal keys are superkeys, but not all superkeys are minimal keys. If an attribute never appears on the right-hand side of any FD, then it must be part of every minimal key.
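A brute-force sketch in Python that uses the rule above, assuming single-letter attributes and FDs as `(lhs, rhs)` string pairs; `closure` and `candidate_keys` are hypothetical helper names. Attributes that never appear on a right-hand side go into every key, and only the remaining attributes are searched, smallest sets first, so any superset of a key already found is skipped as non-minimal. The search is exponential in the number of remaining attributes, which is acceptable for exam-sized schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (single-letter attributes)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """All minimal keys of relation (a string of single-letter attributes)."""
    all_attrs = set(relation)
    on_rhs = set().union(*(set(rhs) for _, rhs in fds))
    core = all_attrs - on_rhs  # never on any RHS, so it must be in every key
    others = sorted(all_attrs - core)
    keys = []
    for size in range(len(others) + 1):  # smallest candidates first
        for extra in combinations(others, size):
            cand = core | set(extra)
            if any(key <= cand for key in keys):
                continue  # contains a smaller key: a superkey, but not minimal
            if closure(cand, fds) >= all_attrs:
                keys.append(cand)
    return keys


keys = candidate_keys("ABC", [("AB", "C"), ("C", "A")])
print(sorted("".join(sorted(k)) for k in keys))  # ['AB', 'BC']
```

Here B never appears on a right-hand side, so it is in both candidate keys, AB and BC.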
Attribute B is functionally dependent on attribute A
If the value of attribute B is determined by the value of attribute A, then
What is Dependency Preservation?
If we decompose a relation R, do all the original FDs hold? Otherwise, to check updates for FD violation may require computing joins, which is an expensive operation. (In other words, have to be able to check all FDs hold using only the decomposed relations).
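One standard way to test this without computing joins or all of F+: for each FD X -> Y, grow a set Z starting from X, using only what each decomposed relation Ri can infer from the part of Z it contains, and check whether Y ends up inside Z. A sketch in Python, assuming single-letter attributes and FDs as `(lhs, rhs)` string pairs; all function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (single-letter attributes)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(fd, decomposition, fds):
    """Can fd (X -> Y) be checked using only the decomposed relations?"""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            r = set(ri)
            # What Ri alone lets us infer from the part of Z it contains.
            gained = closure(z & r, fds) & r
            if not gained <= z:
                z |= gained
                changed = True
    return set(rhs) <= z


def is_dependency_preserving(decomposition, fds):
    return all(preserves(fd, decomposition, fds) for fd in fds)


F = [("A", "B"), ("B", "C")]
print(is_dependency_preserving(["AB", "BC"], F))  # True
print(is_dependency_preserving(["AB", "AC"], F))  # False: B -> C is lost
```

The decomposition {AB, AC} is still lossless (A -> B), but checking B -> C would require joining the two tables.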
What does Normalisation generally involve and what are the fundamental concepts?
Involves decomposing existing tables into multiple tables. Queries must then re-join the tables, so the decomposition must not lose any information. The fundamental concepts are: - FDs and keys. - Lossless joins. - Dependency preservation.
Is Functional Dependency symmetric? (EX: If A->B then B->A)
No
Is functional dependency a directional property?
Yes
What is a candidate key?
It is a column or set of columns that can uniquely identify any database record without referring to any other data. It is also called a minimal superkey (as it is the minimal set of attributes necessary to identify a tuple.)
Closure of X under F (X+)
For each set of attributes X, X+ is the set of all attributes that are functionally determined by X under the FDs in F.
How do we get BCNF?
Must already be in 3NF (most 3NF relations are already BCNF relations). For every non-trivial FD X -> A, X must be a superkey. Unlike 3NF, BCNF makes no exception when A is a prime attribute: an FD ... -> A(PK) is not allowed unless the FD is trivial or its left side is a superkey.
How do we get 2NF?
Must already be in 1NF. Every non-key attribute must be fully functionally dependent on every candidate key: no proper subset of a candidate key may functionally determine a non-key attribute. 2NF improves data integrity. (Note: a 1NF relation whose candidate keys are all single attributes is automatically in 2NF.)
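A sketch in Python that lists 2NF violations, assuming the candidate keys are already known and passed in, with single-letter attributes and FDs as `(lhs, rhs)` string pairs; `partial_dependencies` is a hypothetical helper. It reports every proper subset of a key whose closure contains a non-prime attribute. The letters P, W and A are a hypothetical encoding of Part, Warehouse and W_Address from the full/partial dependency card.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (single-letter attributes)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(keys, fds):
    """(subset, attribute) pairs where a proper subset of a candidate key
    determines a non-prime attribute, i.e. 2NF violations."""
    prime = set().union(*(set(k) for k in keys))
    found = []
    for key in keys:
        for size in range(1, len(key)):  # proper, non-empty subsets only
            for subset in combinations(sorted(key), size):
                for attr in sorted(closure(subset, fds) - prime):
                    found.append(("".join(subset), attr))
    return found


# Key is PW (Part, Warehouse); Warehouse alone determines W_Address.
print(partial_dependencies(["PW"], [("W", "A")]))  # [('W', 'A')]
print(partial_dependencies(["K"], [("K", "B")]))   # [] (single-attribute key)
```

The second call shows the note above: with a single-attribute key there are no proper non-empty subsets, so no partial dependency can exist.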
How do we get 3NF?
Must already be in 2NF. There is no transitive functional dependency from a key to a non-key attribute; such a dependency occurs when one non-key attribute determines another non-key attribute. E.g., A(PK) -> B and B -> C give A(PK) -> C. It is always possible to find a dependency-preserving, lossless-join decomposition into 3NF (this is not always possible for BCNF).
2NF
NF: every non-key attribute is fully dependent on the key
3NF
NF: every non-key attribute is non-transitively dependent on the key
BCNF
NF: if a non-trivial X -> A holds in R, then X is a superkey of R
What is guideline 3 for good R-DB design?
NULL values in tuples are harmful and should be avoided, although they are often unavoidable. Attributes whose values are expected to be NULL frequently should not be part of that schema; instead, another schema should be created to hold them.
What is an overview of 3NF?
Non-key attributes must only depend on the key (any candidate key).
What are the issues with Normalisation?
Normalisation requires the JOIN operation, which can be very expensive for large joins. For this reason, some relational databases use tables that are in lower NFs.
What is guideline 2 for good R-DB design?
Redundancy of tuple information is harmful and must be avoided, for two reasons: - Storage costs: replicating the same info in different places wastes space. - Consistency costs and anomalies: all replicas must be kept consistent during updates, and anomalies occur when not every replica is updated.
What is a Primary Key?
The chosen candidate key.
What is guideline 1 for good R-DB design?
The schema and its attributes should make sense and be easy to understand. - Each tuple should represent one entity or relationship instance. - Attributes of different entities should not be in the same table. - Referring to other entities should only be done via foreign keys.
Why might creating a new project for the relation EMP_PROJ(Emp#, Proj#, Ename, Pname) cause an anomaly?
To create a new project, we need to insert a new tuple, however we cannot do this without assigning an employee to it. In general, we should be able to do this as employee assignments to projects may be done at a later date in the process.
What is the overall goal of Normalisation?
To decompose relations by: - reducing redundancy - while preserving dependencies - in a lossless-join manner.
What are spurious tuples?
Tuples formed as part of a join that did not previously exist. Usually occurs when joining two relations on a non-key attribute.
When identifying dependencies, what order do we find them in?
Transitive, Partial, Total
x intercept
a if (a,0)∈f
Superkey
a subset of the relation's attributes whose combined values must be distinct for every tuple: no two tuples can have the same values for the subset
y intercept
b if (0,b)∈f
joint proportion
combo of direct and inverse proportions
Dependency Property
each functional dependency is represented in some individual relation resulting after decomposition
linear function
f={(x,y); y=ax+b, a≠0}
multiplying function
f={(x,y); y=ax} line through origin
constant function
f={(x,y); y=k} horizontal line
linear functional dependence
n is a linear function of m, n=f°m
Normalization
process of analyzing the given relation schemes based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing the insertion, deletion, and update anomalies
Denormalization
process of storing the join of higher normal form relations as a base relation, which is in a lower normal form
Transitivity
proof - Armstrong's axiom: x -> y && y -> z | x -> z
Augmentation
proof - Armstrong's axiom: x -> y | xz -> yz
Functional dependency
x->y the values of the y component depend upon the value of the x component
inverse proportion
y₁ × x₁ = y₂ × x₂, hyperbolic
direct proportion
y₁/x₁ = y₂/x₂, multiplying
What is a lossless-join decomposition of schema R?
{R1,...,Rk} is a lossless-join decomposition of R, if R1,...,Rk can be natural joined to create a legal instance of R (the instance satisfies all FDs defined on R).
slope
∆y/∆x
What is the general solution to normalizing (fixing) relations that are not in second or third normal forms?
"Normalization through Decomposition" In much the same manner as correcting an entity class that lacks cohesion from OOAD, a relation that contains functional dependencies not covered by the relation's primary key can be decomposed into 2 or more relations, each of which have their own primary keys and are in second and third normal form.
y-intercept
"b" in y = mx + b stands for
slope
"m" in y = mx + b stands for
Summary of Normal Forms
- 1NF - All attributes are atomic. - 2NF - All attributes are fully functionally dependent on a key. - 3NF - There are no transitive dependencies in the relation. - BCNF - 3NF, and the LHS of every non-trivial FD is a superkey.
Notation for Functional Dependencies
- A functional dependency has a left side, called the determinant, which is a set of attributes, and one attribute on the right side. - Strictly speaking, there is always only one attribute on the RHS, but we can combine several functional dependencies into one: e.g., A -> B and A -> C can be written as A -> BC.
Normal form
- Condition using keys and Functional Dependencies of a relation to certify whether a relation schema is in a particular normal form - the normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized
Attribute Set Closures
- If an attribute does not appear on the RHS of any FD, then it must be part of every key - if X is a key, then any set containing X is a superkey - a relation can have several equivalent (candidate) keys
What are the criteria for "good" base relations?
- Making sure that the semantics of the attributes is clear in the schema - Reducing the redundant information in tuples - Reducing the NULL values in tuples - Disallowing the possibility of generating spurious tuples
Normalization
- Normalization is a technique for producing relations with desirable properties. - Using Functional Dependencies. - Normalization decomposes relations into smaller relations that contain less redundancy - This decomposition requires that no information is lost and reconstruction of the original relations from the smaller relations must be possible.
Two levels of relation schemas
- The logical "user view" level - The storage "base relation" level
Normalization theory allow us
- To recognize the undesirable properties of a relation - To show how a relation can be converted to a more desirable form.
How to go about designing a good schema?
- ad-hoc approach, hope for the best! - formal method- start with a single relation with all attributes and systematically decompose
Why Armstrongs Axioms?
- sound (consistent): any relation satisfying the FDs in F will satisfy those in F+ - complete: every FD in F+ can be derived from F using the axioms
Desirable Relational Schema Properties
- consists of attributes that are logically related - lossless-join property, information that is decomposed must be able to be reconstructed without loss - dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations - avoid update anomalies
Problems with Partial Dependencies
- insertion anomalies - deletion anomalies - modification anomalies
Schema Problems
- insertion, deletion, modification anomalies - too many nulls - spurious tuples (from a decomposition that lacks the non-additive join property)
What are tree-based indexes?
- keep references to our data in a sorted order
Armstrong's Axioms
- reflexivity - augmentation - transitivity, from which we can derive: - union - pseudo-transitivity - decomposition
Functional Dependencies
- represent constraints on the values of attributes in a relation and are used in normalization. - is a statement about the relationship between attributes in a relation. We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y. - X-> Y - a property of the domain being modeled NOT of the data instances currently in the database
Ways to convert to 1NF
- splitting method, divide the existing relation into two relations: non-repeating attributes and repeating attributes. Make a relation consisting of the primary key of the original relation and the repeating attributes. Determine a primary key for this new relation. remove the repeating attributes from the original relation - flattening method, create new tuples for the repeating data combined with the data that does not repeat, introduces redundancy that will be removed by normalization
First Normal Form
- value of any attribute is a single value - domain of attribute contain only atomic values (cannot be a set of values or tuples) - a relation not in 1NF is an unnormalized relation
1NF
-A normalized relation containing no repeating groups. -A relation where all underlying domains contain atomic values only
Why normalize tables?
-recognize undesirable properties of a relation -show how a relation can be converted into a more desirable form
Describe the problems that occur when two problem domain entities are combined into a single relation. For example, trying to maintain individual Employee and Department when both entities are combined into a single relation EMP_DEPT.
1) It is difficult to maintain single instances of entities (employees or departments) when both are combined into a single relation. For example, if we want to create a new department, we need to pair it with an employee when one may not exist. 2) Likewise, it is difficult to create instances of one entity when there is initially no association with the second entity. For example, creating a new employee that has not been assigned to a department. Also, if we move all employees out of a department, the system also loses all knowledge of the department. This is likely not desired. Equally important: - There is significant wasted storage space and duplication of information. - We need to use nullable attributes to implement optional information, which can result in an inefficient use of table space. - If we want to update employee or department specific attributes, we must update multiple rows in the combined relation.
What are the types of indexing?
1. Clustering 2. Hash-based indexes 3. Tree based indexes
How do you determine if it's necessary to use an index?
1. Data distribution 2. Size/layout of data 3. Query vs. update frequency
What are Armstrong's five axioms?
1. Reflexivity 2. Augmentation 3. Transitivity 4. Pseudo-transitivity 5. Decomposition Rule (derived from Reflexivity and Transitivity). Strictly, only the first three are axioms; the other two are derived from them.
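A short derivation showing how decomposition and pseudo-transitivity follow from reflexivity, augmentation and transitivity:

```latex
\begin{aligned}
&\textbf{Decomposition: } A \to BC \ \vdash\ A \to B \\
&\quad 1.\ A \to BC && \text{given} \\
&\quad 2.\ BC \to B && \text{reflexivity, since } B \subseteq BC \\
&\quad 3.\ A \to B && \text{transitivity of 1, 2 (likewise } A \to C\text{)} \\[4pt]
&\textbf{Pseudo-transitivity: } A \to B,\ BC \to D \ \vdash\ AC \to D \\
&\quad 1.\ AC \to BC && \text{augmentation of } A \to B \text{ by } C \\
&\quad 2.\ AC \to D && \text{transitivity of 1 and } BC \to D
\end{aligned}
```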
How do you achieve first normal form?
1. Remove nested relation and non-atomic attributes into a new relation 2. Propagate the primary key into it
Minimal Cover Algorithm
1. Rewrite so that each RHS has only one attribute 2. Remove extraneous attributes from each LHS 3. Delete any FD that is implied by the others; continue until no redundant FDs remain
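A sketch of these steps in Python, assuming single-letter attributes and FDs as `(lhs, rhs)` string pairs; `minimal_cover` and `show` are hypothetical helpers. Extraneous LHS attributes are removed before redundant FDs, since shrinking a LHS can make an FD redundant. The result depends on the order in which FDs and attributes are examined, and a set of FDs can have more than one minimal cover.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (single-letter attributes)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: one attribute on every RHS.
    split = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]
    # Step 2: b is extraneous in X -> a if a is still in the closure of X - {b}.
    reduced = []
    for lhs, a in split:
        lhs = set(lhs)
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, split):
                lhs.discard(b)
        reduced.append((frozenset(lhs), a))
    # Step 3: drop exact duplicates, then any FD implied by the remaining ones.
    cover = list(dict.fromkeys(reduced))
    for fd in list(cover):
        rest = [other for other in cover if other != fd]
        if fd[1] in closure(fd[0], rest):
            cover = rest
    return cover


def show(cover):
    return [("".join(sorted(lhs)), a) for lhs, a in cover]


print(show(minimal_cover([("XY", "Z"), ("X", "Y")])))               # [('X', 'Z'), ('X', 'Y')]
print(show(minimal_cover([("A", "BC"), ("B", "C"), ("AB", "D")])))  # [('A', 'B'), ('B', 'C'), ('A', 'D')]
```

In the second example, B is extraneous in AB -> D, and A -> C is dropped because A -> B and B -> C already imply it.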
Normalization Procedure Summary
1. use attribute set closure also to find keys and prime attributes 2. test each FD in F to see if it satisfies 3NF/BCNF properties 3. decompose 4. if BCNF is not dependency preserving then go with 3NF
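For BCNF, steps 1 to 3 can be sketched as the usual decomposition loop: find a non-trivial X -> Y whose X is not a superkey of the current relation, split it into X ∪ Y and R - Y, and repeat. The Python sketch below assumes single-letter attributes and FDs as `(lhs, rhs)` string pairs; all helper names are hypothetical. It finds violations by trying subsets of each relation, which also catches FDs implied by F that only show up on a fragment; it is exponential and meant for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (single-letter attributes)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violation(rel, fds):
    """Return (X, Y) with X -> Y non-trivial on rel and X not a superkey of rel, else None."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            plus = closure(x, fds) & rel  # X+ restricted to this relation
            if plus != x and plus != rel:  # non-trivial, and X is not a superkey
                return x, plus - x
    return None


def bcnf_decompose(relation, fds):
    result, todo = [], [set(relation)]
    while todo:
        rel = todo.pop()
        violation = bcnf_violation(rel, fds)
        if violation is None:
            result.append(rel)
        else:
            x, y = violation
            todo.append(x | y)    # X is a superkey of X ∪ Y
            todo.append(rel - y)  # shares exactly X with X ∪ Y, so the split is lossless
    return result


def show(rels):
    return sorted("".join(sorted(r)) for r in rels)


print(show(bcnf_decompose("ABC", [("AB", "C"), ("C", "A")])))  # ['AC', 'BC']
print(show(bcnf_decompose("ABC", [("A", "B"), ("B", "C")])))   # ['AB', 'BC']
```

The first result is lossless but no longer preserves AB -> C, which is the case where step 4 says to fall back to 3NF.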
2NF
1NF + every non-key attribute is fully dependent on p.k. OR every non-key attribute is irreducibly dependent on p.k.
What are the normal forms up to 3NF
1NF = all domain values in R are atomic 2NF = R is in 1NF and every non-key attribute is fully dependent on the key 3NF = R is 2NF and every non-key attribute is non-transitively dependent on the Key
second normal form (2NF): invoice relation
1NF plus every non-key attribute is fully functionally dependent on the ENTIRE primary key, not part of the key (no partial dependencies)
Normal Forms Defined Informally
1st normal form: All attributes depend on the key. 2nd normal form: All attributes depend on the whole key. 3rd normal form: All attributes depend on nothing but the key.
3NF
2NF + every non-key attribute is not transitively dependent on the primary key
third normal form (3NF): invoice relation
2NF plus no non-primary-key attribute is transitively dependent on the primary key
There are two alternative methods of representing an employee's optional (single) phone number. The first is to use a nullable attribute and the second is to use a weak entity. How would the requirement of 'multiple phone numbers can be assigned to an Employee' affect your decision?
A 1-M relationship between employee and phone numbers would require the use of a weak entity.
What is functional dependence?
A M-1 (many-to-one) relationship from one set of attributes to another within a given table. X --> Y: X functionally determines Y, and Y is functionally determined by X.
If A--> B and B--> A
A and B have a one to one attribute relationship
if A--> B , but B not-->A
A and B have many to one attribute relationship
Define modification anomalies
An update to a data item leaves the database in an inconsistent state because that item's copies are scattered and not linked, so not all of them get updated.
Describe the meaning of a Functional Dependency between attributes in a single relation A1 → A2. Use an example such as the Employee relation's SSN (Social Security Number) and the remaining attributes in the relation.
A functional dependency A1 → A2 means that any two tuples with the same value for attribute A1 must have the same value for A2. That is, A1 consistently identifies A2. For example, every tuple with a given social security number must contain the same first name, last name, etc. In other words, the same SSN (A1) must not result in different employees / names (A2).
Trivial Functional Dependencies
A functional dependency is trivial if the attributes on its left-hand side are a superset of the attributes on its right-hand side; such FDs don't tell us anything. Ex. enum, pnum, hours -> enum, hours
key (minimal superkey)
A key K is a superkey with the additional property that removing any attribute from K will cause K not to be a superkey anymore.
BCNF
A relation is in Boyce-Codd NF iff every determinant is a candidate key. Note: determinant --> dependent
Third Normal Form Definition
A relation is in third normal form if it is in 2NF and no non-prime attribute is transitively dependent on the primary key. That is, for every non-trivial functional dependency X -> Y of R, one of the following holds: • Y is a prime attribute of R • X is a superkey of R
third normal form
A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute is transitively dependent on the primary key
first normal form (1NF): invoice relation
A relation that has a defined primary key with no repeating groups
Minimal Cover
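A set of FDs F is minimal if: 1. every FD in F is of the form X->A where A is a single attribute 2. no LHS contains an extraneous attribute: every attribute on each LHS is essential 3. no FD can be removed from F without changing F+. E.g., {XY->Z, X->Y} is not a minimal cover, because X->Y makes Y extraneous in XY->Z (X->Z suffices).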
What is 5th normal form?
A table is said to be in 5NF if and only if every join dependency in it is implied by the candidate keys
Unnormalized Form (UNF)
A table that contains one or more repeating groups
Normalization
A technique for producing a set of suitable relations that support the data requirements of an enterprise
Self-determination
A-->A
we have (A1, A2, A3, A4) A1 ->A2 A1 -> A3 A1 -> A4 what is A1?
A1 is a key!
What are the three functional dependencies that make up Armstrong's inference rules?
Reflexivity, augmentation, transitivity
insertion anomaly
Adding new rows creates duplicate data or certain facts cannot be recorded
What are the advantages and disadvantages to clustering?
Advantages: Easy lookup for queries that use a RANGE of values, e.g. salary <= 15,000 Disadvantages: Only one clustering index is allowed per table; the table must be rearranged on inserts and deletes
FD Diagram shows?
All of the relationships (FDs) between the attributes (columns) of a relation (table)
Lossless Join
Allows us to put a decomposed table back together after decomposition without any loss of information or uncertainty
Determinant
An attribute (could be composite) that some other attribute is fully functionally dependent on The left side of a FD
Define deletion anomalies
Attempted to delete a record, but not all of its copies were deleted because they were not linked properly.
The book describes three possible methods of bringing Department (Figure 15.9 B) into First Normal Form. We discussed the first solution in class. Describe the other two possible methods of normalization (changes to the Department schema) and the problems that result when each is applied.
B.1. The first solution was to move department locations into a separate relation (e.g. DEPT_LOCATIONS) that uses the department's ID as a foreign key. This was the solution discussed in class and is the preferred solution. B.2. The second solution is to expand the relation's key to include both department number and location. This has the effect of creating a separate tuple for each department's locations. It also introduces redundant values in the relation, e.g. dmgr_ssn (bad). B.3. If there is a maximum number of department locations (e.g. three locations), we can introduce three nullable location attributes into the relation. This has the disadvantage of introducing null attributes into the relation, which is a waste of table space in the DBMS (bad).
How would this relation suffer from Update Anomalies? CUSTOMER_PURCHASE (CUSTOMER_ID, PRODUCT_ID, PURCHASE_DATE, PURCHASE_AMT, PRODUCT_DESCRIPTION, CUSTOMER_NAME) Note: the relation's primary key is (CUSTOMER_ID, PRODUCT_ID).
Because PRODUCT_DESCRIPTION and CUSTOMER_NAME are partially dependent on the relation's PK, these attributes will be duplicated in every tuple that shares the same product or customer. Changing the value of a Customer's name or Product's description will require updating every tuple that references the entity being modified.
How do you create an index in SQL?
CREATE INDEX index_name ON table_name (column1, column2, ...);
What are the advantages and disadvantages of hash-based indexes?
Advantages: Can index on more than one attribute. Good for looking up values based on EQUALITY TESTS (equijoins win big with hash-based indexes). Disadvantage: Not so good with range-based tests, e.g. salary >= 15,000
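A toy illustration in Python of the equality-versus-range trade-off. A `dict` stands in for a hash-based index, and a sorted list searched with `bisect` stands in for the sorted leaf level of a tree-based index. The employee rows and function names are made up for the example; a real DBMS index stores disk pages, not Python lists.

```python
import bisect
from collections import defaultdict

# Hypothetical employee rows: (emp_id, dept, salary)
rows = [(1, "Sales", 12000), (2, "IT", 18000), (3, "Sales", 15000), (4, "HR", 9000)]

# Hash-style index on dept: maps each value to row positions (equality lookups only).
dept_index = defaultdict(list)
for rid, (_, dept, _) in enumerate(rows):
    dept_index[dept].append(rid)

# Sorted index on salary: keeps keys in order, so ranges become one binary search.
salary_index = sorted((salary, rid) for rid, (_, _, salary) in enumerate(rows))
salary_keys = [s for s, _ in salary_index]


def by_dept(dept):
    return [rows[rid] for rid in dept_index.get(dept, [])]


def salary_at_least(low):
    start = bisect.bisect_left(salary_keys, low)
    return [rows[rid] for _, rid in salary_index[start:]]


print(by_dept("Sales"))        # [(1, 'Sales', 12000), (3, 'Sales', 15000)]
print(salary_at_least(15000))  # [(3, 'Sales', 15000), (2, 'IT', 18000)]
```

The dict gives no help for `salary >= 15000`, since hashing scatters neighbouring values; the sorted structure answers it by finding the start point and scanning forward.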
What is an insertion anomaly of 2NF (using the surgery example)?
Cannot enter the fact that a particular drug has a particular side effect unless it is given to a patient.
update anomaly
Changing data in one row forces changes to other rows because the same data is duplicated in them
Attributes
Columns
How do we get an Unnormalized Relation?
Convert the data into a two-dimensional table. Data can repeat within a column.
unnormalized form (UNF): invoice data
Data transformed from an information source (e.g. form) into table format
Abassi is
Database
How do you solve update, insert and deletion anomalies?
Decompose the relation into different relations(tables)
What is a delete anomaly of 1NF (using the surgery example)?
Deleting a patient record may also delete all information about a surgeon.
deletion anomaly
Deleting rows may result in the loss of data needed for other future rows
Bad decomposition vs good decomposition
Dependencies are preserved in a good decomposition
Transitive Dependency
Dependency between a non-key attribute and another non-key attribute
Functional dependency
Describes relationship between attributes
I: Red Bull consumption D: rate of hyperactivity
Drinking Red Bull increases hyperactivity in children
I: lemon consumption D: test scores
Eating lemons causes students to have higher test scores
I: eating sugar D: memory level
Eating sugar impairs performance on a memory task
Dependency preservation property
Enables us to enforce a constraint on the original relation by enforcing some constraint on each of the smaller relations
Lossless-join property
Enables us to find any instance of the original relation from corresponding instances in the smaller relations
What is BCNF?
Every determinant of R is a candidate key
What is second normal form?
The relation is in 1NF and no non-prime attribute is partially dependent on any candidate key
Computing Attribute Closure
Start with X+ = X. Repeatedly, for any FD Y -> Z with Y ⊆ X+, add Z to X+. Stop when X+ no longer changes; if X+ contains every attribute, X is a superkey.
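The fixed-point loop as a Python sketch, assuming single-letter attribute names and FDs as `(lhs, rhs)` string pairs; `closure` is a hypothetical helper name.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (single-letter attributes)."""
    result = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing
        changed = False
        for lhs, rhs in fds:
            # If the LHS is already determined, everything on the RHS is too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F = [("A", "B"), ("B", "C"), ("CD", "E")]
print(sorted(closure("A", F)))   # ['A', 'B', 'C']
print(sorted(closure("AD", F)))  # ['A', 'B', 'C', 'D', 'E']
```

With these FDs, AD is a superkey of R(A, B, C, D, E) because its closure contains every attribute, while A alone is not.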
Find f(-2) if f(x) = -10x + 4
24
Find f(12) if f(x) = 10x + 6
126
Find f(8) if f(x) = (1/2)x + 7
11
Find f(x) if f(x) = -7x + 1 when x = 3
-20
Find x for f(x) = -6x + 7 when f(x) = 61
-9
Find x for f(x) = 5x - 6 when f(x) = -46
-8
If there is a functional dependency X -> Y, what does it mean that if t1[X] = t2[X] is true, then t1[Y] = t2[Y] must also be true?
For any two records (tuples) that agree on attribute X, those records must also agree on attribute Y.
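A small Python sketch of this definition applied to one table instance, with rows as dicts; the data and the `find_violation` name are hypothetical. It returns the first pair of rows that agree on X but disagree on Y. Finding such a pair proves the FD does not hold; finding none only shows that this instance does not violate it, since an FD is a property of the domain, not of the current data.

```python
def find_violation(rows, x, y):
    """Return two rows that agree on attributes x but differ on y, or None."""
    first_seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        if x_val in first_seen and first_seen[x_val][1] != y_val:
            return first_seen[x_val][0], row
        first_seen.setdefault(x_val, (row, y_val))
    return None


# Hypothetical instance: two different people named Ann.
rows = [
    {"ssn": "111", "name": "Ann"},
    {"ssn": "222", "name": "Bob"},
    {"ssn": "333", "name": "Ann"},
]
print(find_violation(rows, ["ssn"], ["name"]))  # None
print(find_violation(rows, ["name"], ["ssn"]))  # the two Ann rows
```

SSN -> name is not contradicted by this instance, but name -> SSN is, which matches the point that FDs are not symmetric.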
Boyce-Codd Normal Form (BCNF)
For every relation scheme R and for every X -> A that holds over R, either A is a subset of X or X is a super key for R
Third Normal Form (3NF)
For every relation scheme R and for every X -> A that holds over R, either A is a subset of X or X is a super key for R or A is a member of some key of R
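Both definitions can be checked directly with attribute closures. A Python sketch, assuming single-letter attributes, FDs as `(lhs, rhs)` string pairs, and that F is given on R itself rather than projected from a larger schema (in that case checking only the FDs in F is enough). All names are hypothetical; for 3NF each right-hand side is checked one attribute at a time, and the prime attributes come from a brute-force candidate-key search.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (single-letter attributes)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Brute force: minimal attribute sets whose closure is the whole relation."""
    attrs = sorted(relation)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            cand = set(combo)
            if not any(k <= cand for k in keys) and closure(cand, fds) >= set(attrs):
                keys.append(cand)
    return keys


def is_superkey(x, relation, fds):
    return closure(x, fds) >= set(relation)


def is_bcnf(relation, fds):
    # Every X -> A: trivial, or X is a superkey.
    return all(set(rhs) <= set(lhs) or is_superkey(lhs, relation, fds)
               for lhs, rhs in fds)


def is_3nf(relation, fds):
    prime = set().union(*candidate_keys(relation, fds))
    # Every X -> A: trivial, or X is a superkey, or A is a member of some key.
    return all(a in set(lhs) or is_superkey(lhs, relation, fds) or a in prime
               for lhs, rhs in fds for a in rhs)


F = [("AB", "C"), ("C", "A")]
print(is_3nf("ABC", F), is_bcnf("ABC", F))  # True False
```

R(A, B, C) with AB -> C and C -> A is in 3NF because A belongs to the key AB, but not in BCNF because C is not a superkey.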
What is the difference between full functional dependency and partial functional dependency?
Full: Warehouse --> W_Address. Not full (partial): Part, Warehouse --> W_Address, because Warehouse alone already determines W_Address.
What are hash-based indexes?
A hash table associates a key with a record or list of records using a hash function. E.g., hash each employee's department number and organize employees by the resulting hash code.
I: length of football practice D: number of touchdowns during games
Having longer football practice increases touchdowns during games
Composition
IF A-->B and C-->D then AC --> BD
How do you identify functional dependencies?
If 2 tuples have the same X value, they must have the same Y value
Union
If A-->B and A-->C, then A-->BC
General Unification Theorem:
If A-->B and C-->D, then A U (C-B) -->BD
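One derivation of this rule from Armstrong's axioms and the derived union and decomposition rules:

```latex
\begin{aligned}
&1.\ A \to B && \text{given} \\
&2.\ A \cup (C - B) \to B \cup (C - B) && \text{augmentation of 1 by } C - B \\
&3.\ B \cup (C - B) = B \cup C \to C && \text{reflexivity, since } C \subseteq B \cup C \\
&4.\ A \cup (C - B) \to C && \text{transitivity of 2, 3} \\
&5.\ A \cup (C - B) \to D && \text{transitivity of 4 and } C \to D \\
&6.\ A \cup (C - B) \to B && \text{decomposition of 2} \\
&7.\ A \cup (C - B) \to BD && \text{union of 5, 6}
\end{aligned}
```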
Please provide an example of how functional dependencies are not symmetric i.e. if A1 → A2 it is not necessarily true that A2 → A1.
If SSN identifies a customer's (fname, lname), it does not follow that a (fname, lname) pair identifies a single SSN: two different customers can share the same name. Naturally, other examples are possible.
There are two alternative methods of representing an employee's optional (single) phone number. The first is to use a nullable attribute and the second is to use a weak entity. Given a requirement that a large number of the employees (> 80%) are assigned a phone number, which is the preferred method and why?
If a large percentage of EMPLOYEE tuples have a single office number, the use of a nullable attribute would seem appropriate. This is because the nullable attribute is more efficient (no join needed), and the large percentage of employees with numbers indicates that the nullable attribute will not be wasteful of table space.
What is an update anomaly of 1NF (using the surgery example)?
If a patient changes their address, multiple address entries have to be changed.
There are two alternative methods of representing an employee's optional (single) phone number. The first is to use a nullable attribute and the second is to use a weak entity. Given a requirement that only a small number of employees (< 10%) are assigned phone numbers, which is the preferred method and why?
If a small percentage of EMPLOYEE tuples will have a single office number, the use of a nullable attribute would waste table space and the weak entity is the best choice. However, the additional processing of performing a Join with the weak relation may still make the nullable attribute more attractive.
How would this relation suffer from Update Anomalies? PURCHASE (PURCHASE_ID, PURCHASE_DATE, PURCHASE_AMT, CUSTOMER_ID, CUSTOMER_CITY, CUSTOMER_ZIPCODE) Note: the relation's primary key is PURCHASE_ID.
If the customer were to change their address (city or zip-code), we would need to update all of the customer's purchases based on the customer_id attribute.
data normalization
Improving the logical design to create well-structured relations to: 1. Avoid duplication and conserve storage 2. Satisfy certain referential integrity constraints (i.e. a particular row in one table can be related to at most one row in a related table) 3. Facilitate data maintenance (insert, update, and delete) 4. Provide a better design that enables future growth
FDs cannot be ___, but must be ____
Inferred (from the data in a table); defined explicitly (from the meaning of the attributes)
Is the slope of y = 2x + 3 negative or positive?
Positive
A relation schema R is in 3NF if every nonprime attribute of R meets both of the following conditions:
It is fully functionally dependent on every key of R. It is non-transitively dependent on every key of R.
Finding Keys
Keep adding attributes to a set until its closure contains every attribute of the relation. A key is a smallest such set: no proper subset of it determines all of the other attributes.
I: listening to a radio D: test performance
Listening to a radio broadcast of a sports event while studying for a test decreases performance on the test
When does attribute R.A multidetermine attribute R.B (R.A ->> R.B)?
If the set of B values matching a given (A, C) pair in R depends only on the A value and is independent of the C value. For example, President ->> VP and President ->> Loser: both can have different numbers of values and are independent of each other, but to be in 4NF they need to be separated into different tables.
Why do we use indexes?
Make data access orders of magnitude faster
If A not--> B and B not-->A
Many to many attribute relationship
I: hours of sleep D: time it takes for students to get to class
More sleep helps students get to class more quickly
What is 3NF?
Must be in 2NF and no non-prime attribute of R is transitively dependent on the primary key
1NF
NF: all domain values in R are atomic
Given an r(R), can we conclude an FD holds?
No, we can only say if an FD does not hold.
Partial functional dependency
Non-key Attribute is functionally dependent on part (but not all) of the primary key
What is an overview of 2NF?
Non-key attributes must be dependent on the full key (any candidate key).
I: crowd (number of people around) D: shyness
People will become shy if they are in a crowd
What are the advantages and disadvantages of tree-based indexes
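Properties: Worst case is O(log n) for search, insert, update, and delete; because references are kept in sorted order, tree-based indexes also support range queries.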
Normalization attempts to get rid of what anomalies?
Redundancy anomalies (potential inconsistency) -update anomalies -insertion anomalies -deletion anomalies
Armstrong's Axioms
Reflexivity: If B is subset of A then A --> B Augmentation: If A --> B then AC--> BC Transitivity: If A --> B and B --> C then A --> C
Informally describe the conditions to be met for a relation to be in First Normal Form.
Relations should not have multivalued attributes or nested relations. Alternatively, attributes are permitted to hold only single atomic (indivisible) values.
I: smoking cigarettes D: rate of lung cancer
Smoking cigarettes while driving a car increases the rate of lung cancer
What is the process of Normalisation?
Starting from a universal all-listing relation, progressively remove redundant data from the table. In a relational model, methods exist for quantifying how efficient a database is, called normal forms (NF). There are algorithms for converting a database from one NF to another.
What is the clustering approach to indexing?
The table's rows are physically stored in the order of the index key, so related rows sit together, e.g., employee records grouped by department
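A toy Python sketch of the clustering idea, assuming an in-memory sorted list stands in for the table's physical order (a real DBMS stores clustered data on disk pages, often behind a B+-tree). Once rows are ordered by department, one binary search finds a contiguous block holding every row for that department. The employee rows are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical Employee rows: (employee_id, department).
employees = [("e3", "Sales"), ("e1", "HR"), ("e4", "Sales"), ("e2", "HR"), ("e5", "IT")]

# "Cluster" the table: store the rows ordered by the clustering key (department).
clustered = sorted(employees, key=lambda row: row[1])
dept_keys = [row[1] for row in clustered]  # parallel list used for binary search


def employees_in(dept: str) -> list[tuple[str, str]]:
    """All rows of one department are adjacent, so one contiguous slice answers the query."""
    lo = bisect_left(dept_keys, dept)
    hi = bisect_right(dept_keys, dept)
    return clustered[lo:hi]


print(employees_in("Sales"))  # [('e3', 'Sales'), ('e4', 'Sales')]
```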
Describe why the following relation PURCHASE is not in third normal form. PURCHASE (PURCHASE_ID, PURCHASE_DATE, PURCHASE_AMT, CUSTOMER_ID, CUSTOMER_CITY, CUSTOMER_ZIPCODE) Note the relation's primary key is PURCHASE_ID.
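The attributes CUSTOMER_CITY and CUSTOMER_ZIPCODE are functionally dependent on CUSTOMER_ID, which is not the primary key, so they depend on PURCHASE_ID only transitively (PURCHASE_ID → CUSTOMER_ID → CUSTOMER_CITY, CUSTOMER_ZIPCODE).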
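One way to remove the transitive dependency is to project PURCHASE into a purchase relation keyed on PURCHASE_ID and a customer relation keyed on CUSTOMER_ID. The Python sketch below models relations as lists of dicts; the `project` and `natural_join` helpers and all sample values are hypothetical. It shows that each customer's city and zip code end up stored once and that the natural join rebuilds the original rows.

```python
Row = dict[str, object]


def project(rows: list[Row], attrs: list[str]) -> list[Row]:
    """Relational projection: keep only attrs and drop duplicate tuples."""
    out: list[Row] = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(left: list[Row], right: list[Row]) -> list[Row]:
    """Combine every pair of rows that agree on all shared attributes."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [{**lrow, **rrow} for lrow in left for rrow in right
            if all(lrow[a] == rrow[a] for a in common)]


# Hypothetical PURCHASE instance: customer c1's city and zip code are stored twice.
purchase = [
    {"PURCHASE_ID": 1, "PURCHASE_DATE": "2024-01-05", "PURCHASE_AMT": 20,
     "CUSTOMER_ID": "c1", "CUSTOMER_CITY": "Springfield", "CUSTOMER_ZIPCODE": "11111"},
    {"PURCHASE_ID": 2, "PURCHASE_DATE": "2024-01-19", "PURCHASE_AMT": 15,
     "CUSTOMER_ID": "c1", "CUSTOMER_CITY": "Springfield", "CUSTOMER_ZIPCODE": "11111"},
    {"PURCHASE_ID": 3, "PURCHASE_DATE": "2024-02-11", "PURCHASE_AMT": 35,
     "CUSTOMER_ID": "c2", "CUSTOMER_CITY": "Shelbyville", "CUSTOMER_ZIPCODE": "22222"},
]

# Split along PURCHASE_ID -> CUSTOMER_ID -> (CUSTOMER_CITY, CUSTOMER_ZIPCODE).
purchase_3nf = project(purchase, ["PURCHASE_ID", "PURCHASE_DATE", "PURCHASE_AMT", "CUSTOMER_ID"])
customer = project(purchase, ["CUSTOMER_ID", "CUSTOMER_CITY", "CUSTOMER_ZIPCODE"])

print(len(customer))                                     # 2: each customer's city/zip stored once
print(natural_join(purchase_3nf, customer) == purchase)  # True: the join restores PURCHASE
```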
What is relational database design?
The grouping of attributes to form good relation schemas
Informally describe the conditions to be met for a relation to be in Second Normal Form. CUSTOMER_PURCHASE (CUSTOMER_ID, PRODUCT_ID, PURCHASE_DATE, PURCHASE_AMT, PRODUCT_DESCRIPTION, CUSTOMER_NAME) Note the relation's primary key is (CUSTOMER_ID, PRODUCT_ID).
The informal description of 2NF is: for relations where the primary key is a compound key (i.e., a key defined from multiple attributes), all nonprime attributes (attributes not part of the primary key) must be functionally dependent on all of the key's attributes.
Informally describe the conditions to be met for a relation to be in Third Normal Form. PURCHASE (PURCHASE_ID, PURCHASE_DATE, PURCHASE_AMT, CUSTOMER_ID, CUSTOMER_CITY, CUSTOMER_ZIPCODE) Note the relation's primary key is PURCHASE_ID.
The informal description of 3NF is: A relation should not have a non-key attribute functionally dependent on another non-key attribute. That is, there should be no transitive dependencies of a non-key attribute on the primary key.
Normalization
The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations
Denormalization
The process of storing the join of higher normal form relations as a base relation—which is in a lower normal form
How do we decompose this relation to solve this problem?
The solution is to decompose the combined relation into two relations (CUSTOMER & DEPARTMENT) and use a FK to maintain the 1-M relationship or a join-table if it happens to be an M-N relationship.
Describe the suggested method of eliminating the use of a multi-valued attribute to represent the employee's three favorite colors. Ename|Ssn|Bdate|Color_1|Color_2|Color_3
This is another example of the use of a weak relation to implement a 1-M relationship between Employee and zero or more color choices. Note for the exam you may be asked to describe the new weak relation's table.
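A Python sketch of that decomposition, assuming the new weak relation is named EMPLOYEE_COLOR(Ssn, Color) with key (Ssn, Color) and Ssn referencing EMPLOYEE (the name is hypothetical). Each non-empty color slot becomes one row, so an employee with no favorite colors simply has no rows instead of three NULLs. The sample employees are hypothetical.

```python
Row = dict[str, object]

COLOR_COLUMNS = ("Color_1", "Color_2", "Color_3")


def split_colors(rows: list[Row]) -> tuple[list[Row], list[Row]]:
    """Split the wide table into EMPLOYEE and a weak EMPLOYEE_COLOR relation."""
    employee: list[Row] = []
    employee_color: list[Row] = []
    for row in rows:
        employee.append({k: row[k] for k in ("Ename", "Ssn", "Bdate")})
        for col in COLOR_COLUMNS:
            if row[col] is not None:  # empty slots become missing rows, not NULLs
                employee_color.append({"Ssn": row["Ssn"], "Color": row[col]})
    return employee, employee_color


wide = [
    {"Ename": "Smith", "Ssn": "111", "Bdate": "1990-01-01",
     "Color_1": "red", "Color_2": "blue", "Color_3": None},
    {"Ename": "Wong", "Ssn": "222", "Bdate": "1985-06-30",
     "Color_1": None, "Color_2": None, "Color_3": None},
]
employee, employee_color = split_colors(wide)
print(employee_color)  # [{'Ssn': '111', 'Color': 'red'}, {'Ssn': '111', 'Color': 'blue'}]
```

The weak relation also lifts the arbitrary limit of three colors per employee.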
Describe why the following CUSTOMER_PURCHASE relation is not in second normal form. CUSTOMER_PURCHASE (CUSTOMER_ID, PRODUCT_ID, PURCHASE_DATE, PURCHASE_AMT, PRODUCT_DESCRIPTION, CUSTOMER_NAME) Note the relation's primary key is (CUSTOMER_ID, PRODUCT_ID).
This relation mixes attributes that depend on the whole key, (CUSTOMER_ID, PRODUCT_ID) → (PURCHASE_DATE, PURCHASE_AMT), with attributes that depend on only part of the key: PRODUCT_ID → PRODUCT_DESCRIPTION and CUSTOMER_ID → CUSTOMER_NAME.
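The partial dependencies can be found mechanically: for every proper subset of the key, compute its closure and check which non-prime attributes it already determines. A Python sketch using the three FDs listed above; encoding the FDs as frozenset pairs and the function names are assumptions for illustration.

```python
from itertools import combinations

FD = tuple[frozenset[str], frozenset[str]]


def closure(attrs: set[str], fds: list[FD]) -> set[str]:
    """Attribute closure X+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key: list[str], non_prime: list[str],
                         fds: list[FD]) -> list[tuple[tuple[str, ...], str]]:
    """(proper subset of key, non-prime attribute) pairs where the subset alone determines it."""
    found = []
    for size in range(1, len(key)):  # proper subsets only
        for subset in combinations(sorted(key), size):
            determined = closure(set(subset), fds)
            found.extend((subset, a) for a in sorted(non_prime) if a in determined)
    return found


F = [
    (frozenset({"CUSTOMER_ID", "PRODUCT_ID"}), frozenset({"PURCHASE_DATE", "PURCHASE_AMT"})),
    (frozenset({"PRODUCT_ID"}), frozenset({"PRODUCT_DESCRIPTION"})),
    (frozenset({"CUSTOMER_ID"}), frozenset({"CUSTOMER_NAME"})),
]
key = ["CUSTOMER_ID", "PRODUCT_ID"]
non_prime = ["PURCHASE_DATE", "PURCHASE_AMT", "PRODUCT_DESCRIPTION", "CUSTOMER_NAME"]
for subset, attr in partial_dependencies(key, non_prime, F):
    print(subset, "->", attr)
# ('CUSTOMER_ID',) -> CUSTOMER_NAME
# ('PRODUCT_ID',) -> PRODUCT_DESCRIPTION
```

Any non-empty result means the relation is not in 2NF; each reported pair suggests a separate relation (e.g., CUSTOMER and PRODUCT).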
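For a relation in 2NF, why and how do we eliminate transitive dependencies?
To eliminate update, insertion, and deletion anomalies. We do it by decomposing the relation into tables so that no non-key attribute depends on another non-key attribute.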
I: stress level D: rate of stomach aches
Too much stress causes stomach aches
Define insertion anomalies
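An insert is blocked because it requires data for a record that does not exist yet (e.g., a patient who has not had surgery cannot be added when surgeon # is part of the key).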
Trivial vs non-trivial dependencies
Trivial: the right side is a subset of the left side, e.g., X,Y --> X. Non-trivial: everything else.
Define spurious tuples
Tuples that are not in the original relation, but are produced by a subsequent join.
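A small Python sketch that produces spurious tuples: a two-row relation R(A, B, C) is projected onto (A, B) and (B, C), where the shared attribute B is not a key of either projection, and the natural join then returns rows that were never in R. The relation, its values and the helper names are hypothetical.

```python
Row = dict[str, object]


def project(rows: list[Row], attrs: list[str]) -> list[Row]:
    """Relational projection: keep only attrs and drop duplicate tuples."""
    out: list[Row] = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(left: list[Row], right: list[Row]) -> list[Row]:
    """Combine every pair of rows that agree on all shared attributes."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [{**lrow, **rrow} for lrow in left for rrow in right
            if all(lrow[a] == rrow[a] for a in common)]


rel = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 2, "B": "x", "C": 20},
]
joined = natural_join(project(rel, ["A", "B"]), project(rel, ["B", "C"]))
spurious = [t for t in joined if t not in rel]
print(len(joined))  # 4
print(spurious)     # [{'A': 1, 'B': 'x', 'C': 20}, {'A': 2, 'B': 'x', 'C': 10}]
```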
I: UV light D: mold growth
UV light decreases the growth of mold
Guidelines for decomposition
Use independent projections. Projections R1 and R2 of relation R are independent iff: (1) every FD in R can be logically deduced from the FDs in R1 and R2, and (2) the common attributes of R1 and R2 form a candidate key for at least one of the pair.
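Condition (2) is the lossless-join test for a binary decomposition: R1 ∩ R2 -> (R1 - R2) or R1 ∩ R2 -> (R2 - R1) must be in F+. A Python sketch that checks it with attribute closure; the string encoding of FDs and the example F = {A -> B} on R(A, B, C) are assumptions for illustration.

```python
def closure(attrs: set[str], fds: list[tuple[str, str]]) -> set[str]:
    """Attribute closure; each FD is (lhs, rhs) as strings of one-letter attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1: str, r2: str, fds: list[tuple[str, str]]) -> bool:
    """Binary decomposition test: the shared attributes must determine R1 - R2 or R2 - R1."""
    common = set(r1) & set(r2)
    plus = closure(common, fds)
    return set(r1) - set(r2) <= plus or set(r2) - set(r1) <= plus


F = [("A", "B")]
print(is_lossless("AB", "AC", F))  # True: A -> B, so A is a key of (A, B)
print(is_lossless("AB", "BC", F))  # False: B determines neither A nor C
```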
Full Functional Dependency
Value of attribute is functionally dependent on the (entire) primary key
dependent variable
Value that depends on the value of the independent variable
independent variable
Value that does not depend on another
the slope
What does this formula find?
-1/2
What is the slope of y = -1/2x + 2?
0
What is the slope of y = 3?
(0,-4)
What is the y-intercept if y = -2x - 4?
5
What is the y-intercept of y = 1/2x + 5?
If a relation is in 2NF is it also in 1NF by default?
Yes
Key
a minimal superkey
Partial Dependencies
a partial dependency relies on only part of the key. Ex: with key AC, A -> B is a partial dependency
Second Normal Form
a relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on every candidate key, i.e., there are no partial dependencies. By definition, any relation with a single key attribute is in 2NF. Relation is in 2NF if no non-prime attribute is partially dependent on the primary key.
Candidate Keys
a relation may have more than one Key
Prime Attributes
an attribute that belongs to some candidate key
trivial functional dependencies
augmented functional dependency, equivalent functional dependency
Guideline 3
avoid putting attributes in a tuple that will frequently take null values for one reason or another
Atomic relation
can't be decomposed into independent relations
If you know functional dependencies, you can ___ your keys
derive
Guideline 1
design relation schema so that it is easy to explain its meaning, do not combine attributes from multiple entity types and relationship types into a single relation
Guideline 2
design the base relation schemas so that no update anomalies are present in the relations and no redundant information exists in tuples
equivalent functional dependency
determinants and non-key attributes are interchangeable
full dependency
determinants should have the minimal number of attributes to maintain the functional dependency with all non-key attributes (i.e., B is functionally dependent on A but not on any proper subset of A)
Second normal form
every non-prime attribute should be fully functionally dependent on the primary key
1
f(x) = (1/10)x+3 when x = -20
0
find f(-2) when f(x) = 5x + 10
12
find f(-9) - g(8) when f(x) = -2x -5 and g(x) = (1/2)x -3
13
find f(6) + g(14) when f(x) = (1/2)x + 5 and g(x) = (1/14)x + 4
-31
find f(x) when f(x) = 3x - 7 and x = -8
17
find x for f(x) = -x+10 when f(x) = -7
-1
find x for f(x) = 4x + 2 when f(x) = -2
10
find x for f(x) = -4x - 8 when f(x) = -48
-5
find x for f(x) = -6x - 5 when f(x) = 25
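The linear-function cards above can be checked mechanically: evaluating f(x) = mx + b is a substitution, and solving mx + b = y gives x = (y - b) / m when m ≠ 0. A Python sketch using exact fractions so values like 1/10 stay exact; the function names are hypothetical.

```python
from fractions import Fraction
from numbers import Rational


def f_at(m: Rational, b: Rational, x: Rational) -> Fraction:
    """Evaluate the linear function f(x) = m*x + b exactly."""
    return Fraction(m) * x + b


def solve_for_x(m: Rational, b: Rational, y: Rational) -> Fraction:
    """Solve m*x + b = y for x; a slope of 0 has no unique solution."""
    if m == 0:
        raise ValueError("slope is 0: no unique solution")
    return (Fraction(y) - b) / m


half = Fraction(1, 2)
print(f_at(Fraction(1, 10), 3, -20))         # 1
print(f_at(-2, -5, -9) - f_at(half, -3, 8))  # 12
print(solve_for_x(-4, -8, -48))              # 10
```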
Lossless Property
guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition
functional dependence
h is functionally dependent on g if h = f ∘ g for some function f
Decomposition
if A--> BC, then A-->B and A-->C
Transitive Dependencies
if X-> Z and Z->Y then we have transitive dependencies
A relation schema R is in second normal form (2NF)
if every non-prime attribute A in R is fully functionally dependent on every key of R
Operations that may result in anomalies
insert operation, delete operation, update operation
Problems with Transitive Dependencies
insertion, deletion and modification anomalies
First normal form
it is considered to be part of the definition of a relation. Disallows composite attributes, multivalued attributes, and nested relations, i.e., attributes whose values for an individual tuple are non-atomic
independent
lines intersect in exactly one point
inconsistent
lines are parallel (no points intersect)
dependent
lines are the same (they share all points)
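With both equations in standard form Ax + By = C, the three cases can be told apart with 2x2 determinants: a non-zero coefficient determinant means one intersection point; otherwise the lines are the same if the constants are proportional too, and parallel if not. A Python sketch, assuming each equation really describes a line (A and B not both 0); the function name is hypothetical.

```python
def classify(a1: int, b1: int, c1: int, a2: int, b2: int, c2: int) -> str:
    """Classify a1*x + b1*y = c1, a2*x + b2*y = c2 (each equation describes a line)."""
    det = a1 * b2 - a2 * b1
    if det != 0:
        return "independent"   # exactly one intersection point
    if a1 * c2 - a2 * c1 == 0 and b1 * c2 - b2 * c1 == 0:
        return "dependent"     # same line: every point is shared
    return "inconsistent"      # parallel lines: no common point


print(classify(1, 1, 2, 1, -1, 0))  # independent  (x + y = 2, x - y = 0)
print(classify(1, 1, 2, 1, 1, 3))   # inconsistent (x + y = 2, x + y = 3)
print(classify(1, 1, 2, 2, 2, 4))   # dependent    (x + y = 2, 2x + 2y = 4)
```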
sound proof
means that no FD inferred using the reflexivity, augmentation, and transitivity rules will violate the original set of FDs.
Can you tell a functional dependency by looking at the data?
no; the data can show that an FD does not hold, but it cannot prove that one attribute is dependent on another
augmented functional dependency
non-key attributes are functionally dependent on a subset of its determinant
partial dependency
non-key attributes are functionally dependent on a subset of the primary key
Are functional dependencies symmetric?
nope! they go in a single direction!
Will there always be a dependency preserving decomposition into BCNF?
not always!
Semantics
pertaining to a relation, refers to its meaning resulting from the interpretation of attribute values in a tuple
Update Anomalies
problems associated with storing natural joins of base relations
The semantic of a relation
refer to its meaning resulting from the interpretation of attribute values in a tuple
Tuples
rows
Relations
tables
transitive dependencies
the primary key is a determinant for another attribute which is a determinant for a third attribute
Spurious Tuples
tuples that represent information that is not valid
functional dependency
value of the determinant decides the value of another attribute
A functional dependency has nothing to do with ____
what the table will be populated with
Union
if x -> y and x -> z, then x -> yz
(0,0)
y = 2x has a y-intercept of...?
point-slope
y-y₁=a(x-x₁)
slope-intercept
y=ax+b, a≠0
What is an overview of 1NF?
All attributes are atomic (cannot be broken down into simpler values) and there is a key.
Reflexive
Armstrong's axiom (rule 1): if y is a subset of x, then x -> y
complete proof
we mean that by exhaustively applying rules 1-3 (reflexivity, augmentation, transitivity) to a set of dependencies F, we will infer every dependency implied by F (all of F+)
Pseudo Transitivity
if x -> y and wy -> z, then wx -> z
Decomposition
if x -> yz, then x -> y and x -> z