question 3 exam Functional Dependencies

What are our goals for relational database design?

- BCNF. - Lossless-join decomposition. - Dependency preservation. If all three are not achievable, we either: - Choose BCNF and accept lack of dependency preservation, or - Choose 3NF.

What are the 7 types of normal forms?

- First Normal Form (1NF). - Second Normal Form (2NF). - Third Normal Form (3NF). - Boyce-Codd Normal form (BCNF). - Fourth Normal Form (4NF). - Fifth Normal Form (5NF). - Sixth Normal Form (6NF). Typically, 3NF (or BCNF) is sufficient.

When decomposing how important are Lossless Joins and Dependency Preservation?

- Lossless Joins are critical. - Dependency preservation is desirable but sometimes sacrificed.

What must a good DB design do?

- Make it easier for users to retrieve information. - Avoid storing redundant information (avoiding redundancy fundamentally depends on the notions of functional dependencies).

In general, given R and set of FDs F. R1, R2 is a lossless decomposition if and only if...?

- R1 ∩ R2 -> (R1 - R2) (the shared attributes functionally determine the attributes in R1 but not in R2), or - R1 ∩ R2 -> (R2 - R1) exists in F+. F+ is the closure of F: the set of all FDs implied by F.
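The test above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the names `closure` and `is_lossless`, the single-letter attributes, and the example FD set are all assumptions made for the example. Instead of materialising F+, it checks whether the closure of R1 ∩ R2 covers R1 - R2 or R2 - R1, which is equivalent.

```python
def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know the whole LHS, we also know the RHS.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary lossless-join test: R1 ∩ R2 must determine R1 - R2 or R2 - R1."""
    r1, r2 = set(r1), set(r2)
    common_closure = closure(r1 & r2, fds)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure


# Hypothetical relation R(A, B, C) with the single FD A -> B.
FDS = [({"A"}, {"B"})]
good = is_lossless("AB", "AC", FDS)  # common attribute A determines B
bad = is_lossless("AB", "BC", FDS)   # common attribute B determines nothing else
```

Here `good` is True and `bad` is False: splitting on A is safe because A determines B, while splitting on B can produce spurious tuples.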

What is an insertion anomaly of 1NF (using the surgery example)?

A new patient is to be added but has not yet undergone surgery, so there is no surgeon #. However, since surgeon # is part of the key, we cannot insert the tuple.

What is Normalisation Theory?

A way to ensure the above "good DB design". It is based on the observation that relations with certain properties are more effective in inserting, updating and deleting data than other sets of relations containing the same data.

What makes an attribute prime?

An attribute is prime if it is part of some candidate key (all other attributes are non-prime).

How do we get 1NF?

Attributes must have atomic values (i.e., no multi-valued attributes or composite attribute). 1NF simplifies attributes.

standard form

Ax+By=C, all are integers

How do we avoid the aforementioned anomalies?

A bad relational schema has to be decomposed into smaller relation schemas. Given this decomposition, we have to be able to join these relations to recover the original information. If the decomposition is done poorly, joining the decomposed relations can create problems in the form of spurious tuples.

What is guideline 4 for good R-DB design?

Design relations to avoid spurious tuples. When relations are to be joined, they should use keys as join attributes.

How are Functional Dependencies of use here?

FDs are used to specify formal measures of the 'goodness' of relational designs. They are used to define keys, which in turn are used to define 'normal forms' for relations. (X -> Y states that X determines a unique value of Y).

Self-Determination

For all A, A->A

Full Functional Dependency Notation

Go over this

Transitivity

If A->B and B->C, then A->C

Composition

If A->B and C->D, then AC->BD

Determinant

If A->B, A is called the

Consequent

If A->B, B is called the

Augmentation

If A->B, then AC->BC

Decomposition

If A->BC, then A->B and A->C

Reflexivity

If B is a subset of A, then A->B

What is a delete anomaly of 2NF (using the surgery example)?

If a patient receives some other drug because of a penicillin rash, and the new drug and its side effects are entered, we lose the information that penicillin causes a rash.

What is a super key?

If a set of attributes X in relation R functionally determines all the other attributes of R, then X is a super key of R.

What is an update anomaly of 2NF (using the surgery example)?

If drug side effects change, we have to update multiple occurrences of side effects.

What is an overview of BCNF?

A relation is in BCNF if every determinant (of a non-trivial FD) is a superkey.

What is a minimal key?

If no part of a superkey can be removed while keeping it a superkey, then it is a minimal key. All minimal keys are superkeys, but not all superkeys are minimal keys. If some attribute never appears on the right-hand side of any FD (-->), then it must be part of every minimal key.
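A brute-force search makes the definition concrete. The sketch below is only an illustration under stated assumptions: `closure` and `candidate_keys` are hypothetical helper names, attributes are single letters, and the relation R(A, B, C, D) with A -> B and B -> C is invented. Trying attribute sets smallest first, and skipping supersets of keys already found, leaves exactly the minimal keys.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal keys by trying attribute sets in order of size."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            # A superset of a known key is a superkey, but not minimal.
            if any(key <= candidate for key in keys):
                continue
            # A superkey determines every attribute of the relation.
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


# Hypothetical relation R(A, B, C, D) with A -> B and B -> C.
FDS = [({"A"}, {"B"}), ({"B"}, {"C"})]
keys = candidate_keys("ABCD", FDS)
```

A and D never appear on a right-hand side, so both must be in every key, and the only minimal key found is {A, D}. The search is exponential in the number of attributes, so it is only suitable for small textbook schemas.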

Attribute B is functionally dependent on attribute A

If the value of attribute B is determined by the value of attribute A, then

What is Dependency Preservation?

If we decompose a relation R, do all the original FDs hold? Otherwise, to check updates for FD violation may require computing joins, which is an expensive operation. (In other words, have to be able to check all FDs hold using only the decomposed relations).

What does Normalisation generally involve and what are the fundamental concepts?

Involves decomposing existing tables into multiple tables. Upon each query, the tables must be re-joined, so no information is lost. The fundamental concepts are: - FDs and keys. - Lossless joins. - Dependency preservation.

No

Is Functional Dependency symmetric? (EX: If A->B then B->A)

Yes

Is functional dependency a directional property?

What is a candidate key?

It is a column or set of columns that can uniquely identify any database record without referring to any other data. It is also called a minimal superkey (as it is the minimal set of attributes necessary to identify a tuple.)

Closure of x under F

It means that we can determine, for each set of attributes x in the database, the set x+ of attributes that are functionally determined by x under the FDs in F

How do we get BCNF?

Most 3NF relations are already BCNF relations. BCNF removes the 3NF exception for prime attributes: a prime attribute may appear on the right-hand side of a non-trivial FD only if the left-hand side is a superkey. (In other words, cannot have an FD X -> A(prime) where X is not a superkey, unless the FD is trivial).

How do we get 2NF?

Must already be in 1NF. Every non-key attribute must be fully functionally dependent on every key. No proper subset of a key can functionally determine a non-key attribute. 2NF improves data integrity. (Note: if every candidate key is a single attribute, a 1NF relation is automatically in 2NF).

How do we get 3NF?

Must already be in 2NF. There is no transitive functional dependency from a key to a non-key attribute. A transitive dependency occurs when one non-key attribute determines another non-key attribute. E.g., A(PK) -> B and B -> C gives that A(PK) -> C. It is always possible to find a dependency-preserving lossless-join decomposition into 3NF.

2NF

NF: every non-key attribute is fully dependent on the key

3NF

NF: every non-key attribute is non-transitively dependent on the key

BCNF

NF: if X-> A holds in R, then X is a super key of R

What is guideline 3 for good R-DB design?

NULL values within tuples are harmful and should be avoided. However, NULL values are typically unavoidable. Attributes whose value in a relation is expected to be frequently NULL should not be part of that schema; another schema should be created to accommodate them.

What is an overview of 3NF?

Non-key attributes must only depend on the key (any candidate key).

What are the issues with Normalisation?

Normalisation requires the JOIN operation to be used, which can be very expensive for large joins. Some R-DBs use tables that are not in high NFs.

What is guideline 2 for good R-DB design?

Redundancy of tuple information is harmful and must be avoided. For two reasons: - Storage costs: replicating same info in different places wastes space. - Consistency costs and anomalies: all replicas must be kept consistent during updates, anomalies can occur when all replicas need updating.

What is a Primary Key?

The chosen candidate key.

What is guideline 1 for good R-DB design?

The schema and its attributes should make sense and be easy to understand. - Each tuple should represent one entity or relationship instance. - Attributes of different entities should not be in the same table. - Referring to other entities should only be done via foreign keys.

Why might creating a new project for the relation EMP_PROJ(Emp#, Proj#, Ename, Pname) cause an anomaly?

To create a new project, we need to insert a new tuple, however we cannot do this without assigning an employee to it. In general, we should be able to do this as employee assignments to projects may be done at a later date in the process.

What is the overall goal of Normalisation?

To decompose relations by: - reducing redundancy - while preserving dependencies - in a lossless-join manner.

What are spurious tuples?

Tuples formed as part of a join that did not previously exist. Usually occurs when joining two relations on a non-key attribute.
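A small demonstration of how a poor decomposition produces spurious tuples. The data, attribute names, and the helpers `project` and `natural_join` are all made up for this sketch. The relation is split so that the only shared attribute, `city`, is not a key of either part.

```python
def project(rows, attrs):
    """Project a list of dict rows onto attrs, returning a set of tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}


def natural_join(r1, attrs1, r2, attrs2):
    """Natural join of two sets of tuples on their shared attribute names."""
    common = [a for a in attrs1 if a in attrs2]
    out_attrs = list(attrs1) + [a for a in attrs2 if a not in attrs1]
    joined = set()
    for t1 in r1:
        d1 = dict(zip(attrs1, t1))
        for t2 in r2:
            d2 = dict(zip(attrs2, t2))
            # Keep the pair only if it agrees on every shared attribute.
            if all(d1[a] == d2[a] for a in common):
                merged = {**d1, **d2}
                joined.add(tuple(merged[a] for a in out_attrs))
    return joined


# Hypothetical instance: two employees in the same city on different projects.
rows = [
    {"name": "Ann", "city": "Leeds", "project": "P1"},
    {"name": "Bob", "city": "Leeds", "project": "P2"},
]
original = project(rows, ("name", "city", "project"))

# Decompose on the non-key attribute "city", then join back.
r1 = project(rows, ("name", "city"))
r2 = project(rows, ("city", "project"))
joined = natural_join(r1, ("name", "city"), r2, ("city", "project"))
spurious = joined - original
```

The join returns four tuples although the original relation had two. The extra ones (Ann on P2, Bob on P1) are spurious: they never existed, and the original assignment can no longer be recovered.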

Transitive, Partial, Total

When identifying dependencies, what order do we find them in?

x intercept

a if (a,0)∈f

Superkey

a subset of the relation's attributes whose combined values must be distinct for every tuple: no two tuples may agree on all attributes in the subset

y intercept

b if (0,b)∈f

joint proportion

combo of direct and inverse proportions

Dependency Property

each functional dependency is represented in some individual relation resulting after decomposition

linear function

f={(x,y); y=ax+b, a≠0}

multiplying function

f={(x,y); y=ax} line through origin

constant function

f={(x,y); y=k} horizontal line

linear functional dependence

n is a linear function of m, n=f°m

Normalization

process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing the insertion, deletion, and update anomalies

Denormalization

process of storing the join of higher normal form relations as a base relation, which is in a lower normal form

Transitivity

proof - Armstrongs axiom x -> y && y -> z | x-> z

Augmentation

proof - Armstrongs axiom x -> y | xz -> yz

Functional dependency

x->y the values of the y component depend upon the value of the x component

inverse proportion

y₁ × x₁ = y₂ × x₂, hyperbolic

direct proportion

y₁/x₁ = y₂/x₂, multiplying

What is a lossless-join decomposition of schema R?

{R1,...,Rk} is a lossless-join decomposition of R if, for every legal instance r of R (an instance that satisfies all FDs defined on R), the natural join of the projections of r onto R1,...,Rk gives back exactly r.

slope

∆y/∆x

What is the general solution to normalizing (fixing) relations that are not in second or third normal forms?

"Normalization through Decomposition" In much the same manner as correcting an entity class that lacks cohesion from OOAD, a relation that contains functional dependencies not covered by the relation's primary key can be decomposed into 2 or more relations, each of which has its own primary key and is in second and third normal form.

y-intercept

"b" in y = mx + b stands for

slope

"m" in y = mx + b stands for

Summary of Normal Forms

- 1NF - All attributes are atomic. - 2NF - All non-key attributes are fully functionally dependent on a key. - 3NF - There are no transitive dependencies in the relation. - BCNF - 3NF and all LHS are superkeys.

Notation for Functional Dependencies

- A functional dependency has a left-side called the determinant which is a set of attributes, and one attribute on the right-side. - Strictly speaking, there is always only one attribute on the RHS, but we can combine several functional dependencies into one:

Normal form

- Condition using keys and Functional Dependencies of a relation to certify whether a relation schema is in a particular normal form - the normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized

Attribute Set Closures

- If an attribute does not appear on the RHS of any FD then it must be a part of the key - if x is a key then any set containing x is a super key - there can be several equivalent keys

What are the criteria for "good" base relations?

- Making sure that the semantics of the attributes is clear in the schema - Reducing the redundant information in tuples - Reducing the NULL values in tuples - Disallowing the possibility of generating spurious tuples

Normalization

- Normalization is a technique for producing relations with desirable properties. - Using Functional Dependencies. - Normalization decomposes relations into smaller relations that contain less redundancy - This decomposition requires that no information is lost and reconstruction of the original relations from the smaller relations must be possible.

Two levels of relation schemas

- The logical "user view" level - The storage "base relation" level

Normalization theory allows us

- To recognize the undesirable properties of a relation - To show how a relation can be converted to a more desirable form.

How to go about designing a good schema?

- ad-hoc approach, hope for the best! - formal method- start with a single relation with all attributes and systematically decompose

Why Armstrongs Axioms?

- sound (consistent): any relation satisfying the FDs in F will also satisfy those in F+ - complete: every FD in F+ can be derived from F using the axioms

Desirable Relational Schema Properties

- consists of attributes that are logically related - lossless-join property, information that is decomposed must be able to be reconstructed without loss - dependency preservation property ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations - avoid update anomalies

Problems with Partial Dependencies

- insertion anomalies - deletion anomalies - modification anomalies

Schema Problems

- insertion, deletion, modification anomalies - too many nulls - spurious tuples (avoided when the decomposition has the non-additive join property)

What are tree-based indexes?

- keep references to our data in a sorted order

Armstrong's Axioms

- reflexivity - augmentation - transitivity, which lead to: - union - pseudo-transitivity - decomposition
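As a worked illustration of how the three derived rules follow from reflexivity, augmentation and transitivity, here is one possible derivation of each. The attribute names A, B, C, D are generic placeholders, and concatenation such as AB denotes the union of attribute sets.

```latex
% Union: from A -> B and A -> C derive A -> BC
\begin{align*}
A \to B &\;\Rightarrow\; A \to AB   && \text{(augment with } A\text{, since } AA = A\text{)} \\
A \to C &\;\Rightarrow\; AB \to BC  && \text{(augment with } B\text{)} \\
A \to AB,\ AB \to BC &\;\Rightarrow\; A \to BC && \text{(transitivity)}
\end{align*}

% Decomposition: from A -> BC derive A -> B (and symmetrically A -> C)
\begin{align*}
B \subseteq BC &\;\Rightarrow\; BC \to B && \text{(reflexivity)} \\
A \to BC,\ BC \to B &\;\Rightarrow\; A \to B && \text{(transitivity)}
\end{align*}

% Pseudo-transitivity: from A -> B and BC -> D derive AC -> D
\begin{align*}
A \to B &\;\Rightarrow\; AC \to BC && \text{(augment with } C\text{)} \\
AC \to BC,\ BC \to D &\;\Rightarrow\; AC \to D && \text{(transitivity)}
\end{align*}
```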

Functional Dependencies

- represent constraints on the values of attributes in a relation and are used in normalization. - is a statement about the relationship between attributes in a relation. We say a set of attributes X functionally determines an attribute Y if given the values of X we always know the only possible value of Y. - X-> Y - a property of the domain being modeled NOT of the data instances currently in the database

Ways to convert to 1NF

- splitting method, divide the existing relation into two relations: non-repeating attributes and repeating attributes. Make a relation consisting of the primary key of the original relation and the repeating attributes. Determine a primary key for this new relation. remove the repeating attributes from the original relation - flattening method, create new tuples for the repeating data combined with the data that does not repeat, introduces redundancy that will be removed by normalization

First Normal Form

- value of any attribute is a single value - domain of attribute contain only atomic values (cannot be a set of values or tuples) - a relation not in 1NF is an unnormalized relation

1NF

-A normalized relation containing no repeating groups. -A relation where all underlying domains contain atomic values only

Why normalize tables?

-recognize undesirable properties of a relation -show how a relation can be converted into a more desirable form

Describe the problems that occur when two problem domain entities are combined into a single relation. For example, trying to maintain individual Employee and Department when both entities are combined into a single relation EMP_DEPT.

1) It is difficult to maintain single instances of entities (employees or departments) when both are combined into a single relation. For example, if we want to create a new department, we need to pair it with an employee when one may not exist. 2) Likewise, it is difficult to create an instance of one entity when there is initially no association with the second entity. For example, creating a new employee that has not been assigned to a department. Also, if we move all employees out of a department, the system also loses all knowledge of the department. This is likely not desired. Equally important: - There is significant wasted storage space and duplication of information. - We need to use nullable attributes to implement optional information, which can result in an inefficient use of table space. - If we want to update employee- or department-specific attributes, we must update multiple rows in the combined relation.

What are the types of indexing?

1. Clustering 2. Hash-based indexes 3. Tree based indexes

How do you determine if it's necessary to use an index?

1. Data distribution 2. Size/layout of data 3. Query vs update frequency

What are Armstrong's five axioms?

1. Reflexivity 2. Augmentation 3. Transitivity 4. Pseudo-transitivity 5. Decomposition Rule (combines Reflexivity and Transitivity). Rules 4 and 5 are derived from the first three.

How do you achieve first normal form?

1. Remove nested relation and non-atomic attributes into a new relation 2. Propagate the primary key into it

Minimal Cover Algorithm

1. Rewrite the FDs so that each RHS has only one attribute 2. Remove any LHS attribute that is not needed (extraneous) 3. Check whether any FD is implied by the other FDs 4. Keep deleting redundant FDs until none remain
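The procedure can be sketched in Python as follows. This is one common textbook formulation, not necessarily the exact variant the course used: it splits right-hand sides, then drops extraneous left-hand-side attributes, then drops redundant FDs. The function names and the example FD set {XY -> Z, X -> Y} are assumptions for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: split every FD so that each RHS is a single attribute.
    cover = []
    for lhs, rhs in fds:
        for attr in sorted(rhs):
            fd = (frozenset(lhs), frozenset({attr}))
            if fd not in cover:
                cover.append(fd)

    # Step 2: drop LHS attributes that are not needed to derive the RHS.
    for i in range(len(cover)):
        lhs, rhs = cover[i]
        for attr in sorted(lhs):
            smaller = lhs - {attr}
            if smaller and rhs <= closure(smaller, cover):
                lhs = smaller
                cover[i] = (lhs, rhs)

    # Step 3: drop any FD that is implied by the remaining ones.
    i = 0
    while i < len(cover):
        lhs, rhs = cover[i]
        rest = cover[:i] + cover[i + 1:]
        if rhs <= closure(lhs, rest):
            cover = rest
        else:
            i += 1
    return cover


# Hypothetical input: {XY -> Z, X -> Y}. Y is extraneous in XY -> Z.
cover = minimal_cover([({"X", "Y"}, {"Z"}), ({"X"}, {"Y"})])
```

For this input the result is {X -> Z, X -> Y}. A minimal cover is generally not unique; the order in which attributes and FDs are examined can change which one is returned.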

Normalization Procedure Summary

1. use attribute set closure to find keys and prime attributes 2. test each FD in F to see if it satisfies 3NF/BCNF properties 3. decompose 4. if the BCNF decomposition is not dependency preserving then go with 3NF

2NF

1NF + every non-key attribute is fully dependent on p.k. OR every non-key attribute is irreducibly dependent on p.k.

What are the normal forms up to 3NF

1NF = all domain values in R are atomic 2NF = R is in 1NF and every non-key attribute is fully dependent on the key 3NF = R is in 2NF and every non-key attribute is non-transitively dependent on the key

second normal form (2NF): invoice relation

1NF plus every non-key attribute is fully functionally dependent on the ENTIRE primary key, not part of the key (no partial dependencies)

Normal Forms Defined Informally

1st normal form: All attributes depend on the key. 2nd normal form: All attributes depend on the whole key. 3rd normal form: All attributes depend on nothing but the key.

3NF

2NF + every non-key attribute is not transitively dependent on the primary key

third normal form (3NF): invoice relation

2NF plus no non-primary-key attribute is transitively dependent on the primary key

There are two alternative methods of representing an employee's optional (single) phone number. The first is to use a nullable attribute and the second is to use a weak entity. How would the requirement of 'multiple phone numbers can be assigned to an Employee' affect your decision?

A 1-M relationship between employee and phone numbers would require the use of a weak entity.

What is functional dependence?

A M-1 relationship from one set of attributes to another within a given table X --> Y X functionally determines Y Y is functionally determined by X

If A--> B and B--> A

A and B have a one to one attribute relationship

if A--> B , but B not-->A

A and B have many to one attribute relationship

Define modification anomalies

A data item whose value was updated, but left the database in an incomplete state because that item's copies are scattered and not linked.

Describe the meaning of a Functional Dependency between attributes in a single relation A1 → A2. Use an example such as the Employee relation's SSN (Social Security Number) and the remaining attributes in the relation.

A functional dependency A1 → A2 means that every tuple with a given value of A1 must have the same value of A2. That is, A1 consistently identifies A2. For example, every tuple that has a given social security number must have the same first name, last name, etc. In other words, the same SSN (A1) must not result in different employees / names (A2).

Trivial Functional Dependencies

A functional dependency is trivial if the attributes on its left-hand side are a superset of the attributes on its right-hand side. Trivial FDs don't tell us anything. Ex. enum, pnum, hours -> enum, hours

key (minimal superkey)

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

BCNF

A relation is in boyce/codd NF iff every determinant is a candidate key note: determinant --> dependent

Third Normal Form Definition

A relation is in third normal form if it is in 2NF and there is no non-prime attribute that is transitively dependent on the primary key. That is, for all non-trivial functional dependencies X -> Y of R, one of the following holds: • Y is a prime attribute of R • X is a superkey of R

third normal form

A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute is transitively dependent on the primary key

first normal form (1NF): invoice relation

A relation that has a defined primary key with no repeating groups

Minimal Cover

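A set of FDs F is minimal if: 1. every FD in F is of the form X->A where A is a single attribute 2. no FD has an unnecessary attribute on its left-hand side (each attribute is essential) 3. no FD in F can be removed without changing F+. E.g., {XY->Z, X->Y} is not a minimal cover, because Y is not needed in XY->Z (X->Z already follows)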

What is 5th normal form?

A table is said to be in 5NF if and only if every join dependency in it is implied by the candidate keys

Unnormalized Form (UNF)

A table that contains one or more repeating groups

Normalization

A technique for producing a set of suitable relations that support the data requirements of an enterprise

Self-determination

A-->A

we have (A1, A2, A3, A4) A1 ->A2 A1 -> A3 A1 -> A4 what is A1?

A1 is a key!

What are the three functional dependencies that make up Armstrong's inference rules?

Reflexive, augmentation, transitive

insertion anomaly

Adding new rows creates duplicate data or certain facts cannot be recorded

What are the advantages and disadvantages to clustering?

Advantages: Easy look up for queries that use a RANGE of items ie. 15,000 > = salary Disadvantages: Only one per table allowed Requires us to rearrange the table on inserts, deletes

FD Diagram shows?

All of the relationships (FDs) between the attributes (columns) of a relation (table)

Lossless Join

Allows us to put a decomposed table back together after decomposition without any loss of information or uncertainty

Determinant

An attribute (could be composite) that some other attribute is fully functionally dependent on The left side of a FD

Define deletion anomalies

Attempted to delete a record, but not all of its copies were deleted because they were not linked properly.

The book describes three possible methods of bringing Department (Figure 15.9 B) into First Normal Form. We discussed the first solution in class. Describe the other two possible methods of normalization (changes to the Department schema) and the problems that result when each is applied.

B.1. The first solution was to move department location into a Department relation using the department's ID as a foreign key. This was the solution discussed in class and is the preferred solution. B.2. The second solution is to expand the relation's key to include both department number and location. This has the effect of creating a separate tuple for each department's locations. It also introduces redundant values in the relation e.g. dmgr_ssn (bad). B.3. If there is a maximum number of department locations (e.g. three locations), we can introduce three nullable location attributes into the relation. This has the disadvantage of introducing null attributes into the relation which is a waste of table space in the DBMS (bad).

How would this relation suffer from Update Anomalies? CUSTOMER_PURCHASE (CUSTOMER_ID, PRODUCT_ID, PURCHASE_DATE, PURCHASE_AMT, PRODUCT_DESCRIPTION, CUSTOMER_NAME) Note the relation's primary key is underlined

Because PRODUCT_DESCRIPTION and CUSTOMER_NAME are partially dependent on the relation's PK, these attributes will be duplicated in every tuple that shares the same product or customer. Changing the value of a Customer's name or Product's description will require updating every tuple that references the entity being modified.

How do you create an index in SQL?

CREATE INDEX
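A runnable sketch using Python's built-in `sqlite3` module and an in-memory database. The table `employee`, its columns, and the index name `idx_employee_salary` are invented for the example; other database systems accept the same basic `CREATE INDEX name ON table (column)` form but differ in options.

```python
import sqlite3

# In-memory database so the example leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, dept TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee (dept, salary) VALUES (?, ?)",
    [("sales", 12000), ("sales", 18000), ("it", 21000)],
)

# Create an index on the column used in range lookups.
conn.execute("CREATE INDEX idx_employee_salary ON employee (salary)")

# List the indexes SQLite now knows about for this table (name is column 1).
index_names = [row[1] for row in conn.execute("PRAGMA index_list(employee)")]

# A range query of the kind such an index is meant to speed up.
high_paid = conn.execute(
    "SELECT dept, salary FROM employee WHERE salary >= 15000 ORDER BY salary"
).fetchall()
conn.close()
```

Whether the query planner actually uses the index depends on the data and the database engine; on a three-row table it makes no practical difference.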

What are the advantages of hash functions?

Can index on more than one attribute. Good for looking up values based on EQUALITY TESTS (equijoins win big with hash-based indexes). Disadvantage: Not so good with range-based tests -- salary > = 15,000

What is an insertion anomaly of 2NF (using the surgery example)?

Cannot enter the fact that a particular drug has a particular side effect unless it is given to a patient.

update anomaly

Changing data in a row forces changes to other rows because of duplication

Attributes

Columns

How do we get an Unnormalized Relation?

Convert the data into a two-dimensional table. Data can repeat within a column.

unnormalized form (UNF): invoice data

Data transformed from an information source (e.g. form) into table format

Abassi is

Database

How do you solve update, insert and deletion anomalies?

Decompose the relation into different relations(tables)

What is a delete anomaly of 1NF (using the surgery example)?

Deleting a patient record may also delete all information about a surgeon.

deletion anomaly

Deleting rows may result in the loss of data needed for other future rows

Bad decomposition vs good decomposition

Dependencies are preserved in a good decomposition

Transitive Dependency

Dependency between a non-key attribute and another non-key attribute

Functional dependency

Describes relationship between attributes

I: Red Bull consumption D: rate of hyperactivity

Drinking Red Bull increases hyperactivity in children

I: lemon consumption D: test scores

Eating lemons causes students to have higher test scores

I: eating sugar D: memory level

Eating sugar impairs performance on a memory task

Dependency preservation property

Enables us to enforce a constraint on the original relation by enforcing some constraint on each of the smaller relations

Lossless-join property

Enables us to find any instance of the original relation from corresponding instances in the smaller relations

What is BCNF?

Every determinant of R is a candidate key

What is second normal form?

No non-prime attribute is partially dependent on any candidate key

Computing Attribute Closure

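Start from an attribute set and repeatedly add the right-hand side of every FD whose left-hand side is already in the set, looping over the FDs until nothing new can be added. If the result contains all of the other attributes, the starting set is a superkey.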

24

Find f(-2) = -10x + 4

126

Find f(12) = 10x+6

11

Find f(8) = (1/2)x +7

-20

Find f(x) if f(x) =-7x + 1 when x = 3

-9

Find x for f(x) = -6x + 7 when f(x) = 61

-8

Find x for f(x) = 5x - 6 when f(x) = -46

If there is a functional dependency X -> Y, what does it mean that if t1[X] = t2[X] is true, then t1[Y] = t2[Y] must also be true?

For any two records (tuples) that agree on attribute X, those records must also agree on attribute Y.

Boyce-Codd Normal Form (BCNF)

For every relation scheme R and for every X -> A that holds over R, either A is a subset of X or X is a super key for R
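This definition translates directly into a check: for each non-trivial FD X -> A, compute the closure of X and see whether it covers all of R. The sketch below tests only the FDs listed in F, which is enough for checking the original relation R itself (not its decompositions). The FDs given for the PURCHASE relation are assumptions read off the cards in this set, and the helper names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the FDs X -> Y that are non-trivial and whose X is not a superkey."""
    attributes = set(attributes)
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FD: the RHS is a subset of the LHS
        if closure(lhs, fds) != attributes:
            violations.append((lhs, rhs))
    return violations


PURCHASE = {
    "PURCHASE_ID", "PURCHASE_DATE", "PURCHASE_AMT",
    "CUSTOMER_ID", "CUSTOMER_CITY", "CUSTOMER_ZIPCODE",
}
# Assumed FDs: the primary key determines everything, and the customer
# determines the customer's city and zip code.
FDS = [
    ({"PURCHASE_ID"}, PURCHASE - {"PURCHASE_ID"}),
    ({"CUSTOMER_ID"}, {"CUSTOMER_CITY", "CUSTOMER_ZIPCODE"}),
]
violations = bcnf_violations(PURCHASE, FDS)
```

Only the CUSTOMER_ID dependency is reported, since CUSTOMER_ID alone does not determine the purchase attributes and so is not a superkey.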

Third Normal Form (3NF)

For every relation scheme R and for every X -> A that holds over R, either A is a subset of X, or X is a super key for R, or A is a member of some key of R

What is the difference between full functional dependency and partial functional dependency?

Full = Warehouse --> W_Address Not full = Part, Warehouse --> W_Address

What are hash-based indexes?

A hash table associates a key with a record or list of records using a hash function. E.g., hash the department number and organize the employees by hash code.

I: length of football practice D: number of touchdowns during games

Having longer football practice increases touchdowns during games

Composition

IF A-->B and C-->D then AC --> BD

How do you identify functional dependencies?

If 2 tuples have the same X value, they must have the same Y value

Union

If A-->B and A-->C, then A-->BC

General Unification Theorem:

If A-->B and C-->D, then A U (C-B) -->BD

Union

If A->B and A->C, then A->BC

Please provide an example of how functional dependencies are not symmetric i.e. if A1 → A2 it is not necessarily true that A2 → A1.

If a SSN identifies a customer's (fname, lname), it is not the case that unique (fname, lname) identifies the same SSN. Naturally, other examples are possible.

There are two alternative methods of representing an employee's optional (single) phone number. The first is to use a nullable attribute and the second is to use a weak entity. Given a requirement that a large number of the employees (> 80%) are assigned a phone number, which is the preferred method and why?

If a large percentage of EMPLOYEE tuples have a single office number, the use of a nullable attribute would seem appropriate. This is because the nullable attribute is more efficient (no join needed) and the large percentage of employees with numbers indicates that the nullable attribute will not be wasteful of table space.

What is an update anomaly of 1NF (using the surgery example)?

If a patient changes their address, multiple address entries have to be changed.

There are two alternative methods of representing an employee's optional (single) phone number. The first is to use a nullable attribute and the second is to use a weak entity. Given a requirement that only a small number of employees (< 10%) are assigned phone numbers, which is the preferred method and why?

If a small percentage of EMPLOYEE tuples will have a single office number, the use of a nullable attribute would waste table space and the weak entity is the best choice. However, the additional processing of performing a Join with the weak relation may still make the nullable attribute more attractive.

How would this relation suffer from Update Anomalies? PURCHASE (PURCHASE_ID, PURCHASE_DATE, PURCHASE_AMT, CUSTOMER_ID, CUSTOMER_CITY, CUSTOMER_ZIPCODE) Note the relation's primary key is underlined.

If the customer were to change their address (city or zip-code), we would need to update all of the customer's purchases based on the customer_id attribute.

data normalization

Improving the logical design to create well-structured relations to: 1. Avoid duplication and conserve storage 2. Satisfy certain referential integrity constraints (i.e. a particular row in one table can be related to at most one row in a related table) 3. Facilitate data maintenance (insert, update, and delete) 4. Provide a better design that enables future growth

FDs cannot be ___, but must be ____

Inferred; defined explicitly

Positive

Is the slope of y = 2x + 3 negative or positive?

A relation schema R is in 3NF if every nonprime attribute of R meets both of the following conditions:

It is fully functionally dependent on every key of R. It is non transitively dependent on every key of R.

Finding Keys

Keep adding attributes to the set until the set can be closed. Smallest subset of the attributes that allows for all of the other attributes to be determined.

I: listening to a radio D: test performance

Listening to a radio broadcast of a sports event while studying for a test decreases performances on the test

Why would R.A multidetermine attribute R.B?

If the set of B values matching a given (A, C) pair in R depends only on the A value and is independent of the C value. For example -- President -->> VP, President -->> Loser. Both can have different numbers of values and are independent of each other, but to be in 4NF they need to be separated into tables

Why do we use indexes?

Make data access orders of magnitude faster

If A not--> B and B not-->A

Many to many attribute relationship

I: hours of sleep D: time it takes for students to get to class

More sleep helps students move to class more quickly

What is 3NF?

Must be in 2NF and no non-prime attribute of R is transitively dependent on the primary key

1NF

NF: all domain values in R are atomic

Given an r(R), can we conclude an FD holds?

No, we can only say if an FD does not hold.
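The asymmetry can be shown with a short check over an instance. The function name `violates_fd` and the sample rows are hypothetical. Finding two tuples that agree on X but differ on Y proves the FD does not hold; finding none only means this particular instance is consistent with it.

```python
def violates_fd(rows, lhs, rhs):
    """True if two rows agree on the lhs attributes but differ on the rhs ones."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first RHS seen for this LHS value.
        if seen.setdefault(key, value) != value:
            return True
    return False


# Hypothetical instance r(R).
rows = [
    {"ssn": "111", "name": "Ann", "city": "Leeds"},
    {"ssn": "222", "name": "Bob", "city": "Leeds"},
    {"ssn": "333", "name": "Ann", "city": "York"},
]

ssn_to_name = violates_fd(rows, ["ssn"], ["name"])   # no counterexample here
name_to_city = violates_fd(rows, ["name"], ["city"])  # Ann maps to two cities
```

`name_to_city` is True, so name -> city definitely does not hold. `ssn_to_name` is False, but that does not establish ssn -> name as a constraint: an FD is a property of the domain being modeled, and a later insert could still contradict it.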

Partial functional dependency

Non-key Attribute is functionally dependent on part (but not all) of the primary key

What is an overview of 2NF?

Non-key attributes must be dependent on the full key (any candidate key).

I: crowd (number of people around) D: shyness

People will become shy if they are in a crowd

What are the advantages and disadvantages of tree-based indexes

Properties: Worst case is O(log n) for search, update, and delete

Normalization attempts to get rid of what anomalies?

Redundancy anomalies (potential inconsistency): - Update anomalies. - Insertion anomalies. - Deletion anomalies.

Armstrong's Axioms

Reflexivity: if B is a subset of A, then A --> B. Augmentation: if A --> B, then AC --> BC. Transitivity: if A --> B and B --> C, then A --> C.
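
In practice the axioms are rarely applied one at a time; a common shortcut is to compute the closure of an attribute set. The sketch below is one minimal way to do that, with the function name `closure` and the example FDs chosen for illustration. X --> Y follows from the FDs exactly when Y is a subset of the closure of X.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is a set of attribute names.
    """
    result = set(attrs)  # reflexivity: attrs determines itself
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once lhs is derivable, everything lhs determines is derivable too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Hypothetical FDs: A -> B and B -> C.
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(sorted(closure({"A"}, fds)))  # ['A', 'B', 'C']
print(sorted(closure({"B"}, fds)))  # ['B', 'C']
```

Here A --> C is implied by transitivity, which the closure of {A} confirms without spelling out the derivation.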

Informally describe the conditions to be met for a relation to be in First Normal Form.

Relations should not have multivalued attributes or nested relations. Alternatively, attributes are permitted to hold only single atomic (indivisible) values.

I: smoking cigarettes D: rate of lung cancer

Smoking cigarettes while driving a car increases lung cancer

What is the process of Normalisation?

Starting from a universal all-listing relation, progressively remove redundant data from the table. In a relational model, methods exist for quantifying how efficient a database is, called normal forms (NF). There are algorithms for converting a database from one NF to another.

What is the clustering approach to indexing?

The table is physically rearranged so that related rows are stored together, e.g. employee IDs grouped by department

Describe why the following relation PURCHASE is not in third normal form. PURCHASE (PURCHASE_ID, PURCHASE_DATE, PURCHASE_AMT, CUSTOMER_ID, CUSTOMER_CITY, CUSTOMER_ZIPCODE) Note the relation's primary key is underlined.

The attributes CUSTOMER_CITY and CUSTOMER_ZIPCODE are functionally dependent on CUSTOMER_ID, which is not the primary key. They therefore depend on the primary key only transitively, through CUSTOMER_ID.
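
The check behind this answer can be sketched as follows. The helper names are hypothetical, and the sketch assumes PURCHASE_ID is the only candidate key and that it determines PURCHASE_DATE, PURCHASE_AMT and CUSTOMER_ID; neither assumption is stated in full on the card. An FD X --> A violates 3NF when X is not a superkey and A is not a prime attribute.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(attributes, fds, prime):
    """List (lhs, attr) pairs where lhs is not a superkey and attr is nonprime."""
    attributes = set(attributes)
    violations = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == attributes
        for attr in sorted(rhs - lhs):
            if not is_superkey and attr not in prime:
                violations.append((sorted(lhs), attr))
    return violations


attributes = {"PURCHASE_ID", "PURCHASE_DATE", "PURCHASE_AMT",
              "CUSTOMER_ID", "CUSTOMER_CITY", "CUSTOMER_ZIPCODE"}
fds = [
    # Assumed: the primary key determines the purchase facts and the customer.
    ({"PURCHASE_ID"}, {"PURCHASE_DATE", "PURCHASE_AMT", "CUSTOMER_ID"}),
    ({"CUSTOMER_ID"}, {"CUSTOMER_CITY", "CUSTOMER_ZIPCODE"}),
]
print(third_nf_violations(attributes, fds, prime={"PURCHASE_ID"}))
# [(['CUSTOMER_ID'], 'CUSTOMER_CITY'), (['CUSTOMER_ID'], 'CUSTOMER_ZIPCODE')]
```

Moving CUSTOMER_ID, CUSTOMER_CITY and CUSTOMER_ZIPCODE into their own relation and keeping CUSTOMER_ID in PURCHASE as a foreign key removes both violations.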

What is relational database design?

The grouping of attributes to form good relation schemas

Informally describe the conditions to be met for a relation to be in Second Normal Form. CUSTOMER_PURCHASE (++CUSTOMER_ID, PRODUCT_ID++, PURCHASE_DATE, PURCHASE_AMT, PRODUCT_DESCRIPTION, CUSTOMER_NAME) Note the relation's primary key is enclosed in ++.

The informal description of 2NF is: For relations where the primary key is a compound key (i.e. a key defined from multiple attributes), all nonprime attributes (attributes not part of the primary key) must be functionally dependent on all of the key's attributes.

Informally describe the conditions to be met for a relation to be in Third Normal Form. PURCHASE (PURCHASE_ID, PURCHASE_DATE, PURCHASE_AMT, CUSTOMER_ID, CUSTOMER_CITY, CUSTOMER_ZIPCODE) Note the relation's primary key is underlined.

The informal description of 3NF is: A relation should not have a non-key attribute functionally dependent on another non-key attribute. That is, there should be no transitive dependencies of a non-key attribute on the primary key.

Normalization

The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations

Denormalization

The process of storing the join of higher normal form relations as a base relation—which is in a lower normal form

How do we decompose this relation to solve this problem?

The solution is to decompose the combined relation into two relations (CUSTOMER & DEPARTMENT) and use a FK to maintain the 1-M relationship or a join-table if it happens to be an M-N relationship.

Describe the suggested method of eliminating the use of a multi-valued attribute to represent the employee's three favorite colors. Ename|Ssn|Bdate|Color_1|Color_2|Color_3

This is another example of the use of a weak relation to implement a 1-M relationship between Employee and zero or more color choices. Note for the exam you may be asked to describe the new weak relation's table.

Describe why the following CUSTOMER_PURCHASE relation is not in second normal form. CUSTOMER_PURCHASE (CUSTOMER_ID, PRODUCT_ID, PURCHASE_DATE, PURCHASE_AMT, PRODUCT_DESCRIPTION, CUSTOMER_NAME) Note the relation's primary key is underlined

This relation mixes attributes of a customer's purchase, (CUSTOMER_ID, PRODUCT_ID) → (PURCHASE_DATE, PURCHASE_AMT), with attributes that depend on only part of the key: PRODUCT_ID → PRODUCT_DESCRIPTION and CUSTOMER_ID → CUSTOMER_NAME.
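
The partial dependencies named above can be found mechanically. The sketch below uses hypothetical helper names and assumes (CUSTOMER_ID, PRODUCT_ID) is the only candidate key. It computes what each proper subset of the key determines on its own.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the set of attributes determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, fds, attributes):
    """List (subset, attr) pairs where a proper subset of `key` determines a non-key attr."""
    key = set(key)
    non_prime = set(attributes) - key  # assumes `key` is the only candidate key
    found = []
    for size in range(1, len(key)):
        for combo in combinations(sorted(key), size):
            subset = set(combo)
            for attr in sorted(closure(subset, fds) & non_prime):
                found.append((sorted(subset), attr))
    return found


attributes = {"CUSTOMER_ID", "PRODUCT_ID", "PURCHASE_DATE",
              "PURCHASE_AMT", "PRODUCT_DESCRIPTION", "CUSTOMER_NAME"}
key = {"CUSTOMER_ID", "PRODUCT_ID"}
fds = [
    ({"CUSTOMER_ID", "PRODUCT_ID"}, {"PURCHASE_DATE", "PURCHASE_AMT"}),
    ({"PRODUCT_ID"}, {"PRODUCT_DESCRIPTION"}),
    ({"CUSTOMER_ID"}, {"CUSTOMER_NAME"}),
]
for lhs, attr in partial_dependencies(key, fds, attributes):
    print(lhs, "->", attr)
# ['CUSTOMER_ID'] -> CUSTOMER_NAME
# ['PRODUCT_ID'] -> PRODUCT_DESCRIPTION
```

Each reported dependency suggests a relation to split off, here a customer relation and a product relation, leaving the purchase facts with the full key.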

For a relation in 2NF, why and how do we eliminate transitive dependencies?

To solve the update, insert and delete anomalies by decomposing into tables

I: stress level D: rate of stomach aches

Too much stress causes stomach aches

Define insertion anomalies

Problems that occur when trying to insert data into a record that does not exist.

Trivial vs non-trivial dependencies

Trivial: the right side is a subset of the left side, e.g. X,Y --> X. Nontrivial: everything else.

Define spurious tuples

Tuples that are not in the original relation, but are produced by a subsequent join.
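
A small demonstration of how a join can produce such tuples. The `project` and `natural_join` helpers and the employee data are made up for illustration. The relation is split on an attribute that is not a key of either part, then joined back.

```python
def project(rows, attrs):
    """Project rows (dicts) onto `attrs`, removing duplicates."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(r1, r2):
    """Natural join of two relations stored as sets of frozenset((attr, value)) rows."""
    joined = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            shared = d1.keys() & d2.keys()
            if all(d1[a] == d2[a] for a in shared):
                joined.add(frozenset({**d1, **d2}.items()))
    return joined


rows = [
    {"emp": "e1", "dept": "sales", "city": "Oslo"},
    {"emp": "e2", "dept": "hr", "city": "Oslo"},
]
original = project(rows, ["emp", "dept", "city"])
r1 = project(rows, ["emp", "city"])    # city is not a key of either projection
r2 = project(rows, ["dept", "city"])
rejoined = natural_join(r1, r2)
spurious = rejoined - original
print(len(original), len(rejoined), len(spurious))  # 2 4 2
```

The join pairs e1 with hr and e2 with sales, two rows that were never in the original relation.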

I: UV light D: mold growth

UV light decreases the growth of mold

Guidelines for decomposition

Use independent projection(s). Projections R1 & R2 of relation R are independent iff: 1. every FD in R can be logically deduced from the FDs in R1 and R2; 2. the common attributes of R1 & R2 form a candidate key for at least one of the pair.

Full Functional Dependency

Value of attribute is functionally dependent on the (entire) primary key

dependent variable

Value that depends on the value of the independent variable

independent variable

Value that does not depend on another

the slope

What does this formula find?

-1/2

What is the slope of y = -1/2x + 2?

0

What is the slope of y = 3?

(0,-4)

What is the y-intercept if y = -2x - 4?

5

What is the y-intercept of y = 1/2x + 5?

If a relation is in 2NF is it also in 1NF by default?

Yes

Key

a minimal superkey

Partial Dependencies

a partial dependency relies on only part of the key. Ex: with key AC, A -> B is a partial dependency

Second Normal Form

A relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on a candidate key, i.e. there are no partial dependencies. By definition, any relation with a single key attribute is in 2NF. A relation is in 2NF if no non-prime attribute is partially dependent on the primary key.

Candidate Keys

a relation may have more than one Key

Prime Attributes

an attribute that belongs to some candidate key

trivial functional dependencies

augmented functional dependency, equivalent functional dependency

Guideline 3

avoid putting attributes in a tuple that will frequently take null values for one reason or another

Atomic relation

can't be decomposed into independent relations

If you know functional dependencies, you can ___ your keys

derive

Guideline 1

design relation schema so that it is easy to explain its meaning, do not combine attributes from multiple entity types and relationship types into a single relation

Guideline 2

design the base relation schemas so that no update anomalies are present in the relations and no redundant information exists in tuples

equivalent functional dependency

determinants and non-key attributes are interchangeable

full dependency

determinants should have the minimal number of attributes needed to maintain the functional dependency with all non-key attributes (i.e. B is functionally dependent on A but not on any proper subset of A)

Second normal form

every non-prime attribute should be fully functionally dependent on the primary key

1

f(x) = (1/10)x+3 when x = -20

0

find f(-2) when f(x) = 5x + 10

12

find f(-9) - g(8) when f(x) = -2x -5 and g(x) = (1/2)x -3

13

find f(6) + g(14) when f(x) = (1/2)x + 5 and g(x) = (1/14)x + 4

-31

find f(x) for f(x) = 3x - 7 when x = -8

17

find x for f(x) = -x+10 when f(x) = -7

-1

find x for f(x) = 4x + 2 when f(x) = -2

10

find x for f(x) =-4x - 8 when f(x) = -48

-5

find x for f(x) =-6x -5 when f(x) = 25

Lossless Property

guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition
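
For a split into two relations, the lossless property can be tested with the rule from earlier in this set: R1 ∩ R2 must functionally determine R1 - R2 or R2 - R1. The sketch below applies that rule with an attribute closure; the function names and the example schema are assumptions for illustration.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary lossless-join test: the shared attributes must determine one side."""
    r1, r2 = set(r1), set(r2)
    common_closure = closure(r1 & r2, fds)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure


# Hypothetical relation R(A, B, C) with the single FD B -> C.
fds = [({"B"}, {"C"})]
print(is_lossless({"A", "B"}, {"B", "C"}, fds))  # True: B determines C
print(is_lossless({"A", "B"}, {"A", "C"}, fds))  # False: A determines neither B nor C
```

This test covers only decompositions into two relations; splitting into more relations needs a more general procedure.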

functional dependence

h is functionally dependent on g if h = f°g

Decomposition

if A--> BC, then A-->B and A-->C

Transitive Dependencies

if X -> Z and Z -> Y, then X -> Y is a transitive dependency

A relation schema R is in second normal form (2NF)

if every non-prime attribute A in R is fully functionally dependent on every key of R

Operations that may result in anomalies

insert operation, delete operation, update operation

Problems with Transitive Dependencies

insertion, deletion and modification anomalies

First normal form

It is considered to be part of the definition of a relation. Disallows: composite attributes, multivalued attributes, and nested relations, i.e. attributes whose values for an individual tuple are non-atomic.

independent

lines intersect at one point

inconsistent

lines are parallel (no points intersect)

dependent

lines are same (all points intersect)

sound proof

means that no FD inferred by using the rules of reflexivity, augmentation, and transitivity will violate the original set of FDs.

Can you tell a functional dependency by looking at the data?

no, you cannot tell if one attribute is dependent on another by looking at the data

augmented functional dependency

non-key attributes are functionally dependent on a subset of its determinant

partial dependency

non-key attributes are functionally dependent on a subset of the primary key

Are functional dependencies symmetric?

nope! they go in a single direction!

Will there always be a dependency preserving decomposition into BCNF?

not always!

Semantics

pertaining to a relation, refers to its meaning resulting from the interpretation of attribute values in a tuple

Update Anomalies

problems associated with storing natural joins of base relations

The semantic of a relation

refer to its meaning resulting from the interpretation of attribute values in a tuple

Tuples

rows

Relations

tables

transitive dependencies

the primary key is a determinant for another attribute which is a determinant for a third attribute

Spurious Tuples

tuples that represent information that is not valid

functional dependency

value of the determinant decides the value of another attribute

A functional dependency has nothing to do with ____

what the table will be populated with

Union

x -> y, x -> z |= x -> yz

(0,0)

y = 2x has a y-intercept of...?

point-slope

y-y₁=a(x-x₁)

slope-intercept

y=ax+b, a≠0

What is an overview of 1NF?

All attributes are atomic (cannot be broken down into simpler values) and there is a key.

Reflexive

Armstrong's axiom: if y is a subset of x, then x -> y

complete proof

we mean that by the exhaustive application of rules 1 - 3 to a set of dependencies, f, we will infer all possible dependencies that can be inferred from f

Pseudo Transitivity

x -> y, wy -> z |= wx -> z

Decomposition

x -> yz |= x -> y, x -> z

