EXAM 2

¡Supera tus tareas y exámenes ahora con Quizwiz!

Resolutions Condensed

A relation is in 1NF if there are no multivalued attributes. • A relation is in 2NF if it is in 1NF and there are no partial dependencies - that is, it satisfies at least one of the following criteria: ▫ The primary key consists of only one attribute ▫ There are no non-key attributes ▫ No non-key attribute is functionally dependent on part of the primary key (partial dependency) • A relation is in 3NF if it is in 2NF and there are no transitive dependencies- that is, no relationships between non-key attributes

Augmentation (works two ways)

1. We can add anything to the determinant and it will still determine the dependent ▫ If X --> Y, then: {X,Z} --> Y {X, A, B, C, D, E, F, G} --> Y ▫ Remember our discussion of super keys? 2. We can add the same attribute to both the determinant and the dependent and it will still determine the dependent ▫ If X --> Y, then: {X,A} --> {Y,A} ▫ Basically the reflexivity axiom • If EmpID --> Lname, then: ▫ {EmpID, DateOfBirth} --> Lname ▫ {EmpID, HairColor} --> {Lname, HairColor}

Functional Dependency

A core part of database design ▫ Section# determines Course# ▫ EmpID determines Name ▫ Etc... • Undesirable function dependencies are the 'seeds' of data redundancy leading to modification anomalies. -essentially technical translations of user-specified business rules expressed as constraints in a relation schema ▫ We cannot just ignore them when undesirable - we must still accomodate. -the building block of "normalization" principles

The prime directive for FD based Normalization

A relation schema R is fully normalized with regard to functional dependencies if for every non-trivial FD in R, the determinant is a superkey of R. ▫ Remember: A FD in R is trivial if and only if the dependent is a subset of the determinant. • The Goal: ▫ Decompose R such that in the decomposition D[R1, R2,..Rn] for every FD in D the determinant is a superkey of the respective relation schema in D.

Cartesian Product

Also known as "Cross-Product" or just "Product" • All combinations of tuples from two relations ▫ Syntax: R1 X R2 • Will result in nR1 x nR2 tuples The Cartesian product is of little value by itself • When combined with the select operator it is equivalent to a join operation, which is much more useful C X D is a relation that is the result of a Cartesian product operation

Prime Versus Non-prime Attributes

Any attribute, atomic or composite, in a relation schema R that is a proper subset of the primary key of R is called a prime attribute. • An attribute of R that is not a proper subset of the primary key is non-prime attribute except when it is a primary or candidate key of R. • Any candidate key of R not chosen as the primary key is referred to as an alternate key of R and like the primary key, is neither a prime nor a non-prime attribute of R.

Natural Join

Because the result of an Equijoin results in pairs of attributes with identical values in all the tuples, a new relational algebra operation called a Natural Join, denoted by *, was created to omit the second (and superfluous) attribute in an EQUIJOIN condition. The Natural join gets rid of the repeated attribute. Weakness of the natural join is that the attributes used for the join must have the same name in both relations!

Inner Join

Can be used for recursive (unary) relationships ▫ It is necessary to use table aliases if using the same table twice (or more) in one query Inner join RA syntax is: C ⋈(C.DCODE=D.DCODE) D • However, we also said we can do a Cartesian product and Select only the rows where the joining attribute matches: σ(C.DCODE=D.DCODE) C X D • SELECT * FROM table1, table2 WHERE table1.attribute = table2.attribute • This implies a 1:M relationship ▫ The table with the FK is the "child"

Single Row Character Functions

Concatenation (||) • SUBSTR • LENGTH • LTRIM • RTRIM • LPAD/RPAD • INSTR

A DUAL

DUAL is a table in Oracle (and some other DMBS) that you can query when you are executing queries that do not require a table, but you must specify one for parsing the query: ▫ SELECT 2 + 4 From DUAL ▫ Returns: 6 • Many DMBS do not require the use of dual. For example, is MSSQL, this works fine: ▫ SELECT 2 + 4 ▫ Returns: 6 DUAL has one tuple, with one attribute named "DUMMY" and one value, "X" • SELECT * FROM dual • SELECT dummy FROM dual

Delete

Delete must not violate referential integrity constraints • Since the WHERE clause in a DELETE statement is optional, a DELETE statement of "DELETE FROM <table-name>" can be used to delete all rows in a table. ▫ Will delete ALL the data, VERY quickly - Double check your deletes! • When used in this manner, while the target table has no rows after execution of the deletion, the table still exists and new rows can still be inserted into the table with the INSERT statement. • To erase the table definition from the database, the DROP TABLE statement must be used.

Desirable Versus Undesirable FDs

Desirable FDs in a relation schema R are those where the determinant is a super (candidate) key of R - no exceptions. • Undesirable FDs in a relation schema R are those where the determinant is not a super (candidate) key of R. ▫ That is, the FDs will cause data redundancy and the consequent modification anomalies in R.

Full Outer Join

Full Outer Join keeps all tuples in both the left and right relations when no matching tuples are found, padding them with null values as needed.

Dates and Times

Least standardized data type across platforms • Oracle 10g displays date values in a default DD-MON-YY format that represents a two-digit day, a three-letter month abbreviation, and a two-digit year (e.g., the date April 22, 2007 would be represented as 22-APR-07). • Although referenced as a non-numeric field, a date is actually stored internally in a numeric format. • Uses the Julian calendar (e.g., 2454213 = April 22, 2007)

Data Redundancy

Repeated appearance of same data value for an attribute does not automatically mean data redundancy. • Superfluous repetition that does not add new meaning constitutes data redundancy. • Error in what attributes are assigned to what entity type(s) leads to data redundancy. • leads to modification anomalies!

Right Outer Join

Right Outer Join keeps every tuple in the second or right relation S in the result R ⟖ S. ▫ If no matching tuple is found in R, then the attributes of R in the join result are filled or "padded" with null values.

SELECT FROM WHERE

SELECT <column list> FROM <table list> WHERE <condition> Optional • <column list> is a list of column names (attributes) whose values are to be projected • <table list> is a list of the table names (relations) required to process the query • <condition> is a conditional (Boolean) expression that identifies the rows to be retrieved by the query.

LPAD and RPAD examples

SELECT LPAD('Page 1', 14, '*') AS "LPAD Example" FROM DUAL; ▫ Returns: ********Page 1 • SELECT RPAD('Page 1', 14, '*.') AS "RPAD Example" FROM DUAL; ▫ Returns: Page 1*.*.*.*.

The SUBSTR Function

SUBSTR (char, m [,n]) returns a portion of char, beginning at character m, n characters long (if n is omitted, to the end of char). The first position of char is 1. • Example: ▫ SELECT SUBSTR('ABCDEFG',3,4) AS "Substring" FROM DUAL; ▫ Returns: CDEF • The SUBSTR function is often used in conjunction with the concatenation operator (||).

Select Operator

Selects a horizontal subset of tuples that satisfy a selection condition from the relation σ<section condition>R • Lower case sigma (σ) designates "select" • <selection condition> is a Boolean expression specified on the attributes of relation R R is generally a relational algebra expression whose result is a relation - the simplest form of R is a single relation ▫ This means relational algebra operations can be "nested" • The Boolean expression is of the form ▫ <attribute name> <comparison operator> <constant value> i.e., Lastname='Smith' or Salary > 30000 or ▫ <attribute name> <comparison operator> <attribute name> i.e., CreditsNeeded >= CreditsCompleted

Armstrong's Axioms

Three primary Axioms ▫ Reflexivity ▫ Augmentation ▫ Transitivity

Modification Anomalies

Three types of anomalies: ▫ Insert Anomalies ▫ Delete Anomalies ▫ Update Anomalies

Reflexivity

Trivial Dependencies ▫ Impossible not to satisfy and don't tell us anything new • If Y is a subset of X, then X --> Y ▫ If X:{A,B,C,D} and Y:{A,C} then {A,B,C,D} --> {A,C} • {SSN, FName, Lname} --> {SSN} • {SSN, FName, Lname} --> {Fname, Lname} • This also means attributes functionally determine themselves ▫ X --> X ▫ Y --> Y

Set Operations

Two relations (R and S) are "union compatible" if they: ▫ Have the same degree (number of attributes) ▫ Pairs of attributes from R and S have the same domain • Union compatibility is a requirement for all set operations (union, intersection, and difference) Union= either Intersection= both Difference= one or the other

TO_CHAR

Using TO_CHAR masks • To get the current month: ▫ select to_char(current_date, 'MON') from dual; ▫ Returns: APR • To format a number as money: ▫ SELECT to_char('50000','$99,999') from dual; ▫ Returns: $50,000 ▫ SELECT to_char('8000','$99,999') from dual; ▫ Returns: $8,000 • To convert a string to a date: ▫ SELECT to_date('22-MAR-2018') from dual;

Table Aliases

We can follow a table name with an "alias" to make it easier to refer to it in other parts of the query ▫ SELECT * FROM course C, department D WHERE C.dcode = D.dcode; exactly the same as doing: ▫ SELECT * FROM course, department WHERE course.dcode = department.dcode; • This is necessary when a table is used multiple times in one query

Boyce-Codd Normal Form (BCNF)

While 3NF removes most data redundancy issues, there is still potential for problems, particularly when: ▫ A relation schema has at least two candidate keys, ▫ Any two candidate keys are composite attributes, and ▫ There is an attribute overlap between the two candidate keys • BCNF to the rescue! ▫ Kind of... there is still a trade off between 3NF and BCNF • A relation schema R is in BCNF if for every non-trivial FD in R, the determinant is a superkey of R ▫ By this definition of BCNF, violation of 2NF or 3NF also implies violation of BCNF

The Decomposition Approach

fd1: B {G, H} fd2: A B fd3: C D • Step 1: Set superkey (K) of URS {A, B, C, D, E, G, H} ▫ K = {A, B, C, D, E, G, H} ▫ It should be obvious that K {A, B, C, D, E, G, H} at this point due to reflexivity (trivial dependency) • Step 2: Remove an attribute (H) from K, call it K'. ▫ Does (K' H) persist in F+, where K' = {A, B, C, D, E, G}? ▫ Answer: Yes, K' H since B H ▫ K' becomes K = {A, B, C, D, E, G}, and K {A, B, C, D, E, G, H} • Step 3. Remove another attribute (G) from K, call it K' ▫ Does (K' G) persist in F+, where K' = {A, B, C, D, E}? ▫ Answer: Yes, K' G since B G ▫ K' becomes K = {A, B, C, D, E}, and K {A, B, C, D, E, G, H} • Step 4. Remove another attribute (E) from K, call it K' ▫ Does (K' E) persist in F+, where K' = {A, B, C, D}? ▫ Answer: NO! We have no way of determining E based on {A,B, C, D}: {A,B, C, D} ⇸ E ▫ K remains as it was, K={A, B, C, D, E} , and K {A, B, C, D, E, G, H} Step 5. Remove another attribute (D) from K, call it K' ▫ Does (K' D) persist in F+, where K' = {A, B, C, E}? ▫ Answer: Yes, K' D since C D ▫ K' becomes K = {A, B, C, E}, and K {A, B, C, D, E, G, H} • Step 6: Remove another attribute (C) from K, call it K'. ▫ Does (K' C) persist in F+, where K' = {A, B, E}? ▫ Answer: NO! We have no way of determining C based on {A, B, E}: {A, B, E} ⇸ C ▫ K remains as it was, K={A,B,C, E} , and K {A, B, C, D, E, G, H} • Step 7. Remove another attribute (B) from K, call it K' ▫ Does (K' B) persist in F+, where K' = {A, C, E}? ▫ Answer: Yes, K' B since A B ▫ K' becomes K = {A, C, E}, and K {A, B, C, D, E, G, H} Step 8. Remove another attribute (A) from K, call it K' ▫ Does (K' A) persist in F+, where K' = {C, E}? ▫ Answer: NO! We have no way of determining A based on {C, E}: {C, E} ⇸ A ▫ K remains as it was, K={A, C, E} , and K {A, B, C, D, E, G, H} • At this point, we have determined that K = {A, C, E} is a superkey that cannot be further reduced and thus becomes a candidate key of the URS {A, C, E} {A, B, C, D, E, G, H}

HAVING condition

filters the groups subject to some condition ▫ HAVING is very similar to WHERE, but is used on the results of aggregate functions

Divide Operator

is useful when there is a need to identify tuples in one relation that match all tuples in another relation. • T(Y) <---------- R(X, Y) ÷ S(X) with R and S being division compatible • Note that the Cartesian product of T and S is contained in the relation instance of R.

The problem

is when there are FDs in which the determinant in that FD is not a Super key [candidate key] of R - this is an "undesirable" FD

ORDER BY column name(s)

specifies the order of the output.

COUNT FUNCTION

• Aggregate functions (like count) ignore NULL values NO GROUP NULL VALUES

Key Versus Non-Key Attributes

• An attribute, atomic or composite, in a relation schema, R, is called a key attribute if it is a proper subset of any candidate key of R. • Any attribute, atomic or composite, that is not a proper subset of any candidate key is a non-key attribute • A candidate key is neither a key nor a non-key attribute • Based on the above discussion, we have an alternative definition for a candidate key from this point forward: ▫ A candidate key of a relation schema R fully functionally determines all attributes of R.

Decomposition

• Basically the opposite of Union - if a determinant determines two dependents together, it also determines them individually • If X --> {Y, Z} then... ▫ X --> Y and X --> Z • If EmpID --> {Fname, Lname}, then... ▫ EmpID --> Fname and EmpID --> Lname

Understanding functional dependencies

▫ What attributes should be grouped together in relations ▫ What values are valid candidate keys ▫ Where data redundancy problems exist

GROUP BY

▫ groups of rows with the same value ▫ Typically used with aggregate functions

Four inference rules

▫ Union ▫ Decomposition ▫ Composition ▫ Pseudotransitivity

Undesirable Functional Dependencies

▫ The determinant is not a candidate key

Second Normal Form (2NF)

A relation schema R is in 2NF if every non-prime attribute in R is fully functionally dependent on the primary key of R. • This means a non-prime attribute is not functionally dependent on a proper subset of the primary key of R. ▫ Partial dependency • The Second Normal Form (2NF) is based on a concept known as full functional dependency. • A functional dependency of the form Z A is a 'full functional dependency' if and only if no proper subset of Z functionally determines A. • In other words, if Z A and X A, and X is a proper subset of Z, then Z does not fully functionally determine A, i.e., Z A is not a full functional dependency; it is a partial dependency.

Normal Forms: An Overview

A relation schema is said to be in a particular normal form if it satisfies certain prescribed criteria for that normal form. • First normal form (1NF) reflects the properties of a relation schema - i.e., by definition a relation schema is in 1NF. • Normal forms associated with functional dependencies are second (2NF), third (3NF), and Boyce-Codd (BCNF) normal forms. The violations of each of these normal forms signal the presence of a specific type of 'undesirable' FD. ▫ violation of a normal form, can be interpreted as equivalent to an inadvertent mixing up of entity types belonging to two different entity classes in a single entity type.

Functional Dependencies

Abbreviated as FD • FDs specify a relationship between attributes in a relation schema • An attribute A (atomic or composite) in a relation schema R functionally determines another attribute B (atomic or composite) in R if for a given value of A there is a single, specific value of B ▫ Said another way: B is functionally dependent on A • Expressed as A --> B ▫ A is called the determinant ▫ B is called the dependent

The Concatenation Operator

Allows you to join multiple strings together using two vertical bars (AKA pipes) ▫ Can be values from text attributes or a literal string

The use of concatenation

Display the name and phone number of all professors with phone formatted as (xxx)xxx-xxxx for phone numbers that end with two digits ranging between 45 and 65. • SELECT professor.name, '(' || SUBSTR(professor.phone,1,3) || ') ' || SUBSTR(professor.phone,4,3) || '-' || SUBSTR(professor.phone,7,4) AS "Phone No" FROM professor WHERE SUBSTR(professor.phone,9,2) BETWEEN '45' AND '65'; • If you're feeling clever, you can test this out (except for the BETWEEN part) using a subquery and DUAL - follow what's happening here? ▫ SELECT '(' || SUBSTR(phone,1,3) || ') ' || SUBSTR(phone,4,3) || '-' || SUBSTR(phone,7,4) AS "Phone No" FROM (SELECT '7135551234' AS phone FROM dual);

The INSTR function examples

Example: SELECT INSTR('MISSISSIPPI','S',5,2) AS "In String Example" FROM DUAL; • Returns: 7 • Example: SELECT INSTR('MISSISSIPPI','S',5,1) AS "In String Example" FROM DUAL; • Returns: 6 INSTR Function Examples 12345678901 12345678901

First Normal Form (1NF)

First normal form (1NF) imposes conditions so that a base relation that is physically stored as a file does not contain records with a variable number of fields. ▫ This is accomplished by prohibiting multi-valued and composite attributes in a relation schema. • Such a constraint, in effect, prevents relations from containing other relations. • In essence, 1NF, by definition, requires that the domain of an attribute must include only atomic values and that the value of an attribute in a relation's tuple must be a single value from the domain of that attribute.

Synthesis Approach

Given a relation schema, R, and a set of FDs, F, that holds for R, find a subset, Z, of attributes of R such that Z+ (closure [Z | F]) includes all attributes of R ▫ Basically, find a FD where the dependent includes ALL attributes of R • Given R (A, B, C, D, E, G, H) and F [fd1, fd2, fd3] where FD1: B {G, H} FD2: A B FD3: C D Given R (A, B, C, D, E, G, H) and F [fd1, fd2, fd3] where FD1: B {G, H} FD2: A B FD3: C D ▫ A+ = ▫ B+ = ▫ C+ = ▫ {A,C} + = • Can all attributes of R be determined by {A,C}? ▫ No, we are missing E ▫ {A,C} is not a candidate key! • {A,C,E} {A,B,C,D,E,G,H} Therefore, {A,C,E} is a candidate key ▫ Augmentation allows us to add E to both sides! Synthesis Approach {A,B,G,H} {B,G,H} {C,D} {A,B,C,D,G,H}

Decomposition Approach

Given the universal relation schema R {A1, A2, A3, . . . , An} ▫ Step 1: Set superkey, K of R = {A1, A2, A3, . . . , An} ▫ Step 2: Remove an attribute Ai , (i = 1, 2, 3, . . . . ., n) from R such that {K - Ai } is still a superkey, K′, of R Note: In order for K′ to be a superkey of R, the FD: (K′ Ai ) should persist in F+ ▫ Step 3: Repeat step 2 above recursively until K′ is further irreducible • The irreducible K′ is a candidate key of R under the set of FDs, F. Given R (A, B, C, D, E, G, H) and F [fd1, fd2, fd3] where ▫ fd1: B {G, H} ▫ fd2: A B ▫ fd3: C D

How to be more efficient with selection

IN and BETWEEN can be used as comparison operators. • IN is evaluated in the context of being equal to any member of a set of values • A nested query can follow the IN/NOT IN operator • Between is evaluated as a pair of ≥ and ≤ operators

Equijoin

Involves join conditions with equality comparisons only. The result of an Equijoin includes all attributes from both relations participating in the join operation. This implies duplication of the joining attributes in the result. Much more useful than the Cartesian product, but the attribute used to joint the relations is repeated

Null and not null

NULL is the absence of data, so nothing is EVER "equal" to NULL - therefore we have to use the "IS" operator SELECT * FROM section WHERE room IS NULL SELECT * FROM section WHERE room IS NOT NULL

Normalization

Normalization is a technique that facilitates systematic validation of the participation of attributes in a relation schema from a perspective of data redundancy. • Normal Forms (NFs) provide a stepwise progression towards attaining a design that is guaranteed to be free of data redundancies that cause modification anomalies from a functional dependency perspective.

Theta Join

Occurs infrequently in practical applications, Theta Joins do not involve equality but inequality conditions for the join condition involving attributes that share the same domain.

A simple algorithm for Normalization

Pull out the undesirable FD(s) from the target relation schema R as separate relation schema(s) [R1, R2, etc.] • Retain the determinant of the pulled-out relation schema (say, R1) as an attribute(s) in the leftover target relation schema, R0, to facilitate reconstruction of the original target relation schema.

Project Operator

Selects a vertical subset of attributes from a relation π<attribute list>R • Lower case pi (π) designates "project" • <attribute list> is a subset of attributes of relation R As before, R is generally a relational algebra expression, the simplest form being a single relation ▫ i.e., relational algebra operations can be "nested" • The result of the projection operation contains the attributes specific in <attribute list> in the same order they are specified in the list • If the attribute list in a projection is not a superkey (i.e. is not unique), duplicate tuples will NOT be displayed ▫ Relations do not allow duplicates

Attribute Closure

Similar to closure of a set of FDs for a relation, but at lower level • A set of all attributes functionally determined by a determinant • If we are looking for closure for a set of attributes, Z, from the set of FDs, F, from relation R: ▫ Expressed as Z+ or Closure [Z | F] ▫ "Z under F" Given R(A,B,C,D,E,G,H), where F: ▫ FD1: B {G,H} ▫ FD2: A B ▫ FD3: C D • What is A+ (closure [A | F])? • By applying Armstrong's Axioms, we can infer: ▫ Transitivity of FD1 and FD2: A {G,H} ▫ Union of FD2 and FD1: A {B,G,H} ▫ Remember to include reflexivity: A {A,B,G,H} ▫ We can go no further, so A+ = {A,B,G,H} • {A,C} + = {A,B,C,D,G,H}

TO_CHAR and TO_DATE functions

TO_CHAR function is used to extract the different parts of a date/time and convert them to a character string. • TO_DATE function is used to convert character strings to a valid date format. • Both functions use a format mask • SYSDATE and CURRENT_DATE both return the current date and time ▫ SELECT SYSDATE FROM dual; ▫ SELECT CURRENT_DATE FROM dual;

ALL and ANY

The ALL and ANY operators can be combined with other comparison operators to treat the results of a subquery as a set of values, rather than as individual values. ▫ Operator Description ▫ > ALL Greater than the highest value returned by the subquery ▫ < ALL Less than the lowest value returned by the subquery ▫ < ANY Less than the highest value returned by the subquery ▫ > ANY Greater than the lowest value returned by the subquery ▫ = ANY Equal to any value returned by the subquery (equivalent to the IN operator)

The INSTR function

The INSTR function is used to return the numeric value of the location of a character string within a character column or character literal • Syntax: INSTR (char1, char2 [,n[,m]]) • Its purpose is to locate the position of the mth occurrence of char2 in char1, beginning the search at position n. ▫ If m is omitted, 1 is assumed. ▫ If n is omitted, 1 is assumed. ▫ The position is given relative to the first character of char1, even when n > 1.

The length function

The LENGTH (char) function returns the length of the character string char. • Example: SELECT LENGTH('Jones, John') FROM DUAL; Returns: 11 • Note: attributes of datatype char (as opposed to varchar) return a length that includes all trailing blank spaces

The LPAD and RPAD Functions

The LPAD and RPAD functions allow you to "pad" the left (and right) side of a column or character string with a set of characters. • Syntax: LPAD/RPAD (string, length, [,'set']) ▫ string is the name of the character column (or a literal string), ▫ length is the total number of characters long that the result should be (i.e., its width), and ▫ set is the set of characters that do the padding

The LTRIM function

The LTRIM (char [, set]) function removes unwanted characters from the left of char, with initial characters removed up to the first character not in set. • If no set of characters is specified, set defaults to ' ' (a blank space) and the function trims off leading blank spaces. • Example: ▫ SELECT LTRIM('xxxXxxLAST WORD', 'x') AS "Left Trim Example" FROM DUAL; ▫ Returns: XxxLASTWORD • Note: LTRIM is case-sensitive

The RTRIM function

The RTRIM (char [, set]) function returns char, with final characters removed after the last character not in set. • If no set of characters is specified, set defaults to ' ' (a blank space) and the function trims off trailing blanks. • The RTRIM function operates on the rightmost characters in a string in the same way that the LTRIM function operates on the leftmost characters in a string. • Example: ▫ SELECT RTRIM('STINSONxxXxx','x') AS "Right Trim Example" FROM DUAL; ▫ Returns: STINSONxxX ▫ SELECT RTRIM('Houston ') AS "Right Trim Example" FROM DUAL; ▫ Returns: Houston • Note: char (as opposed to varchar) deliver different results since char data type contains embedded trailing blanks.

Join vs. Cartesian Product

The primary difference is with join, only combinations of tuples satisfying the join condition will appear in the result, whereas in the Cartesian product, all combinations are in the result

Resolution of 2NF Violation

The resolution of 2NF violation is a two-step process that decomposes the target relation schema with the undesirable FDs into multiple relation schemas such that the undesirable FDs are rendered desirable. ▫ Pull out the undesirable FD(s) from the target relation schema as separate relation schema(s). ▫ Retain the determinant of the pulled-out relation schema as an attribute(s) in the leftover target relation schema to facilitate reconstruction of the original target relation schema.

Resolution of 3NF Violation

The resolution of a 3NF violation is accomplished by applying the same two-step process used earlier to resolve the 2NF violation. The two-step process is: ▫ Pull out the undesirable FD(s) from the target relation schema as separate relation schema(s). ▫ Retain the determinant of the pulled-out relation schema as an attribute(s) in the leftover target relation schema to facilitate reconstruction of the original target relation schema.

Resolution of a BCNF Violation

The resolution of a BCNF violation is accomplished by applying the same two-step process used earlier to resolve the 2NF & 3NF violations: ▫ Pull out the undesirable FD(s) from the target relation schema as separate relation schema(s). ▫ Retain the determinant of the pulled-out relation schema as an attribute(s) in the leftover target relation schema to facilitate reconstruction of the original target relation schema.

Uncorrelated Subqueries

The subquery is executed first and passes one or more values to the outer query ▫ Three operators may be used: IN, ANY, ALL ▫ May be negated with the NOT operator • The IN operator evaluates if rows processed by the outer query are equal to any of the values returned by the subquery (i.e., it creates an OR condition).

Correlated Subqueries

The subquery is executed once for every row in the outer query • A correlated subquery can be used if it is necessary to check if a nested subquery returns no rows, using the EXISTS operator which returns the value of true if a set is non-empty. • In a correlated subquery, the subquery is executed once for each row in the outer query. In addition, the execution of the subquery stops and the EXISTS condition of the main query is declared true for a given row should the condition in the subquery be true.

Choosing the Primary Key from the CKs

While we have noted that the choice of primary key from among the candidate keys is essentially arbitrary, some rules of thumb are often helpful in this regard: ▫ A candidate key with the least number of attributes may be a good choice. ▫ A candidate key whose attributes are numeric and/or of small sizes may be easy to work with from a developer's perspective. ▫ A candidate key that is a determinant in a functional dependency in F rather than F+ may be a good choice because it is probably semantically obvious from the user's perspective. ▫ Surrogate (i.e., artificial) key should only be used as a last resort.

Pseudotransitivity

• Basically the transitivity rule applied to individual attributes within a set • If X --> Y and {Y, W} --> Z then... ▫ {X, W} --> Z • If Dept --> College and {College, YearsWorked} --> Salary, then: ▫ {Dept, YearsWorked} --> Salary • If MIS--> Bauer and {Bauer, 10} --> 90000, then: ▫ {MIS, 10} --> 90000 • In words: If we know people that work for the MIS department are in Bauer college, and we know that people who have worked for Bauer college for 10 years make $90,000, then we also know that people who have worked for MIS for 10 years make $90,000

Derivation of Other Candidate Keys of [R | F]

• If Fc contains an FD (FDx) where a candidate key of R is a dependent, then the determinant of FDx is also a candidate key of R. • When a candidate key of R is a composite attribute, for each key attribute (atomic or composite), evaluate if the key attribute is a dependent in an FD (FDy) in Fc. ▫ If so, then the determinant of FDy can, via pseudotransitivity, replace the key attribute under consideration, thus yielding additional candidate key(s) of R. • Repetition of the above two steps for every candidate key of R will systematically reveal all the other candidate key(s), if any, of R.

Transitivity

• If X --> Y and Y --> Z, then X --> Z • If ZipCode --> City and City --> State, then... ▫ ZipCode --> State • If 77018 --> Houston and Houston --> Texas, then... ▫ 77018 --> Texas • Is the inverse true? Does State --> City, or City --> Zip? ▫ No! (Assuming Houston is a unique city name)

Union

• If a determinant determines two dependents independently, it also determines the union of those dependents • If X --> Y and X --> Z, then... ▫ X --> {Y, Z} • If EmpID --> Fname and EmpID --> Lname, then... ▫ EmpID --> {Fname, Lname}

Outer Join

• In Inner Join operations, tuples without a matching (or related) tuple are eliminated from the Join result. • Tuples with null values in the join attributes are also eliminated. • A set of operations, called Outer Joins, can be used when we want to keep all the tuples in R, or those in S, or those in both relations in the result of the Join, whether or not they have matching tuples in the other relation.

The problem with BCNF

• In many cases BCNF will clear up lingering redundancy from 3NF, however... • If we are forced to choose between BCNF without preserving dependencies and 3NF with preserving dependencies, it is generally preferable to opt for the latter (3NF) ▫ After all, if one can't test for dependency preservation efficiently, one either pays a high penalty in system performance or risks the integrity of the data in the database. • Neither is an attractive alternative, but the limited amount of redundancy allowed under 3NF is regarded as the lesser of the two evils.

Minimal Cover

• Often the semantically obvious FDs (F) have some redundancy or extraneous attributes, as they are derived from the business rules • The minimal cover of F (often called Gc) is a simplified set of FDs that is equivalent to F ▫ Expressed as Gc ≡ F

Aggregate Functions

• One final relational algebra operation: Aggregate Functions • Symbol is ℑ ("Script F") • General form is: <GROUPING ATTRIBUTE(S)> ℑ<Function List> R • Common functions include: ▫ COUNT(), AVG(), MIN(), MAX(), SUM() • Aggregate functions ignore null values! • All attributes you wish to see in the result MUST BE listed as a grouping attribute

Data Redundancy

• Repeated appearance of same data value for an attribute does not automatically mean data redundancy. ▫ Superfluous repetition that does not add new meaning constitutes data redundancy. • Data redundancy leads to modification anomalies! Insert anomalies Delete anomalies Update anomalies • Resolve with normalization

Composition

• Similar to the Union rule • If A --> B and C --> D, then... ▫ {A,C} --> {B,D} • If EmpID --> Lname and Dept --> College then: ▫ {EmpID, Dept} --> {Lname, College}

Comparison Operations

• Simple ones: ▫ >, <, >=, <=, <>, = SELECT * FROM employees WHERE salary > 50000; • The LIKE operator allows for wildcards ▫ % for many characters (* in MS Access) SELECT * FROM employees WHERE lname LIKE 'G%'; ▫ _ for a single character (? In MS Access) SELECT * FROM employees WHERE fname LIKE '_I_A'; • The IN operator is used to identify values in a list or a sub-query ▫ SELECT * FROM employees WHERE lname IN ('Smith', 'Jones', 'Han'); • All can be negated with NOT

Join Operators

• The JOIN operation is used to combine related tuples from two relations into a single tuple when the join condition is satisfied. ▫ Symbol is ⋈, written in your book as [X] • In order to join the two relations R and S, they must be join compatible ▫ The join operation must involve attributes from R and S which share the same domain • General form: R ⋈ <join condition> S • Resulting relation will have: ▫ Number of attributes (degrees) equal to the number of attributes in R plus the number of attributes in S ▫ Number of tuples (cardinality) equal to number of combinations of R and S where the combination satisfies the join condition

F and Closure of F (F+)

• The set of semantically obvious FDs specified on a relation schema R is denoted as F • The set that includes F and all other FDs inferred from F is called the closure of F, denoted as F+ • Having specified F from the semantics of the attributes of a relation schema R, the designer can develop F+ ▫ Armstrong's axioms are useful in deriving F+ • Given a set of semantically obvious FDs, F{fd1, fd2} where ▫ fd1: {Store, Product} Quantity ▫ fd2: Quantity Discount • Using transitivity, one can infer the presence of: ▫ fd3: {Store, Product} Discount in the closure of F (F+) • Note: Since trivial dependencies do not provide any additional information, they are usually excluded from F+.

FDs in the Stock relation

• Typically we think of all the FDs in a relation as a set of dependencies called "F" ▫ F{FD1, FD2, FD3, ..., FDn} • Some FDs are semantically obvious based on the business rules • Other FDs must be inferred • The set of all FDs in F is referred to as F+ ("F Closure")

Pattern Matching

• Use the LIKE operator works in conjunction with the two wildcard characters: ▫ % - Any number of characters ▫ - Exactly one character • Note: In access we use: ▫ * instead of % for any number of characters ▫ ? Instead of _ for exactly one character

Left Outer Join

▫ The Left Outer Join operation keeps every tuple in the first or left relation R in R ⟕ S. ▫ If no matching tuple is found in S, then the attributes of S in the join result are filled or "padded" with null values.


Conjuntos de estudio relacionados

unit 8: Entrepreneurship: Ready for Business

View Set

chapter 44 ~ assessment & management of pts w biliary disorders

View Set

Ch. 3 Network Protocols & Ch. 4 Network Access

View Set

AMERICAN DOWNLOAD C1/C2 (Words 2.95-2.164)

View Set

The Toddler and Family: Growth and Development of the Toddler

View Set

Triangles, Transversal, Triangle Congruence, Triangle Congruence - SSS, SAS, ASA, Triangle Congruence

View Set