CS4400 Quiz 2
INTERSECTION Operation
- operation done on two union compatible relations, R and S - resulting relation *includes all tuples that are in both R and S* - denoted by *R∩S* - commutative and associative
Selected Combinations
- part of a tuple relational calculus expression - *a condition to select the particular combinations of tuples to be returned* (tuples that are true are returned) - if not specified then it returns all the tuples in R - {t | R(t) AND *t.salary>30000*}
Domain Contraint
- schema-based constraint - constraint that specifies that each value within a tuple must be an atomic value from the domain
CARTESIAN PRODUCT (or CROSS JOIN) Operation
- binary set operation done on two relations, *does not have to be union compatible* - produces a new element by combining every tuple from one relation to every tuple from another relation - denoted by *X* - if relation R has n attributes and relation S has m attributes, then the *result Q has n + m attributes* - if relation R has a tuples and S has b tuples, then *result Q has a x b tuples*
Total Number of Tuples in a Relation:
- cartesian product of all the domains - *|dom(A1)| x |dom(A2)| x .... x |dom(An)|*
Relation Schema or Scheme
- denoted by *R(A1, A2,..., An)* - also called *relation intension* - R is the relation name - (A1...An) is the list of attributes - ex: STUDENT(Name: string, SSN: string, Age: int)
Universal Quantifier
- denoted by *∀* - *formula (∀t)(F) is true IFF formula F evaluates true for every tuple in the universe* - t is bound when in an ∀ clause
Aggregate Functions
- operations that cannot be be described by basic relational algebra - specifies *mathematical functions on collections of values from the DB to get statistics and summarize tables* - *SUM, AVERAGE, MAXIMUM, MINIMUM, and COUNT operations* - *<grouping attributes>ƒ<function list>(R)* - grouping attributes tells the query what to group the tuples by when doing the functions - duplicates not eliminated
Referential Integrity Constraint Violation Examples:
1. If you try to insert or update a foreign key that does not exist in the referenced primary keys 2. If you try to delete a tuple that is being referenced by a foreign key
Relational Database State
- *DB = {r1, r2,....,rm}* - a *set DB of relational states* in the relational db schema S - diagrams of this show *a snapshot of a DB state of each table (includes the tuples)*
Relational Database Schema
- *S = {R1, R2, R3, ...., Rm}* - a *set S of relation schemas* and a *set IC of integrity constraints* - diagrams of this *show the table names and the column names* (does not include tuples) - use a *DDL to define this* (most use SQL)
Join Selectivity
- *the expected size of the join result / max size of join* (Ns * Nr) - Ns: number of tuples in S - Nr: number of tuples in R
NATURAL JOIN Operation
- Variation of the EQUIJOIN operation that *gets rid of the attributes that have identical values* (only uses = operator) - denoted with a * - *don't need to specify a join condition* bc it finds the two attributes with the same name in each table and joins based off that - if the attributes don't have the same name, rename one of them first to be the same - ex: PROJ_DEP <-- PROJECT * DEPARTMENT
Attribute
- a *column* in the relation table - the *name of a role* played by some domain D - all values have the same data type - if a value isn't known or not applicable for a certain entity --> *NULL value*
Tuple
- a *row* in the relation table - *represents a fact* that typically corresponds to a real-world entity or relationship - ex: in a student table, each row corresponds to one students info
Relational Calculas
- a formal query language for the relational model - Two variations of this language: *tuple and domain* - *specifies what is to be retrieved rather than how to retrieve it; written in one declarative expression* (a nonprocedural language) - oder doesn't matter
Referential Integrity Constraint
- a relational model constraint - specified *b/w two relations* - used to maintain the consistency among tuples in the two relations - *a tuple in one relation that refers to another relation must refer to an existing tuple*
Entity Integrity Constraint
- a relational model constraint - states that *no primary key can be NULL* - specified on individual relations
Relational Algebra Expression
- a sequence of relational algebra operations
Complete Set of Relational Algebra Operations
- all of the basic relational algebra operations - *{σ, π, ρ, ∪, ∩, X}* - *doesn't include any of the join operations bc they can be done with these basic operations* - other operations can be done by combining all of these operations - ex: INTERSECTION can be done by using the union and minus, JOIN can be done by using the cartesian product and select
Schema-based Constraints
- also called explicit constraints - constraints that can be directly expressed in the schemas of the data model - *typically specified in the DDL* - ex: domain constraints, key constrains, constraints on NULL values, entity integrity constraints, and referential integrity constraints
Inherent Constraints
- also called inherent model-based constraints - constraints that are inherent in the data model - ex: a relation cannot have duplicate tuples
LEFT OUTER JOIN Operation
- an outer join operation - denoted by *⟕* - *keeps every tuple in the first (or left of symbol) relation* --> searches right relation for a match --> if no match then the attributes of right relation are filled with NULL; otherwise filled in with the matching tuple values - *RESULT <-- (EMPLOYEE⟕<ssn=mgrssn>DEPT)* (shows what department a manager manages, if employee not found then not a manager so put NULL)
RIGHT OUTER JOIN Operation
- an outer join operation - denoted by *⟖* - *keeps every tuple in the right relation* --> seaches left relation for a match --> if no match then attributes of left relation are filled with NULL; otherwise with matching tuple values - *RESULT <-- (DEPT⟖<mgrssn=ssn>EMPLOYEE)*
FULL OUTER JOIN Operation
- an outer join operation - denoted by *⟗* - *keeps all the tuples in both the left and right relation*; puts NULL anywhere there is no match for any attribute
DIVISION Operation
- binary operation - result relation *T holds the values that every attribute in S shares in R* - used in queries that involve *universal quantification* - denoted by *÷* - *T(Y) <-- R(Z) ÷ S(X)* - X, Y, Z are attribute lists - *X is a subset of Z; result attribute list Y = Z - X; Z = X∪Y*
Transition Constraints
- constraint that deals with *state changes in the database* - handled within application program, triggers, or assertions - ex: the salary of an employee can only increase
Cardinality of D
- denoted by *|D|* - the total number of values in the domain
Existential Quantifier
- denoted by *∃* - *formula (∃t)(F) is true if the formula F evaluates true for at least one tuple* - t is bound when in an ∃ clause
Relation State
- denoted by r or r(R) - also called just *relation or relation extension* - set of *ordered n-tuples* - *r(R) = {t1, t2,.., tn}*
Primary Key
- designate one of the candidate keys this; all other candidate keys become unique keys - *values of this key are used to identify tuples in a relation* - *underlined* in schema diagrams
In-line Expression
- doing multiple operations in a single relational algebra expression - ex: π<attributes>(σ<cond>(table)) (does a project and select in one step)
Relational Algebra
- formal relational model language - *defines a set of operations for the relational model* - *must write a sequence of operations in a particular order for a retrieval request *(procedural language) - result of a retrieval query is a new relation called the *result relation*
Assignment Operation
- give *a result table a name* to reference it in later queries - denoted by *<--* (left arrow) - *can also name the resulting attributes* w (col1, col2, col3, ...) - ex: RESULT(col1) <-- σ<cond>(table) (new table name will be RESULT with one column named col1)
Recursive Relationship
- happens *when a foreign key references the primary key within the same table* - ex: EMPLOYEE primary key is ssn, and each employee tuple has a supervisor_ssn which is a foreign key pointing to ssn; *relates one employee tuple to another employee tuple*
Set Null/Set Default (constraint violation handler)
- happens when a delete operation violates a constraint - *sets foreign keys to NULL or another primary key*
Cascade (constraint violation handler)
- happens when a delete operation violates a constraint - DB attempts to *cascade the deletion by also deleting the tuples that reference the deleted tuple*
Duplicate Elimination
- if attribute list for the PROJECT operation only has nonkey columns, then *only the distinct tuples are shown* (duplicates removed) - involves sorting so takes more processing power
Attribute Roles
- if many attributes have the same domain, the attribute names indicate different interpretations for the domain - ex: Home_phone and Office_phone have same domain but obviously represent two different things
Relational Data Model
- introduced by *Tedd Codd* - uses the mathematical relation concept - uses the SQL language - *represents database as a collection of relations* - relations are represented by a *table or flat file*
Atom
- makes up the formula (boolean condition) of a tuple relational calculus expression - three different forms: 1. R(t) 2. t.A op t.b (where op is in the set {=, <, <=, >, >=, !=} 3. t.A op c (op is same as above, c is a constant) - evaluates to either true or false
Key of R
- not redundant like superkeys - has the same *uniqueness constraint* as superkeys, but is *a minimal superkey* (required) - cannot remove any attributes and still have the uniqueness property hold - usually formed by one attribute
OUTER UNION Operatoin
- operation developed to *take the union of two relations that are not union compatible* - need to be *partially compatible* tho --> some of their attributes are union compatible - attribute list X is the list of union compatible attributes (also need same name), and then Y and Z are the non compatible attributes - *T(X, Y, Z) <-- R(X, Y)∪S(X,Z)* - tuples with the same values for the X attributes will only appear once - tuples that appear only in R will have NULL values for the Z attributes - tuples that appear only in S will have NULL values for the Y attributes
UNION Operation
- operation done on two union compatible relations, R and S - resulting relation *includes all tuples in R and all tuples in S* (duplicates removed) - denoted by *R∪S* - commutative and associative
MINUS (or SET DIFFERENCE) Operation
- operation done on two union compatible relations, R and S - resulting relation *includes all tuples that are in R but not in S* (whichever one comes first) - denoted by *R-S* - not commutative
Recursive Closure
- operation that cannot be be described by basic relational algebra - *applied to a recursive relationship b/w tuples of the same type*
THETA JOIN Operation
- operation used to *combine related tuples from two relations into single longer tuples* - important for relationships b/w relations - denoted by *⋈* - *R⋈<cond>S* - only tuples that are true for the condition appear - condition is usually comparing values of an attribute in R with values of an attribute in S (ex: ssn=mgr_ssn) - a cartesian product followed by a select statement can be replaced with this operation
Range Relation
- part of a tuple relational calculus expression - *specifies the relation (aka range) to look at* - if not specified then the range is the universe :-o - {t | *R(t)*}
Requested Attributes
- part of a tuple relational calculus expression - the *set of attributes to be retrieved* - if not set then all the attributes of R are returned - {*t.First, t.Last* | R(t) AND t.age>21}
Time-invariant Property
- property that table keys must hold - *key of a table must continue to be distinct when we insert new tuples into the relation* - ex: if we choose name to be the key then it could eventually break the uniqueness property bc two people can have the same name - solve this by picking something that will always be unique
Current Relation State
- reflects only the *valid tuples that represent a particular state of the real world* - *changes often* as tuples get updated - relation schema (table setup) doesn't change often because it is not common to add new attributes (columns)
Alternate Definition of a Relation
- standard def --> ordering of values in a tuple and the ordering of attributes is important - alt def uses a *mapping for each tuple so ordering doesn't matter as much* - has column name and then tuple value - ex: t = <(Name, Kelsey Murdock), (Age, 21), ...> (tuple can be ordered in anyway since column name is included)
The Closed World Assumption
- states that *the only true facts in the universe are those present within the extension state of the relations* - each value in a tuple is a fact about a certain entity and any other combination of values is false for that entity - useful for *queries based on relational calculus*
Selection Condition
- the boolean condition in the SELECT operation - *format: <attribute name><comparison op><constant val>* (last param can also be another attribute name) - use AND, OR, or NOT to specify multiple boolean conditions in one operation - ex: Dno=4; Salary>30000
Degree or Arity of a Relation
- the total number of attributes in a relation == n - *number of columns in a table*
Join Attribute
- the two identical attributes in a NATURAL JOIN operation - resulting relation only keeps one of these attributes to avoid the redundancy like in EQUIJOIN
Ordering of Tuples in a Relation
- this is *not a part of the relation definition* - *ordering of tuples (rows) doesn't matter* but can be specified - ex: Ordering rows by name in alphabetical order
Outer Joins
- type of *join operation developed so that users can keep any tuples they need regardless of whether or not they are true for the join condition* - can keep all tuples from R, all tuples from S, or all tuples in both
Inner Joins
- type of *join operation that only shows matching tuples in the result* - tuples that don't match the join condition or tuples with NULL values are eliminated - includes the *theta join, equijoin, and natural join operations*
Flat Relational Model
- type of relational model - based off the *first normal form assumption --> assumption that every value is atomic* - Multivalued attributes must be represented by separate tuples - Composite attributes represented by multiple attributes
Superkey of R
- type of schema-based constraint; more specifically a *uniqueness constraint* - no two distinct *tuples in any state can have the same value for a subset of attributes* - *default is the set of all attributes*, but a smaller subset can be defined - can have redundant attributes tho
SELECT Operation
- unary operation (operate on one table) - used to *choose a subset of the tuples* from a relation that *satisfy a boolean condition* - denoted by *σ<boolean condition>(R)* - every tuple that is true for the condition is returned in the resulting relation - multiple of these operations can be applied in any order to get the same result; *Commutative*
Foreign Key
- used to define a referential integrity constraint - *when one table has a column(s) that refers to the primary key of another table* (can also refer to the primary key of its own table) - *the primary key must exist in the referenced relation* (can be NULL) - if these two ^ properties hold, then the referential integrity constraint holds - *drawn with a line pointing to the referenced primary key*
Valid vs. Not Valid DB State
- valid: state satisfies all the constraints in the defined set of integrity constraints - not valid: does not obey the IC
Atomic Values
- values not divisible into smaller components - *Composite and multivalued attributes are not allowed*
Tuple Relational Calculus
- variation of relational calculus - based on specifying a number of *tuple variables that range over a particular DB relation* - *{t | R(t)}* --> simple query returns all the tuples that are in relation R - SQL based on this
Candidate key
- when a relation schema has multiple keys - each key is called this
Union Compatible (or type compatible)
- when two relations have the same number of attributes and each corresponding pair of attributes has the same domain
Entity Integrity Constraint Violation Example:
1. If you try to insert or update a primary key with a NULL value
Domain Constraint Violation Example:
1. If you try to insert or update a value that is not in the domain of Ai
Key Constraint Violation Example:
1. if you try to insert or update a primary key that already exists
EQUIJOIN Operation
- A JOIN operation where the *only comparison operator used is the "="* - result relation *always has two or more attributes that have identical values in every tuple*
Update Operation
- DB modification operation - *changes the values of one or more attributes in a tuple* (or multiple tuples) - has the potential to violate domain, key, entity integrity, and referential integrity constraints
Delete Operation
- DB modification operation - *deletes a record (tuple)* from the table - can *violate referential integrity only* - *handles violations with restrictions (reject), cascades, set null, or a combo*
Insert Operation
- DB modification operation - provides *a list of attribute values for a new tuple t, that is to be inserted into relation R* - can violate domain, key, entity integrity, and referential integrity constraints - *constraint violations handled by rejecting operation*
Generalized Projection Operation
- operation extends the projection operation by allowing functions of attributes to be included in the attribute list - can't be described by the basic relational algebra operations - *π<F1, F2, ..., Fn>(R)* - F1, F2,..., Fn are functions over the attributes; *may include arithmetic operators or constant vals* - Ex: π<0.25 * Salary>(EMPLOYEE) (takes taxes out of the salary from the Salary attribute)
Domain
- Denoted with *D(Ai)* - a set of atomic values - *specifying the data types of values that can appear in each column, and the column name itself* - ex: Employee_age: age of an employee; must be an int b/w 15 and 80
RENAME Operation
- Operation that does the same thing as the assignment operation but is formally defined - denoted by *ρS(B1, B2,..., Bn)(R) or ρS(R) or ρ(B1, B2,...,Bn)(R)* - S is the new table name, (B1,B2,...,Bn) are the new attribute names
Selectivity of the Selection Condition
- The fraction of tuples selected by a selection condition
PROJECT Operation
- Unary operation - *selects certain columns* from a table and discards the rest - denoted by *π<attribute list>(R)* - attribute list is the desired columns to show in the same order as they appear in the list - *Not commutative*; order of calling does matter
Domain Relational Calculus
- Variant of relational calculus - *QBE* based off of this (query by example) - *variables range over single values from domains of attributes* - {x1,x2,..,xn | R(x1, x2,...xj)} - first list are the attributes to be retrieved - list <x1,x2,...xj> states that there must be a tuple with these values in R