CS331 Final Study Part 2
key
- A _______ K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey anymore - functionally determines a tuple (row)
Secondary index
- Can be specified on any nonordering field. Data file can have several ________ ________ - a way to efficiently access records in a database (the primary) by means of some piece of information other than the usual (primary) key.
Full functional dependency
- a Functional Dependency Y -> Z where removal of any attribute from Y means the Functional Dependency does not hold any more Examples: {SSN, PNUMBER} -> HOURS is a ___________ since neither SSN -> HOURS nor PNUMBER -> HOURS hold {SSN, PNUMBER} -> ENAME is not a ____________ (it is called a partial dependency) since SSN -> ENAME also holds
delete anomaly
- improper organization of a database that results in the loss of all or significant information about an entity when a row is deleted - when you may attempt to delete a specific record, but this would lead to the deletion of or loss of relevant records that you did not intend to delete
Primary index
- the index structure that determines the placement of a record on disk - Specified on the ordering key field of ordered file of records
LIKE
- used for string pattern matching - % replaces an arbitrary number of zero or more characters, WHERE Address LIKE '%Houston,TX%'; - underscore (_) replaces a single character, WHERE SSN LIKE '_ _ 1_ _ 8901';
EXCEPT
- used to filter records based on the intersection of records returned via two SELECT statements - to retrieve all the unique records from the left operand (query), except the records that are present in the result set of the right operand (query)
Achieving the BCNF by Decomposition
1. List all of the determinants. 2. See if each determinant can act as a key (candidate keys). 3. For any determinant that is not a candidate key, create a new relation from the functional dependency. Retain the determinant in the original relation.
Boyce-Codd Normal Form (BCNF)
A relation schema R is in _______ if it is in 3NF and whenever a Functional Dependency X -> A holds in R, then X is a superkey of R. A relation is in __________ if every determinant is a candidate key. A decomposition D: {R1, R2} of R has the lossless (nonadditive) join property with respect to a set of functional dependencies F on R if and only if either
third normal form (3NF)
A relation schema R is in ___________ if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key. NOTE: In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency.
second normal form (2NF)
A relation schema R is in _____________ if it is in 1NF and if every non-prime attribute A in R is fully functionally dependent on the primary key Relations that have a single attribute for a key are automatically in ________
functionally determines
A set of attributes X __________ a set of attributes Y if the value of X determines a unique value for Y. X → Y holds if whenever two tuples have the same value for X, they must have the same value for Y. For any two tuples t1 and t2 in any relation instance r(R): if t1 = [X] = t2 =[X], then t1[Y] = t2[Y]
Informal Design Guidelines for Relational Databases
Guideline 1: Informally, each tuple in a relation should represent one entity or relationship instance. Guideline 2: Design a schema that does not suffer from the insertion, deletion and update anomalies. If there are any anomalies present, then note them so that applications can be made to take them into account. Guideline 3: Relations should be designed such that their tuples will have as few NULL values as possible. Attributes that are NULL frequently could be placed in separate relations. Guideline 4: The relations should be designed to satisfy the lossless join condition. No spurious tuples should be generated by doing a natural-join of any relations.
candidate key
If a relation schema has more than one key, each is called a ________ ____.
primary key
One of the candidate keys is arbitrarily designated to be the _______ ________, and the others are called secondary keys.
keyword DISTINCT in the SELECT clause
Only distinct tuples should remain in the result
Normalization
The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations
Functional Dependency
describes a relationship between attributes in a single relation. An attribute is _________ ___________ on another if we can use the value of one attribute to determine the value of another.
CASE
goes through conditions and returns a value when the first condition is met (like an if-then-else statement). So, once a condition is true, it will stop reading and return the result. If no conditions are true, it returns the value in the ELSE clause SQL will order the customers by City. However, if City is NULL, then order by Country SELECT CustomerName, City, Country FROM Customers ORDER BY (____ WHEN City IS NULL THEN Country ELSE City END);
Insertion Anomaly
it may be impossible to add certain data to a table without including extra information that may not be entirely relevant to the data you want to add
DELETE
may also update a number of tuples (rows) in a relation (table) that satisfy the condition ______ FROM EMPLOYEE WHERE Lname='Brown';
UPDATE
may update a number of tuples (rows) in a relation (table) that satisfy the condition ______ PROJECT SET Plocation = 'Bellaire', Dnum = 5 WHERE Pnumber = 10;
Denormalization
the process of storing the join of higher normal form relations as a base relation— which is in a lower normal form
INSERT
typically inserts a tuple (row) in a relation (table)
UNION
used to combine the result-set of two or more SELECT statements. Every SELECT statement within __________ must have the same number of columns The columns must also have similar data types The columns in every SELECT statement must also be in the same order
INTERSECT
used to retrieve the records that are identical/common between the result sets of two or more tables.
Indexes
used to speed up record retrieval in response to certain search conditions. ______ file stores the ______field with a list of pointers to all disk blocks that contain records with that field value CREATE [UNIQUE] INDEX <index name> ON <table name> (<column name> [<order>] {, <column name> [<order>]} ) [CLUSTER];
Modification anomaly
when you cannot update certain records without having to update many different records across your data
Key constraint
A primary key value cannot be duplicated Dnumber INT PRIMARY KEY; Dname VARCHAR(15) UNIQUE;
Projection attributes
Attributes whose values are to be retrieved
Dynamic Multilevel Indexes Using B-Trees and B+Trees
B-Trees Provide multi-level access structure. Tree is always balanced. Space wasted by deletion never becomes excessive. Each node is at least half-full, each node in a B-tree of order p can have at most p−1 search values.
Selection condition
Boolean condition that must be true for any retrieved tuple. ______________ ______s include join conditions when multiple relations are involved
Aggregate Functions
COUNT, SUM, MAX, MIN, AVG
Entity Integrity Constraint
A primary key value cannot be NULL
Logical comparison operators
=, >, >=, <, <=, <>
prime attribute
A _____ _____ must be a member of some candidate key. A Nonprime attribute is not a ____ _____—that is, it is not a member of any candidate key.
superkey
A _________ of a relation schema R = {A1, A2, ...., An} is a set of attributes S subset-of R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S]
Standard arithmetic operators
Addition (+), subtraction (-), multiplication (*), and division (/) may be included as a part of SELECT
Normal form
Condition using keys and Functional Dependencies of a relation to certify whether a relation schema is in a particular normal form
Specify an asterisk (*)
Retrieve all the attribute values of the selected tuples
Basic Retrieval Queries in SQL
SELECT <attribute list> (is a list of attribute names whose values are to be retrieved by the query) FROM <table list> (is a list of the relation names required to process the query.) WHERE <condition> (is a conditional (Boolean) expression that identifies the tuples to be retrieved by the query.) GROUP BY <grouping attribute(s)> Partition relation into subsets of tuples (The grouping attribute must appear in the SELECT clause) HAVING <group condition> Provides a condition to select or reject an entire group ORDER BY <attribute list> DESC to see result in a descending order of values, ASC to specify ascending order explicitly
VIEW
Single virtual table derived from other tables. SQL queries can use the _____ relation in the FROM clause. SQL _____ is updatable (that is, inserts, updates or deletes can be applied on the view) - if the from clause has only one database relation, - if the query does not have a group by or having clause, and - if the select clause contains only attribute names of the relation (does not have any expressions, aggregates, or distinct specification).
CREATE TRIGGER
Specify automatic actions that database system will perform when certain events and conditions occur. _____ has three components; Event(s), Condition, Action. _____ do not run on views
BETWEEN comparison operator
WHERE (Salary _____ 30000 AND 40000)
Referential integrity constraints
The "foreign key "must have a value that is already present as a primary key, or may be null. Default operation: reject update on violation Attach referential triggered action clause - Options include SET NULL, CASCADE, and SET DEFAULT - Action taken by the DBMS for SET NULL or SET DEFAULT is the same for both ON DELETE and ON UPDATE - CASCADE option suitable for "relationship" relations
Transitive functional dependency
a Functional Dependency X -> Z that can be derived from two Functional Dependencies X -> Y and Y -> Z
Clustering index
Used if numerous records can have the same value for the ordering field. File records are physically ordered on a non key field without a distinct value for each record