DSCI 351 midterm 1
ER vs relational model
both used to model data ER has multiple concepts (entities, relationships, is-a, etc.), well-suited for capturing application requirements but not well suited for computer implementation relational model: single concept is relation, world is represented with a collection of tables, well-suited for efficient manipulations on computers
declaring multi attribute keys
can also be another element in the list of elements of a create table statement form is essential if the key consists of more than one attribute
Multirelation queories
can be addressed by listing all relations in the FROM clause, with <relation>.<attribute> syntax
char and varchar in mysql
char(n): - right-pad with spaces to store exactly n characters - trailing spaces are removed when retrieved varchar(n) - does not pad with spaces - store length as prefix (one byte if length < 255, otherwise 2)
E/R approach
create one relation for each subclass, with only the key attribute(s) and attributes attached to that entity set
use nulls
create one relation; entities have null in attributes that don't belong to them
defining a database schema
database schema comprises declarations for the relations of the database many other kinds of elements may also appear in database schema, including views, indices and triggers
schemas
describes structure of data Schema of a relation: relation name plus attribute names schema of a database: a set of relational schemas
explicit tuple-variables
distinguish copies by following the relation name by the name of a tuple-variable in the FROM clause
object-oriented approach
each entity belongs to exactly one class; create a relation for each possible subtree including the root, with all its attributes
attributes
each has an atomic domain: string, integer, reals etc.
keys
every entity set needs at least one, can be more than one key per set
referential integrity constraint
exactly one value exists in a given role attribute has a non-null single value explicitly requires a reference to exist
controlling duplicate elimination
force result to be a set by SELECT DISTINCT: may distinct multiple attributes Force the result to be a bag (i.e., don't eliminate duplicates) by ALL, as in ...UNION ALL
introduced by exists or not exists
form boolean expression exists (subquery) evaluates to true if subquery has at least one result not exists (subquery) evaluates to true if subquery has no results
dates and times
form of date: yyyy-mm-dd datetime: yyyy-mm-dd hh:mm:ss
relational algebra
formalism for creating new relations from existing ones using relational operators
importance of contraints
give semantics to data references to entities enable efficient storage and lookup
having clauses
having <condition> may follow a group by clause if this is the case, condition applies to each group, and groups not satisfying the condition are eliminated
benefits of sql
high-level language, programmer able to avoid specifying a lot of data manipulation queries are optimized quite well, making it efficient
mathematical relationship
if A, B are sets, then a relationship R is a subset of A x B
restrictions on SELECT lists with aggregation
if any aggregation is used, then each element of the select list must either be aggregated or an attribute on the group by list
insertion
insert single tuple: INSERT INTO <relation> VALUES (<list of values>)
subquery in where clause
introduced by = or !=: - x = subquery - x can be attribute or tuple of attributes - needs to return exactly one result introduced by in or not in: - x in (subquery), may return multiple results
common constraints
keys, single-value constraints, referential integrity constraints, domain constraints, general constraints
specifying attributes in INSERT
may add to relation name a list of attributes reasons to do so: - forget standard order of attributes for relation - don't have values for all attributes and want the system to fill in missing components with null or default value
adding attributes
may change a relation schema by adding a new attribute (column) by: ALTER TABLE <name> ADD <attribute declaration>;
requirements on Having conditions
may refer to any relation or tuple-variable in the from clause can refer to attributes of those relations if attribute is either a grouping attribute or aggregated
Null values
missing value or inapplicable when value is compared with null, truth value = UNKNOWN
ER relationships
modeled as mathematical set binary and multiway relationships converting a multiway one into many binary ones constraints on the degree/multiplicity of the relationship (many-one, one-one, many-many)
operational semantics
one tuple-variable for each relation in FROM clause if tuple-variables are pointing to tuples that satisfy the WHERE clause, send tuples to SELECT clause
order by
orders by desc or ascending, can set limit for number of returned values offset x - offset of first down to be returned (starts from 0)
entities
real-world objects distinguishable from other objects, described using a set of attributes
grouping
relation that results from select-from-where is group according to values of all those attributes, and any aggregation is applied only within each group
two mathematical definitions of relations
relations as subset of cartesian product (tuple = element of string x int x string x string, relation = subset of string x int x string x string), order is important relation as a set of functions: fix set of attributes (A tuple = function t: A-> attribute domains, relation = a set of tuples/functions), order is not important
deleting attributes
remove an attribute from a relation schema by: ALTER TABLE <name> DROP <attribute>;
right outer join
retain dangling tuples from right relation
schema and instances as programming
schema = type/class instance = value/instance database schema = stable over long periods of time database instance = changes constantly as data is changed
Instances
schema instance = data relational schema = instance = relation with a collection of tuples Database schema = instance = n relations, of types R1, R2, ..., Rn
Subquery in FROM Clause
select * from (select * from <relation> tupe variable needed to name the relation generated by the subquery
Bag Semantics
selection: preserve number of occurrences projection: preserve number of occurrences (no duplicate elimination) cartesian product, join: no duplicate elimination
declaring a relation
simplest form: CREATE TABLE <name> (<list of elements>); remove a relation from database with DROP TABLE <name>;
subclasses in ER diagrams
special cases, fewer entities, more properties
formal semantics
start with product of all relations in the from clause apply selection condition from the where clause project onto list of attributes and expressions in the SELECT clause
difference (except)
subtract number of occurrences {a,b,b,b,c,c} - {b,c,c,c,d} = {a,b,b}
inserting many tuples
use INSERT INTO <relation> (<subquery>);
SQL joins
use bag semantics can be used in place of relations in a FROM clause
weak entity sets
when some or all of their key attributes come from other entity sets
intersection
{a,b,b,b,c,c} (upside down U) {b,b,c,c,c,c,d} = {b,b,c,c} minimum of the two numbers of occurrences
SQL patterns
<attribute> LIKE/NOT LIKE <pattern>, used with "%<string>%'
comparison operators
=, !=, <, >, <=, >=, <>
entity set
A collection of like entities.
full outer join
A join in which all rows from both tables will be included regardless of whether they match rows from the other table
left outer join
A join that includes all of the rows from the first table in the query and only those records from the second table that match the join field in the first table
delete all tuples
DELETE FROM <relation> no where clause needed
deletion
DELETE FROM <relation> WHERE <condition>
eliminate duplicates in an aggregation
DISTINCT inside aggregation causes duplicates to be eliminated before the aggregation
Domains in a relational model
Each attribute has a type, must be atomic
other declarations for attributes
NOT NULL means that the value for this attribute may never be null DEFAULT <value> says that if there is no specific value known for this attribute's component in some tuple, use the stated <value>
nulls ignored in aggregation
NULL never contributes to a sum, average, or count of a specific column (e.g., count(price)), and can never be the minimum or maximum of a column (unless the column has only null values) if there are no non-Null values in a column, then the result of aggregation is NULL
declaring single-attribute keys
Place PRIMARY KEY or UNIQUE after the type in the declaration of the attribute
theta join
R JOIN S ON <condition> is a theta-join, using <condition> for selection gives us all (n, a, p, d, b) quadruples such that drinker n (same as d) lives at address a, has phone p, and frequents bar b.
aggregations
SUM, AVG, MIN, MAX, COUNT can be applied to a column in a SELECT statement to produce that aggregation on the column
primary key vs unique
There can be only one PRIMARY KEY for a relation, but several UNIQUE attributes No attribute of a PRIMARY KEY can ever be NULL in any tuple. But attributes declared UNIQUE may have NULL's, and there may be several tuples with NULL.
Three-valued logic
True = 1; false = 0; unknown = .5 AND = min, OR = max, NOT(x) = 1-x if x = Null then 4*(3-x)/7 is still NULL
update
UPDATE <relation> SET <list of attribute assignments> WHERE <condition on tuples>;
natural outer join
add natural before left/right
Union (all)
add number of occurrences {a,b,c,c} U {a,a,b,b,c} = {a,a,a,b,b,b,c,c,c}
constraint
an assertion about the data in the database that must be true at all times part of database schema very important to design to ensure integrity
declaring keys
an attribute or list of attributes may be declared with PRIMARY KEY or UNIQUE These each say the attribute(s) so declared functionally determine all the attributes of the relation schema
single value constraint
an entity has at most one value for a given attribute or relationship a many-one relationship also implies a single value constraint
additional MySQL operators
between x and y: [x,y] A in (x,y,z): A = x or A = y or A = z