DBMS Final
Transactions
Basic unit of change as seen by a DBMS: contains the execution of a piece of code that either executes completely or not at all. Helps prevent interference between two code executions that access & update the same data and manages failure scenarios.
ER to Relation: Weak Entity Types
Become a separate relation, with a FK taken from the identifying owner entity, and a composite PK composed of the discriminator and the FK
ER to Relation: multivalue attributes
Becomes new table, with PK of the parent entity as an FK, and a composite PK consisting of the FK and the attribute values.
Cardinality constraint
Bold line with constraint on edge, depicted as (<min>..<max>), where with * in place of <max> if no max exists
Code: create table
CREATE TABLE <table name> (<column 1> <datatype 1>, <column 2> <datatype 2>, etc.);
Keys and Nulls
Can do {<column name> <datatype> NOT NULL} in a create table statement. PKs by default disallow nulls. For UNIQUE constraints, allows nulls by disallows duplicate non-nulls. FKs can be null.
SQL commands
Case-insensitive, but string are case sensitive and must use single quotes.
Join types of R and S
Conditional/theta join: R∆∆S = σ(R x S), where σ selects rows that meet the selection condition θ. Equi-join: theta join where θ only contains equalities. Natural join: equi-join where result does not contain duplicate columns (same column in both R and S)
Relation
Consists of a relation schema and a relation instance. A relation schema specifies name of relation, and name and data type (domain) of each attribute. A relation instance is a set of tuples (a table) for a schema. Degree/arity is # columns/fields, cardinality = # of rows.
Aggregation
Conveys relationships between an entity set and another relationship type. Treat relationship as an abstract entity, and surround this with a dashed box.
ER to Relation: relationship w/o constraints (many-to-many)
Create a new table for the relationship, with a composite PK consisting of a FK for one entity and a FK for the other.
Deadlock
Cycle of transactions waiting for locks to be released by each other. Prevention: priorities based on timestamps. A transaction in the cycle must be aborted by DBMS (since transactions will wait forever) using algorithms or timeout to detect.
Commit and Abort
DBMS logs all actions so that it can undo the actions of aborted transactions. In the case of a failure, all actions of not-committed transactions are undone. Transaction commits: successfully completed. aborts: if not. COMMIT requests to commit current transaction, might commit might abort. ROLLBACK causes current transaction to abort always. SET AUTOCOMMIT ON: With auto-commit on, each statement is its own transaction and 'auto-commits'. With auto-commit off, statements form part of a larger transaction delimited by the keywords discussed above
Deferring Constraint Checking
DEFERRABLE INITIALLY DEFERRED: wait until transaction complete to check constraint, can change later dynamically via SET CONSTRAINT name IMMEDIATE. Allows to insert tuple referencing FK that does not yet exist in another table, but will be added to thaqt table later in the same transaction. Or the opposite: INITIALLY IMMEDIATE , SET CONSTRAINT name DEFERRED.
Central DBMS Services
Data Independence: do not need to worry about how the data is structured and stored. Declarative Querying: indicate what information is needed, and leave it to the system to work out how to process through the data to extract what you need.; express requests declaratively is in Select...where... statements. Transaction management & concurrency control.
PK and FK contraints
Declared at end of CREATE TABLE statements FKs: { FOREIGN KEY (<fk name>) REFERENCES <foreign table name>(<attribute in that table, usually PK>) } or {CONSTRAINT <constraint name> FOREIGN KEY (<fk name>) REFERENCES <foreign table name>(<attribute in that table, usually PK>) } PKs: { PRIMARY KEY (<pk name>) } or { CONSTRAINT <constraint name> PRIMARY KEY (<PK name>) } Unique: { UNIQUE (<candidate key>) }
Relationship Type (Relationship Set)
Describes a relation among n > 2 entities, each from entity sets (ER: diamond). In relational model, relationship types called a relation (and is a set of tuples, which are individual relationships), and entity types call also be represented as relations
ER to Relation: IsA Hierarchy
Distinct relations for the superclass and for each subclass. PK of the subclass is the PK of the superclass as an FK.
first normal form (1NF)
Domains of all attributes of R are atomic (so multivalued attributes)
Table vs relation
Each have unique name, columns have unique names, tuples have same struct, all attribute values atomic, every row is unique and or doesn't matter
Many-to-1 relationship (Key Constraint)
Each of A is related to at most 1 of B. (ER: arrow from A to B).
Relationship-Role
Each participating entity can be named with an explicit role in the relationship (role on line from relationship type to the entity type with that role)
Solving Schedule Conflict Serializability Problems
Expand schedule into rows for each transaction, with execution order still the same. Create set of rules based on which operations do not commute (read and write operations from different transactions on the same data item commute). E.g. (R1x, W1x, R2x) 2 cannot precede 1, because (Rx, Wx, Rx) ≠ (Rx, Rx, Wx). Select conflict serializable schedule from remaining options that don't violate the rules.
Join queries
FROM clause forms the cartesian product of tables R and S (FROM R, S). WHERE performs equality check for common columns (WHERE R.c1 = S.c1). To refer to same table twice, make two aliases (SELECT L.name, M.name FROM Lecturer L, Lecturer M WHERE L.manager = M.empid).
ER to Relation: 1-to-1 with participation constraint
For 1-to-1 as A-to-B with a participation constraint on A, import PK of B as an FK in the table for A, add NOT NULL constraint and UNIQUE constraint on the FK
ER to Relation: 1-to-1
For 1-to-1 as A-to-B, import PK of B as an FK in the table for A, add UNIQUE constraint on the FK
ER to Relation: Aggregation
For C related to aggregated (A-B), the C-(A-B) relationship is its own table, with a composite PK consisting of PK of C as an FK, and the PK of the A-B relationship as another FK
IsA relationships
For entity types whose set of properties are actual a subset of another entity type, where F is a E. F is a specialization of E, and E is a generalization of F. F inherits all attributes and relationships of E. (ER: triangle containing "IsA"). Overlap constraints: is disjoint if centity an only belong to one lower-level entity type ("disjoint" next to the IsA triangle), otherwise is overlapping. Covering constraints: is total if entity must belong to at least one lower-level entity type (bold line between superclass and IsA triangle), otherwise is partial.
Referential integrity
For foreign keys: FOREIGN KEY <fk> REFERENCES <table>(<attribute>) ON DELETE CASCADE: If the attribute that fk refers to is deleted in home table, the row that references it in the referencing table should also be deleted. ...ON UPDATE SET DEFAULT: If the attribute that fk refers to is deleted in home table, set the fk value in referencing table to the default value specified in the schema.
ER to Relation: many-to-1 with participation constraint
For many-to-1 as A-to-B with a participation constraint on A, import PK of B as an FK in the table for A, add NOT NULL constraint on the FK
ER to Relation: many-to-1
For many-to-1 as A-to-B, import PK of B as an FK in the table for A
ER to relation: many-to-many with participation constraint
For many-to-many as A-to-B, with participation constraint for A, in the table for A add constaint: { CHECK( EXISTS(SELECT * FROM <AB relationship table> ab WHERE <attribute>=ab.<attribute>) ) DEFERRABLE INITIALLY DEFERRED
Relational division
For queries such a finding students who have taken ALL core units of study. Find the items in candidate set (R/dividend) that are related to all tuples in base set set (S/divisor). Example: SELECT sid FROM Student NATURAL JOIN Enrolled WHERE uosCode LIKE 'INFO2%' GROUP BY sid HAVING COUNT(*) = (SELECT COUNT(*) FROM UnitOfStudy WHERE uosCode LIKE 'INFO2%')
Query-Clause Evaluation Order
From, Where, Group By, Having, Select, Order By
Test for candidate keys
Given R(<attributes>) and F(<dependencies>), check (<atr1><atr2>)+ by going through F and seeing which additional attributes are functionally determined. If (<atr1><atr2>)+is the full set of attributes, then (<atr1><atr2>) is a superkey => check that neither (<atr1>)+ nor (<atr2>)+ are superkeys => (<atr1><atr2>) is minimal, so is a candidate key.
Redundant Info Anomolies
Given Table R(A,B,C) for each distinct item in column B there is only one possible value for column C, have redundant data. Update anomaly, update value of column C for all but one tuple. Delete anomaly: if column A deleted, lose info on the relation between B and C. Insert anomaly: w/o at least one entry for column A, cannot establish the connetion between columns B and C. Solution: divide into table R(A,B) and table S(B,C).
Attribute closure (X+)
Given a set F of FDs and a set X of attributes, the attribute closure of X, denoted by X+ , is the set of attributes that are determined by X. Any set of attributes, whose attribute closure is the whole relation, is a superkey. A superkey is a candidate key if it is minimal (i.e., none of its subset is a superkey)
DBMS System Requirements
High Availability and Reliability, High Throughput, Low Response Time, Long Lifetime, Security
Foreign keys (FK)
Identifiers that enable a dependent relation to refer to its parent relation. Must refer to a candidate key (usually PK) of the parent relation. E.g. PK of a relationship type is a composite PK of two FKs, which are the two PK's of the two entity types involved in that relation. FKs are dash-underlined
Deferring ICs example
If FK in R references S and has CONSTRAINT <con1> DEFERABBLE, then: BEGIN TRANSACTION; SET CONSTRAINTS <con1> DEFERRED; INSERT INTO R....; INSERT INTO S....; COMMIT;
Avoid duplicate rows
If have PK, already taken care of. If not, specify all attributes as a single candidate key using UNIQUE.
Security & SQL Injection
In Java, use prepared statements: stmt = conn.prepareStatement("<sqlcode> ? <sqlcode> ?") stmt.setFloat(1, <variable1>) stmt.setFloat(2, <variable2>) In Python, use anonymous or named parameters ("args" argument in pgexec): studid = 12345 cursor.execute( "SELECT name FROM Student WHERE sid=%s" , (studid,) ) studid = 12345 cursor.execute( "SELECT name FROM Student WHERE sid=%(sid)s", {'sid': studid} )
Join types
Join operators are in the FROM clauses. Inner joins (default in SQL): Theta-join: R JOIN S ON <condition>; Equi-join: R JOIN S USING (<list of common attributes bt tables>); Natural join: R NATURAL JOIN S automatically performs equi-join on the common columns of the table. Outer joins: include the results of the inner join plus rows without matches: R LEFT OUTER JOIN S includes unmatched from R (will have null values for columns from S), RIGHT "" includes unmatched from S, FULL "" includes unmatched from both. Can do an inner join of 3 tables with "FROM R JOIN S USING <attribute1> JOIN T USING <attribute2>);"
UNIQUE constraint
Lists attributes that comprise a candidate key but are not the primary key. E.g. attributes A and B: UNIQUE (A, B) means can only be one tuple (A1, B1)
Correlated vs. Noncorrelated Subqueries
Noncorrelated subqueries: Do not depend on data from the outer query, execute once for the entire outer query. Correlated subqueries: Make use of data from the outer query i.e. refers to the table in the outer query, execute once for each row of the outer query (EXISTS and UNIQUE usually only this type).
RDBMS table extends mathematical relation
RDBMS allows duplicate rows, support an order of tuples or attributes, allows null 'values' (represent facts that are not relevant, or not yet known) for unknown information
Armstrong's axioms
Reflexivity rule: B is subset of BC, then BC -> B. Augmentation rule: If X -> Y, then XZ -> YZ. Transitivity rule: If X -> Y and Y -> Z, then X -> Z. Follows: Union rule: If X -> Y and X -> Z, then X -> YZ. Decomposition rule: If X -> YZ, then X -> Y and X -> Z.
Mapping cardinality
Relationship is a subset of crossproduct AxB.
Database Design Sequence
Requirements Analysis, Conceptual Design, Logical Design (Convert conceptual design into DDL), Schema Refinement, Physical Design, App & Security Design
Remove duplicate rows
SELECT DISTINCT ...
Rename
SELECT a.sid, a.empid AS lecturer FROM Assessment a renames Assessment to a, renames empid to lecturer
Ordering results
SELECT... FROM...WHERE... ORDER BY <column name> (ASC,DESC). ASC default
Nested queries/subqueries
SQL is compositional: output of one query can be used as input of another: SELECT <column1> FROM R WHERE <column2> IN (SELECT column2 FROM S WHERE <condition1> INTERSECT<SELECT column2 FROM S WHERE <condition2>)
Null values
Signifies that a value does not exist or not applicable. Can check using WHERE <attribute> IS [NOT] NULL. Arithmetic with NULL returns NULL. Comparison with NULL returns UNKNOWN. WHERE clauses predicate that evaluates to NULL treated as false. Aggregate functions other than COUNT(*) ignore null rows, unless all rows are null = returns NULL.
Query language
Specialized language for accessing data. Subtypes: Data Manipulation Language (DML), Data Definition Language (DDL) for creation, deletion and modification of tales, Data Control Language (DCL). SQL based on "formal" languages: Relational Algebra and Relational Calculus.
Integrity constraints
Specified in the schema, and checked when parts of DB are modified (after SQL statement by default, or after a transaction). If violated: reject DB operation, abort transaction, execution of maintenance operation to make DB legal again. Static ICs: describe conditions that every legal instance of a database must satisfy (Domain, Key Constraints/Refential Integrity, Assertions, Semantic). Dynamic ICs: are predicates on database state changes (triggers).
Integration approaches
Statement-Level Interface (SLI): Embed SQL in the host language. Call-Level Interface (CLI): Create special API to call SQL commands (JDBC, DBAPI)
Relation keys
The minimal set of attributes in a relation that can uniquely identify each row of that relation. No two distinct tuples in a legal instance can have the same values in all key fields. If any subset of the key violates this, then the "key" is actually a superkey. If there's at least one key for a relation, we call each of them a candidate key. One is chosen to be PK (automatically NOT NULL and UNIQUE).
SQL Aggregation
Used in SELECT clauses. COUNT, SUM, AVG, MAX, MIN. Applies to duplicates unless SELECT DISTINCT.
Functional dependencies
Used to identify schema sources of redundancy and to suggest refinements. X -> Y means X functionally determines Y, i.e. the value of one set X of attributes determines the value of another set Y of attributes. Also means X-to-Y is a many-to-1-relation. If two tuples of a relation R agree on values of all attributes of X, then they must also agree on the values of attributes of Y.
Entity Type/Set
a collection of entities that share common properties or characteristics (ER: rectangle)
'Enhanced' ER Model (EER)
all modelling concepts of basic ER + addition of some object-oriented concepts: subclasses/superclasses, specialization/generalization, categories, attribute inheritance
Entity
an individual object, e.g., a person, place, object, event, or concept about which you want to gather and store data
Domain Constraints
data type declared in schema, as well as DEFAULT (value to place if omitted), NOT NULL, NULL (for things like date where if no value inserted, doesn't automatically give a NULL), CHECK (<condition>) where condition such as { <attribute> IN(<str1>, <str2>) } can also be listed as a separate CONSTRAINT at bottom like PKs and FKs can. Can also CREATE DOMAIN <custom domain name> <dataType> ... etc, and then say <attribute> <custom domain name> in schema declaration.
Weak entity type
entity type (double rectangle) that does not have a self-contained primary key, whose existence depends on an identifying owner (strong) entity type. Must relate to the identifying owner entity type via a many-to-1 relationship (double diamond, with both key and participation constraints). Has discriminator which acts like PK (dashed underline). PK is the strong entity's PK + discriminator.
Entity Schema
entity type name, attributes (+domains), PK
Participation Constraint
every entity participates in at least one instance of a relationship (ER: thick line on the side of the entity type that must participate)
Key and Participation Constraint
every entity participates in exactly one relationship
Integrity Constaints
facilities to specify a variety of rules to maintain the integrity of data when it is manipulated, must be true for any instance of the database (i.e. legal instance). Specified/declared in the schema.
Key
minimal set of attributes that uniquely identifies an entity in the set. Entity types have PK. Relationship types have a superkey of the PKs of participating entity types.
Domain
possible values of an attribute; In contrast to relational model, values can be complex / set-oriented (e.g. multi-valued (double circle) and composite attributes)
Strict Two-phase Locking (S2PL) Protocol
shared (S) lock: Data item can only be read (but shared by multiple transactions). exclusive (X) lock: Data item can be accessed by just one transaction (for writing). All locks held by a transaction are released when the transaction completes. T2 only doesn't wait for completion of T1 if T1 has S lock on item X and T2 requests S lock. All others must wait.
3-valued logic for boolean
true = 1, unk = .5, false = 0. A and B = min(A,B) A or B = max(A,B) Not A = 1-A
DB-API: Transaction Example
try: curs = conn.cursor() curs.execute (<command1>) curs.execute (<command2>) curs.close() conn.commit() except: conn.rollback()
Stringmatching
using the LIKE operator: ... WHERE attribute LIKE '<desired result>'. E.G. for ABCDEF, LIKE '%BC%' returns ABCDEF, LIKE '_BC_' returns ABCD.
Set comparison operators
v [NOT] IN R: v is in or not in R. [NOT] EXISTS: whether R isn't empty or is empty. UNIQUE R: whether there are no duplicate rows in R. v op ALL R: whether v is >, =, etc all rows in R. v op SOME R: whether v is >, =, etc at least one rows in R.
Table Deletion and Modification
{ DROP TABLE <name> } destroys schema and tuples. { ALTER TABLE <name> ...} {... ADD COLUMN <name> <datatype/constraints> } {... RENAME COLUMN <old> TO <new> }
Insertion of tuples into a table
{ INSERT INTO <table name> ...} {... VALUES (<full tuple entry>) } {... (<column 1>, <column 2>, etc) VALUES (<value 1>, <value 2>, etc) }
SELECT-FROM-WHERE
πA(σB(R,S)): SELECT columns A (separated by commas, and can include arithmetic expression) FROM cartesian cross product of relations R,S,etc WHERE conditions B (if multiple, use AND/OR/NOT) are met
Non-conflict Serializable Schedules
› Reading Uncommitted Data ("dirty reads") › Unrepeatable Reads: may not read same value twice › Overwriting Uncommitted Data ("lost updates"):
Relational algebra operators
*Projection π∨projection list∨(R): deletes unwanted columns not in the projection list and duplicate rows, returns result with scheme composed of projection list attributes; *Selection σ∨column op condition∨(R), selects rows that meet selection condition and removes unwanted rows, schema of result same as of input; *RXS: cross-product, returns every possible combination of tuples from two relations; Joins; Set operations: R and S must have same arity, with corresponding fields with same domain: *Union R∪S: tuples in relation 1 or 2 or both; Intersection R∩S: tuples in relation 1 and 2; *Difference R-S: tuples in relation 1 not 2; *Rename p∨x(a1,a2,an)∨(R): rename R to x, rename it's attributes a1-an, assuming R has n attributes.
DBMS advantages
.Application-Data Independence: applications don't need to worry about data formats, applications don't need to process data access routines, reduced application development time, increased maintenance productivity, efficient access. Minimal Data Redundancy: Leads to increased data integrity/consistency. Improved Data Sharing & Security - Different users get different views of the data Enforcement of Standards - All data access is done in the same way › Improved Data Quality - Integrity constraints, data validation rules › Better Data Accessibility/ Responsiveness - Use of standard data query language (SQL) › Backup/Recovery, Concurrency - Disaster recovery is easier
Tiers of DB App Architecture
1-tier: Centralized system: user_module(presentation_services, application_services)-API-DBMS-DB 2-tier: Client-server model: client_machines(presentation_services, application_services)-comm_network-database_server_machine(DBMS-DB) 3-tier: PresentationTier: client_machines(presentation services)-comm_network-MiddleTier:app/web_server_machine(application_services)-comm_network-DataMgmtTier:database_server_machine(DBMS-DB)
Second Normal Form (2NF)
1NF + prohibition of partial dependencies: a non-trivial FD X -> Y for R where X is a strict subset of some key for R and Y is non-prime/not part of a key. I.E. for key ABC and X -> Y, X must equal ABC, cannot be AB->D for example.
Data model
A collection of concepts for describing data. Relational model: relation (essentially, a table) and schema. Relational DB: collection of such tables, which are usually connected to tone another by key attributes.
Transaction definition
A collection of one or more operations on one or more databases, which reflects a discrete unit of work. Completes or does not happen at all
Transaction Durability
A database should always be able to be recovered to the last consistent state. Database is stored redundantly on mass storage devices to protect against media failure (e.g., RAID). Write-Ahead Log
Database
A large integrated collection of data, central to some enterprise/organisation. All qualified users have access to the same data for use in a variety of activities
Assertion
A predicate expressing a condition that we wish the database always to satisfy. Act act as a schema object, like a table. When an assertion is made, the system tests it for validity, and tests it again on every update that may violate it. { CREATE ASSERTION assertion-name CHECK (condition) }. Sometimes better to check NOT EXISTS for condition that violates the assertion instead of check for the satisfying condition.
DBMS
A program/software that manages a database. Stores the database on some mass (persistent) storage providing fail safety. Supports a high-level access language. Provides transaction management to some guarantees for correct concurrent access to shared data
Serial Schedule
A schedule in which all transactions are executed from start to finish, without interleaving, one after the other. Each transaction is isolated from all others.
Conflict Serializable Schedules
A schedule is conflict serializable if it is conflict equivalent to some serial schedule. Two schedules are conflict equivalent if 1) they involve the same set of operations of the same transactions, 2) they order every pair of conflicting operations the same way. Two operations conflict if they access the same data X, they come from different transactions, and at least one of them wrote X. Reading: SELECT. Writing: INSERT, DELETE, UPDATE. Data objects are the rows in the table//// A schedule is conflict serializable if and only if its precedence graph is acyclic. IF CONFLICT SERIALIZABLE, THEN ALSO SERIALIZABLE.
Transaction Consistency
A transaction is consistent if, assuming the database is in a consistent state initially (satisfying all constraints), when the transaction completes, 1) all database constraints are satisfied (even if violated in intermediate), and 2) the new database state satisfies specifications of the transaction
Add/Modify/Remove Integrity Constraints
ALTER TABLE <table name> <constraint mod>. Constraint mods: ADD CONSTRAINT (name, constraint), DROP CONSTRAINT (name), RENAME CONSTRAINT (old name) TO (new name), ALTER COLUMN (column name, new domain constraint). Constraint does not get created if existing data in the table does not comply to it
Closure of F (F+)
All FDs logically implied by Armstrongs Axioms for an initial set of given FDs F
SQL Grouping
Applying aggregation to several different groups of tuples. SELECT <column1>, AGG(<column2>) FROM table WHERE <condition> GROUP BY <column1> HAVING AGG(<column2>) <comparison> <value>. condition is WHERE applied before grouping, condition in HAVING applied after groupin
Thrashing
As number of clients (and transactions) increase, and there is more contention for locks, each transaction will have to wait longer to obtain a lock. Thrashing is critical point where adding another transaction to wait for the lock would actually decrease the number of transactions served per unit time.
Properties of transactions
Atomicity (A transaction is either performed in its entirety or not performed at all), Consistency (A correct execution of a transaction must take a database from one consistent state to another), Isolation (Effect of multiple transactions is the same as these transactions running one after another), Durability (Once a transaction changes the database and the changes are committed, these changes must never be lost because of subsequent failure)
Set operations
Automatically eliminate duplicates. UNION, INTERSECT, and EXCEPT correspond to ∩, ∪, and -. Add "ALL" to retain duplicates, i.e. "multiset" versions of the operations.. E.G. if tuple occurs 3 times in R and 1 time in S, R EXCEPT S returns 0 tuples, R EXCEPT ALL S returns 2 tuples. Usually in the form of (SELECT FROM WHERE)₁ UNION (SELECT FROM WHERE)₂. Be careful of schema of the tables being created by the SELECT statement, e.g. if you want to find names of students in both Class A and B, but you intersect 2 SELECT name statements, will instead return the common names of two students in the two classes who may not be the same person, i.e. Ian in Class A different than Ian in Class B, but the intersect returns Ian nonetheless.
Transaction
BEGIN (sql statements) COMMIT.