CSCI 3287: Database Systems

¡Supera tus tareas y exámenes ahora con Quizwiz!

Constraint

when the DBMS is running, it will not allow any program to directly access any DBMS managed files (through OS permissions)

naming relationships

write the name of the entity above the rectangle and it should be singular

an occurrence of an entity

each row represents one occurrence of an entity

null/not null

enforces a column to NOT accept NULL values a column can hold null values

why use triggers?

enforces business rules, moves code from an application program to the database, performance improvement, all work done on server

currency control

ensures that one user's work does not inappropriately influence another user's work - no single concurrency control technique is ideal for all circumstances - tradeoffs need to be made between level of protection and throughput

purpose of indexes

faster processing by reducing disk I/O indexes are usually much smaller than the base tables they serve -- less I/O, less buffer space consumed. Indexes are often pinned in memory

tradeoffs with indexes

faster reads slower inserts/updates

the "trade off" with ACID transaction compliance

faster throughput data integrity/accuracy

table

general term for where data is stored in db's

Functional Dependencies (FD)

has the form of X --> Y (reads: X implies Y), where X and Y are sets of attributes. It means that whenever two tuples are identical on all the attributes in X, they must also be identical on all the attributes in Y alternatively: you can interpret X --> Y as: each possible value of X can correspond to exactly one value of Y

breaking deadlock

almost every DBMS has algs for detecting upfront when deadlock occurs, DBMS aborts one of the trans an rolls back any partially completed work

surrogate key

an artificial column added to a relation to serve as a primary key - DBMS supplied - short, numeric, and never changes - an ideal primary key - has artificial values that are meaningless to users - normally hidden in forms and reports

shared vs exclusive

an exclusive lock prohibits other users from using the locked resource a shared lock allows other users to read the locked resource but they cannot update it

Development of database

application program --> DBMS software --> database

Objective of normalization

arrange the data into a series of clearly defined, well-organized relations - each with a primary key - all attributes are functionally dependent on the primary key - each with required foreign keys referencing other relations

Why views?

base table or specific columns can be hidden from certain users who are only allowed access to the view very complex SQL to create the view can be hidden from end users

Why are products called "relational"?

because SQL lang based upon the principles of relational algebra

before and after images in logs

before: a copy of every database record (or page) before it was changed after: a copy of every database record (or page) after it was changed

buffer pools

caching disk I/O can hold objects in memory (smaller tables, indexes)

ROLLBACK

causes the transaction to abort, or terminate unsucessfully

COMMIT

causes the transaction to end successfully

Check

check is used as limit the value range that can be placed in a col if you use check constraint on a single col, it allows only certain values for this col, it applies the same for the table

select

choose which columns to display in the returned results

cold vs warm backup

cold: stop all db traffic and back up quickly MySQLDUMP vs file-level backup (OS) faster requires downtime warm: take the backup while db traffic is in-progress takes longer less inconvenience to user community

key

combination of one or more columns that is used to identify rows in a relation

projection

create a new subset of columns from R

DDL

data definition language - for creating tables, relationships, and other structures (create, drop, alter, truncate)

order by

designate the order in which selected rows should be displayed

read anomalies

dirty read, unrepeatable read, phantom read

Query cache

holds the compiled executable query queries may be run over and over by an application, changing only host variables

cardinality

how many of these are related to how many of these (zero, one or many) on both ends of the relationship

implicit vs explicit

implicit locks are locks placed by DBMS explicit locks are issued by the application program

types of locks

implicit vs explicit shared vs exclusive deadlock

relational data

is "self-describing" and it contains within itself all information about itself (aka metadata)

optionality

is the relationship mandatory (one or more) or optional (zero)

Full outer join

joins all rows in A and B plus those that have a matching entry in table A and B

hot standby via replication

keep a duplicate up-to-date at all times

Inner Join

keyword selects records that have matching values in both tables (joins only the rows that have a matching entry in table A and B)

cardinality and optionality symbols

line with two small cuts = one to mandatory line with one small cut and v = many to mandatory line with open circle and small cut = one to optional line with open circle and v = many to optional

what triggers a long flush to disk?

logs in memory must be written to disk (transaction logs) a log flush in MySQL ("redo logs" "archive logs" in Oracle)

foreign key - constraints

may be defined at the col level should be defined at the table level so you can give it a name maintaining RI

Unique

may have null values give it a name check constraint - applies cond to a col rules for conditions same as WHERE clause not avaliable in mysql

primary key - constraints

may not have null values may be defined at the column level if one column must be defined at the table level if more than one column it includes NOT NULL and UNIQUE

ID Dependence

ID dependence means they have the same ID in their entities

Domain relation name

(column-name1:datatype, column-name2:datatype) ProductID:integer

VIEW

- VIEW is an empty shell of a table definition - view contains no data until it is queried - sometimes considered a virtual table - each time the view is queried, the underlying query that populates the view is re-executed

Normalization

- a method of cleaning up and organizing data - structures the data to fit the RELATIONAL MODEL - a set of rules you apply to the data to clean up the data - a structured, defined, detailed process - BCNF is necessary in normalization process

Widespread use of relational DBMS software due to the following reasons:

- allow to query the info - supports huge data sets (terabytes, petabytes) - provide durability - help control access - 90% of market share

parent-to-child relationships

- an order belongs to one customer - a customer can place zero, one or many orders - an order may contain one or many OrderDetails - a product may be purchased on zero, one or many OrderDetails

Main process of data modeling

- draw a horizontal line across each entity rectangle, and enter the name of the primary key attribute above the line - then list all the remaining attributes within the rectangle below the line - identify any foreign key attributes with an (FK) - walk through the model with your customers

begin process of data modeling

- draw a rectangle to represent each ENTITY - draw a circle to contain each ATTRIBUTES - draw a diamond to represent the RELATIONSHIP - draw a RELATIONSHIP line between each related entity - Relationships with at most one instance of an entity connect with arrows - Relationships with at least one instance of an entity connect with bold lines - draw CARDINALITY and OPTIONALITY symbols on both ends of each relationship line - Highlight the Primary key - resolve any many-to-many relationships by creating an ASSOCIATION (child) entity between the two "parent" entities

Codd's Rules for Relational Model

- each attribute of a tuple must be atomic - each column of a relation defines a . domain - tuples in a relation are a SET, not a list. Order is immaterial. - the columns are a SET of attributes. Order is immaterial. - each tuple must have an attribute that uniquely identifies that tuple - there can be NO duplicate tuples in a relation

Rules about functional dependencies

- if i know a true FD for a relation, I can deduce other true FD's for that relation - Transitive Rule: in relation R(A,B,C) if A->B and B->C then A->C - Augmentation Rule: In relation R(A,B,C) if A->C then A+B->C+B - Combining and splitting rule in a relation R(A,B,C,D) if A+B->C, and A+B->D, then A+B->C,D - Trivial FDs - are assumed to be true. The right side is a subset of the left side - Closure: for a relation, we seek ALL TRUE functional dependency rules - A relation: R(A,B,C,G,H,I) FD's = A->B, A->C, C+G->H, C+G->I, B->H

Preparation of data modeling

- once data requirements are clear then decide business area you are modeling - organize all the data items into ENTITIES and ATTRIBUTES - determine an attribute that can serve as a PRIMARY KEY for each entity - if no appropriate candidate key exists, then plan to create SURROGATE key page

usefulness of DBMS software

- query the information into database - support huge data sets - provides durability (do not lose anything) - control access

Why reorganize an index? tradeoffs?

- reads go faster when indexed - inserts require not only data table updates, but also index updates - for applications that are insert-intensive, indexes can cause significant delays - for small tables, an index can actually hurt performance

group function/group by

- require sql to create an interim answer set, and then process the group function against the interim answer set, delivering a final answer set that contains only the final total for the function - when you combine a GROUP FUNCTION with a WHERE clause, keep in mind that the WHER clause simply reduces the number of rows in the INTERIM answer set before the GROUP function does its calculation

why has the relational model prevailed for over 30 years?

- solid, reliable operating principles - can grow very large (multi-terabytes) - efficient to manage, maintain - available talent in the marketplace - fabulous DBMS software alternatives - design approach is simple, limited yet versatile - data operations (via sql) are simple, limited yet versatile - SQL is very powerful: few lines of code = big results

B-tree

- supports both sequential access of rows AND keyed access - B = Balanced - B-tree indexes balance each leaf page node between half-full and full - B-tree indexes balance the levels of leaf pages - as pages get full, b-tree indexes will "split" to create more room for indexes to grow out - index has pre-allocated overflow space - as the overflow space gets full, the "split" leaf nodes get spread out - therefore B-tree indexes occasionally need to be reorganized, drop the index and recreate it

Subquery

1. WHERE. a query within a query. The answer set to an inner query is used as a predicate in a where clause in the outer query 2. SELECT or FROM derived value or table. Answer set to an inner query is used as a table in select or from clause 3. Subquery must be enclosed in ( )

Key (formal definition)

1. a set of attributes (A1, A2, A3, etc) is a KEY for the relation if that set of attributes functionally determines all the other attributes of the relation 2. No subset of those attributes (A1, A2, A3, etc) functionally determines all the other attributes of the relation

Steps in DB recovery

1. identify the most recent full backup 2. identify the latest transaction log file 3. id the point-in-time of the failure 4. restore the most recent full backup 5. apply transaction logs up to the last commit before the failure occured

Characteristics of Relational DB

1. purpose of a db is to help people track things of interest to them 2. data is stored in tables, which have many rows and cols like a spreadsheet 3. something we store data about is an entity 4. a db stores BOTH data (in tables) AND relationships (between tables) 5. a db may have multiple tables, where each table stores data about a different thing 6. each row in a table stores data about one occurrence of the thing of interest 7. each col is essentially one attribute of the entity or table 8. a db stores metadata - data about itself stored within itself

ACID compliance

A = atomicity - transaction is all or nothing C = Consistency - transaction must maintain data consistency I = isolation - multiple users/transactions cannot step on each other D = durability - once a transaction is finished, the results cannot be lost

Composite/Concatenated Key

A primary key that consists of two or more columns. Need NOT be contiguous.

DBMS must contain

ACID compliance

Rank in terms of complexity and price

Complexity: oracle - hardest IBM DB2 Microsoft SQL server MySQL Microsoft Access Expense: Oracle - $47000 SQL server - 15000 MySQL - 5000 MS Access - FREE

DML

Data manipulation language for querying and modifying the data (insert, update, delete, two formats for inserts, bulk load with insert)

Steps of normalization

First normal form: - remove any multi-valued cells and/or any rows requiring a specific sequence - remove repeating group of attributes - put repeating attributes into a new relation + ex. create a key for a new relation (concatenated key) the key of the original relation + an identifier for each occurrence of the Repeating Group of attributes Second Normal form: - for entities with concatenated keys, make sure that all attributes are dependent on the full key - only meaningful when there is a concatenated key - create a new relation - create a key for the new relation - migrate only the dependent cols to the new relation Third Normal form: - make sure that no attributes are dependent on any other non-key attributes - the same steps as 2nd normal form fourth normal form: - eliminate multivalued dependencies - multiple independent attributes are dependent on the same determinant

E.F. Codd

Founder of relational database, worked for IBM, wrote famous article descr. how relational model could eliminate lock between apps and data

handling dates

HH - represents hour (00 to 23) MM - represents the minutes SS - represents the seconds NNN - represents the fractional seconds

As of fall 2016 there are 4 out top 5 software products used in industry are relational. Which are these?

Oracle, MySQL, SQL server, PostgreSQL

constraint

R <constraint> S says that every tuple in R is also in S R = 0 is a constraint that says R must be empty

where

Restricts the rows returned by placing limits on the col values

primary key

a candidate key selected as the primary means of identifying rows in a relation (only one primary key per relation)

data modeling

a conceptual picture of your database, drawn with a computer based tool

schema

a description of the relation in this format(column-name1, column-name2) used when using SELECT

attributes

a fact of characteristics describing an entity

column

a fact of characteristics describing an entity

table

a general term for where data is stored in databases

What is a DBMS software?

a layer between the data and the application programs

trigger

a piece of code, associated with a TABLE, associated with an EVENT, the CODE fires when the EVENT happens

tuple

a row (a set, not a list each tuple must have an attribute that uniquely identifies that tuple, and no duplicate tuples, atomic)

superkey

a set of attributes within a table whose values can be used to uniquely identify a tuple

relation

a specific kind of table that follow rules such as each cell must hold a single value

relation

a specific kind of table that follows rules such as each cell must hold a single value

page

a unit of I/O, same as "block"

R <union> S

all elements in R or S or both

benefits of clustered index

allow for binary search foreign keys may be indexed other non-key columns may be indexed ("secondary" index) data rows will not be in secondary key sequence MySQL secondary indexes include the primary key values so that access by secondary keys can leverage the clustered index

preventing deadlock

allow users to issue all lock requests up front

why do we care about all of this?

must consider all data when designing a database, must organize data into relations. need to understand FD's to determine keys for every relation. Need keys so that you can normalize all of the data. Must normalize the data to design the database properly.

deadlock

occurs when tow transactions are each waiting on a resource that the other transaction holds

rules of "ONE"

one entity = one table one occurrence of that entity = one row one attribute = one column all cols in the row describe ONE occurrence of that entity all cos are functionally dependent on one primary key

product

one table multiplied by another table (interim answer set)

order by

options are ASC or DESC, defaults to ASC

data modeling process

prep begin main process

referential integrity

prevents inserting a row into a child table where the parent key value is missing

constraints

primary key null/not null unique check foreign key

foreign key

primary key of one relation that is placed into another relation to form a link between the relations. can be a single column or a composite key best enforces referential integrity

join

produce a new relation that combines matching tuples on a common attribute

theta join

produce a new relation that combines matching tuples on a condition

selection

produce a new relation that is a subset of tuples from R

cartesian product

produce a new relation that is the combination of every tuple in one table combined with every tuple in the other in mysql the CROSS JOIN produced a result set which is the product of rows of two associated tables when no WHERE clause is used with CROSS JOIN. result set appeared by multiplying each row of the first table with all rows in the second table if no condition introduced with CROSS JOIN product = one table multiplied by another table (interim answer set)

count

provides a count of how many rows have a value in a column, counted across many rows can be used with any column or with a (*) to simplify count rows

avg

provides the average of the values in a column across many rows must only be used with numeric columns

max

provides the highest value in a column across many rows

min

provides the lowest value in a column across many rows

sum

provides the sum of the values in a column across many rows must only be used with numeric columns

isolation levels

read uncommited, read commited, repeatable read, serializable

anomolies

redundancy - storing a piece of data more than once update/delete errors

What does 'serializable' transactions mean?

refers to two transactions that run concurrently and generate results that are consistent with the results that would have occurred if they had run separately

TRUNCATE statement

removes all rows, keeps structure

DROP statement

removes all rows, removes structure

What does the DBMS software do?

removes necessity for application programs to maintain a direct record/field mapping to the way the data is physically stored

how do locks implement currency control?

resource locking prevents multiple applications from obtaining copies of the same record when the record is about to be changed

distinct

restrict the returned results to have unique values - sql cannot easily determine whether or not a row is a duplicate of another row - answer set may contain duplicate rows - removes duplicates from the answer set

Outer Left/Right Join

returns ALL rows from LEFT/RIGHT table only matching rows from RIGHT/LEFT table

recursive relationship

same entity to participate in the relationship

DESCRIBE statement

shows you want mysql knows about a table

session memory

smaller temp tables - client specific memory - table read buffers - sort operations

entity

some identifiable person, place, thing, event

clustered index

sort and store the data rows in the table or view based on their key values are the columns included in the index definition can only be one clustered index per table, because the data rows themselves can be stored in only one order if table has no clustered index, its data rows are stored in an unordered structure (heap)

thread cache

splits up the execution paths of MySQL into multiple threads, one per user conenction, so they run in parallel

supertype/subtype

super type contains all common attributes, while the subtypes contain specific attributes super type may have a discriminator attribute which indicates the subtype

answer set

table (create and execute queries)

we often use which terms interchangably

table, relation, file, column, attribute, field, row, tuple, record

USE

tells the query engine which database you want to use for your query

Catalog consists of?

the catalog consists of tables. catalog tables contain information about tables, cols, rows, indexes, users, etc.

R <intersection> S

the set of elements in R and S

R <difference> S

the set of elements in R but not in S

insert

used to add new data to a table

group by

used to collapse row with identical values for one or more columns together

Alias

used to give a table, or a column in a table a temporary name often used to make column names more readable alias only exists for duration of the query

association/associate entity

used to resolve a many-to-many relationship by creating an ASSOCIATION("child") entity between the two "parent" entities a dependent child between two independent parents relationship to parents is mandatory


Conjuntos de estudio relacionados

Bach's Brandenburg Concerto No. 5 1st Movement vs. Maru Bihag (Hindustani Raga)

View Set

GEOG221 Lab#2 Earth - Sun Geometry

View Set

Chapter 07: Legal Dimensions of Nursing Practice

View Set

Psychosomatic Disorders & Malingering

View Set

Explorando la ciudad - Los lugares de la ciudad - Lista A

View Set

Anatomy - Brain and Cranial Nerves Guided Reading

View Set