Database Tech Test review
A phantom read occurs when 2 identical queries are executed and the collection of rows returned by the queries are identical
FALSE
Any relation scheme R which is constrained by functional dependencies has a lossless and dependency-preserving decomposition into BCNF
FALSE
In general one can not represent graphs as relations
FALSE
Isolation level READ COMMITTED is higher level isolation than SERIALIZABLE
FALSE
R ⊲⊳ R = R under bag semantics.
FALSE
The ACID rules are a set of properties whose primary purpose is to guarantee that database indexing is processed reliability
FALSE
Under functional dependencies F over R[Z] where Z is the set of attributes of R, if {X}+ = Z then X is a candidate key of R
FALSE
XML document must have an associated DTD to evaluate XPATH queries over it
FALSE
You may equivalently write an ER diagram expressing a trinary (3-ary) relationship between 3 entities as 3 binary relationships between 3 entities
FALSE
functional dependency is a statement that when 2 tuples agree on some particular set of attributes they must also agree on all the other attributes in the relation
FALSE
query preprocessor executes before the query parser in a typical relational database system architecture
FALSE
semi-structured data expressed in XML holds strictly to 1NF but not necessarily 3NF or BCNF
FALSE
Query optimization occurs primarily to insure that buffer pages are associated with individual transaction
FALSE - query optimization determines the most efficient way to execute a query by considering a possible query plan
The decomposition of R[ABCDE] into R1[ABC] R2[DE] R3[AE] under fd: AB->CD A->E is lossless
FALSE -decomposition of R into R1 and R2 is lossless-join decomposition if at least on eof the following function depends on F+ R1 ∩ R2 -> R1 or R1 ∩ R2 -> R2
XPATH uses FLWR expressions
FALSE , xquery
We can represent all possible functional dependencies through multi way relationship in ER diagrams via arrow notation
False
R1 ∩ R2
R1 - (R1 - R2) R1 ⨝R2
List the total times spent on all conversations grouped by country combination
SELECT FC.phone, TC.phone, SUM(ABS(start-end)) as 'total time' FROM Call JOIN Country AS FC ON FC.phone = Call.fromPhone JOIN Country AS TC on TC.phone = Call.toPhone GROUP BY FC.phone, TC.phone;
For any relation schema there is a dependency preserving decomposition into 3NF
TRUE
For the universal relation R(ABCDEF) under FDs: AB -> CE C->EB E->D C->D the following decomposition is dependency preserving R1(BF) R2(ACB) R3(CDE)
TRUE
If a schema is in BCNF with respect to a set of functional dependencies F then it is also in 3NF with respect to F
TRUE
XML is an example of a semi-structured data model
TRUE
an attribute declared UNIQUE may unless specified otherwise have NULL values
TRUE
if X->Y and X U Z -> Y-Z
TRUE
if A->B and BC->D then AC->D
TURE
The log manager often request that buffers be written to storage
True
In SQL with recursion - give all the phone numbers that were called from +4687909103 or called from phone number called by +4687909103, or called rom a phone called from ... on September 13 2009
WITH RECURSION phoneLog(phoneNo) AS ( SELECT fromPhone FROM call WHERE fromPhone = '+4687909103' AND start = '2009-09-13' UNION SELECT toPhone FROM phoneLog JOIN CALL ON phoneLog.phoneNo = Call.fromPhone WHERE start = '2009-09-13' ) SELECT phoneNo From phoneLog;
lossless decomposition
a decomposition {R1, R2, ..., Rn} of a relation R is called lossless decomposition for R if the natural join of R1, R2, ..., Rn produces exactly the relation R
Theta join R1 ⨝θ R2 = σθ (R1 × R2)
a join that involves a predicate - here θ can be any condition
superkey
a set of attributes that functionally determine all the attributes, with no requirement of minimality - a combination of attributes that can be uniquely used to identify a database record - there may be the combination of redundant attribute
eq-join R1 ⨝A=B R2 = σA=B (R1 × R2)
a theta join where θ is an equality - this is by far the most used variant of join in practice - basically just a join with an equality
transaction state
active -> partially committed active -> failed partially committed -> committed partially committed -> failed committed -> terminate failed-> aborted (rollback) aborted ->terminate - cannot rollback transaction - compensating transaction ( new transaction)
{t.boss|payroll(t)∧∃s(payroll(s)∧s.boss = t.boss∧ s.name 6= t.name ∧ ¬∃r(payroll(r) ∧ r.boss = t.boss ∧ r.name 6= s.name ∧ r.name 6= t.name))}
all bosses with exactly 2 employees
2NF
all the non primary attributes should be fully functionally dependent on each candidate key
bossγAV G(salary)(payroll)
averaged salary grouped by boss
πa.name(σa.salary<b.salary∧b.boss=a.name(ρa(payroll)×
bosses with an employee who earns more than their own salary
{t.name|payroll(t) ∧ ¬∃s(location(s) ∧¬∃r(payroll(r)∧ r.boss = t.name ∧ r.dept = s.dept))}
bosses with employees on all floors
πboss(σf loor=1(payroll ✶ location) ✶ πboss(σf loor6=1(payroll ✶ location)))
bosses with employees on the first floor and at least on other floor
candidate keys (minimal superkeys)
can be any column or combination of columns that can qualify as unique key in database - can qualify as a primary key - attribute that is not present in the right hand side, will be present in the candidate key - super key that is not reducible further ( no redundancy)
serializable
changes the isolation level of the current transaction]] - doesn't allow phantom read because it locked the table - if plan to delete 4 it wont insert 4 - at end won't even delete 4
Renaming ρB1,...,Bn (R)
changes the schema, not the instance - example: ρN, S(Employee) -> Answer(N, S
preprocessor
component of the parser that perform semantic validation - runs after the parser has built the parser tree
transaction
consists of either a collection of DML statements or a DDL or DCL or TCL statement to form a logical unit of work in a database session. - either you execute program completely or not at all (no partial execution) - set of instruction which performs logical work ( works in atomic nature, either execute or not execute) - interacts with database
DDL
data definition language data description language -syntax similar to a computer programming language for defining data structures, especially database schemas
snapshot
does not hold the lock on table during the transaction so the table can be modified in other sessions - no phantom
δ
duplicate elimination - extensions for bags
Cartesian Product R1 × R2
each tuple in R1 with each tuple in R2 - vary rare in practice; mainly used to express joins
Projection ΠA1,...,An(R)
eliminates columns - example: project social-security number and names Π_{SSN, Name} (Employee) Answer (SSN, Name) semantics differs over set over bags
{t.boss|payroll(t) ∧ ∃s(payroll(s) ∧ s.boss = t.name)}
every boss over at least one other boss
create view d(e,f) as select h.name, sum(h.salary) from payroll h natural join location g group by h.name;
everyones name and their salaries
a bag is the set that allows subsets as elements
false
chase-test
fixed-point algorithm testing and enforcing implication of data dependencies in database system - can be used to test that an arbitrary decomposition of universal relation is lossless
γ
grouping and aggregation -extensions for bags
READ committed
guarantees that any data read was committed at the moment is read - restricts the reader from seeing any intermediate, uncommitted, 'dirty' read -default transaction isolation level
dependency preserving
if closure of functional dependencies after decomposition is the same as the closure of Fd before decomposition - check if we can derive all the original fd from fd present after decomposition
composite index
index on 2 or more columns -where fistName = '' and lastName='' -always use the index by itself
Concurrency control
is a database management systems (DBMS) concept that is used to address conflicts with the simultaneous accessing or altering of data that can occur with multi-user system
⨝
join
create view a(b,c) as select distinct x.name, sum(x.salary)/count(*) from payroll x, payroll y where x.name=y.boss group by x.name;
manager name and their salaries
R◦⊲⊳S (i.e. outer join) and further assume that R and S share only attribute A and A is a primary key of S
max = m + n min = n
if R = m and S = n what is the max and what is the min number of tuples in σC (R) × S
max = m xn min = 0
R ⊲⊳ S and further assume that R and S share only attribute A and A is a primary key of S
max = max (m,n) min = 0
Natural Join R1 ⨝R2
meaning: R1⨝R2 = Π_{A}(σ(R1 × R2)) where: -the selection σ checks equality of all common attributes - the projection eliminates the duplicate common attributes
null approach
one tupler per entity but these tuples are long, they have components for all attributes whether or not they are appropriate for a given entity
select distinct( floor(sum(t.c)/sum(s.f)*100)) p from a t right outer join d s on t.b = s.e;
percentage that manager's salaries are of the total take
In relational algebra which phone number have the people named michael minock called
pi_{call.toPhone}(sigma(name="Michael Minock") (HasName) join Uses) join Call.fromPhone = Uses.phone (Call))
Π
projection
unrepeatable read problem
read 10 read 15 - every transaction think it has isolation (only one executing) - how did the value change to 15? - who changed the value -result of inconsistency
Dirty Read
read of uncommitted data - data that is currently being modified by a transaction - read uncommitted and nolock should be avoided; does the road there database is full of data anomalies that are extremely hard to fix
ρ
rename
Selection σc(R)
returns all tuples which satisfy a condition -examples σSalary > 40000(Employee) σname = "Smith"(Employee) * the condition c can be =, <, <= , >, >=, <>
repeatable read
select query data of table that is unused under transaction of isolation level - cannot be modified from any other sessions till transaction is completed - what i read is exactly is exactly the same as what i read before the transaction) - doesn't allow you to delete it until the transaction finished (inserted and that because what needs to happen before)
σ
selection
closure of attribute (A+)
set of attributes that can be determined from A
straight-E/R method
several tuples for each entity but only the key attributes are repeated - this method could use more or less space than the nulls method
τ
sorting -extensions for bags
nonclustered index
the data is not where the index is, points to the data -firstName = 'John' - dont want index on something not used a lot
1NF
the domain of each attribute contains only atomic (indivisible) values
clustering index
the index contains keys built from one or more columns in the table or view - a special type of index that reorders the way record in the table are physically stored -reorganizes the actual data - can only have one of those -faster and better, just one
3NF
there should not be any transitive relationship or dependencies among non primary attributes
read uncommitted
transaction level does not wait for committed values on table
SELECT * FROM R UNION SELECT * FROM R always yields exactly the same results as SELECT DISTINCT * FROM R;
true
XML as a data representation language does not meet first normal form
true
Xquery uses the same data model as XPATH
true
ISA
we connect an entity set to its subclasses using this relationship - every isa relationship is one to one - has components from the parents - relates a subclass to its superclass
phantom read problem
x = 5 delete x in other transaction read x - x does not exist
cascade delete
"When the referenced row is deleted from the other table (master table), then delete also from the child table".
3NF synthesis algorithm
- Let F be the set of all FDs of R - We will compute a lossless-join, dependency-preserving decomposition of R into S, where every relation in S is in #NF 1. Find a minimal bassis for F, say G 2. For every FD X -> A in G, use X∪A as the schema for one of the relations in S 3. If the attributes in none of the relation in S form a superkey for R, add another relation to S whose schema is a key for R - find minimal basis * check if 1 doesn't imply other closure of each attribute on left) * check if you can break the left down *get rid of redudant attrib/ dep - take attribute of each FD as a relation * remove ones that are subset of antoher * must consider the key (at least one)
object oriented approach
- entities are allowed to have representative in a tree of entity sets - objects however are assumed to exist in exactly one class or subclass - object would create new entity set - ER model would just put it into both * only one tuple per entity and that tuple has components for only those attributes that make sense for the entity (minimum possible space usage)
Conflict Serializable Schedule
- non serializable --> chance it is inconsistent; more efficient - serial --> consistent conflicting instruction - must belong to different transaction - must operate on same data value - at least one of them is a write instruction - serializable - after swapping non conflicting instruction from non-serial and it becomes serial
<company name="Acme"> <dept name="Admin" floor="3"> <emp name="Johan Sund" salary = "34000" boss = "Larry McDonald" /> ... </dept> ... </company> Give an XPath expression that returns the names of all the departments on the fifth floor with at least one employee with salary over 40000
//dept[@floor ='5']/emp[@salary>40000]/../@name
BCNF decomposition algorithm
1. Check whether R is in BCNF. If so, nothing more needs to be done. Return {R} as the answer 2. If there are BCNF violations, let on be X->Y. Use algorithm for Closure of set of Attributes to comput X+. Chose R1 = X+ as one relation schema and let R2 have attributes X and those attribute of R that are not in X+/ 3. Use algorithm for project a set of Functional Dependencies to compute the sets of FD's for R1 and R2; let these be S1 and S2 4. Recursively decompose R1 and R2 using this algorithm. Return the union of these results of these decompositions
Computing a Minimal Basis
1. If there is an FD D in F that follows from the other FDs in F, remove D from F 2. Let Y -> B be an FD in F with at least 2 attributes in Y and let Z be Y with on e of its attributes removed. If Z-> B follows from the FDs in F, replace Y-> B by Z-> B 3. Repeat the first 2 step until no more changes can be made to F
parser
1. checks the statement for syntactic and semantic validity 2. determines whether the process issuing the statement has privileges to run it 3. allocates a private SQL area for the statement
Relational Algebra has 2 sementics
1. set semantics - no duplicates 2. bag semantics - duplicates allowed - number of duplicates is significant - SQL uses bag semantics by default
advantages of concurrency
1. waiting time (less) 2. response time (less) 3. resource utilization (high) 4. efficiency (high)
BCNF
A -> BCD BC -> AD D -> B - R is in 3NF - for each function dependencies X-> Y (X should be a super key) - as the determinant function (left side) is a key then it is BCNF - does not always preserve given dependencies (3nf does)
ACID
Atomic - basic fundamental unit of the transaction (transaction always execute in atomic nature - must either all execute or none at all); (transaction management component) Consistency - if the database was consistent before execution of transaction it should remain consistent after the execution (responsible of the programmer if other 3 work, consistency) Isolation - logical isolation, if a transaction is executed with another transaction if the result is the same as if it was executed in isolation --> isolation (concurrency control component takes care of isolation) Durability - whatever changes you make that change must persist in the database with respect to hardware and software(recovery management component) - set of properties of a database transaction intended to guarantee validity even in the even of errors, power failurs, etx
if you would like a constraint to be able to be violated within a transaction you should declare it
DEFERRABLE INITIALLY IMMEDIATE
completely remove table a from the database
DROPTABLE TABLE A CASCADE