COSC344

Ace your homework & exams now with Quizwiz!

Consider the following relation: Employee(SSN, name, departmentID, street, suburb, city, postcode), where SSN is the primary key. The above relation has the following non-trivial functional dependencies: (suburb, city) -> postcode postcode -> city. (a) Is this relation in 3NF? If not, how would you normalise it to 3NF? (b) Is this relation in BCNF? If not, how would you normalise it to BCNF?

(a) No its not in 3NF. This is because city, postcode, and suburb are all dependent on SSN. To normalise remove Postcode from the relation and make a new relation consisting of Suburb, City, and Postcode where the primary key consists of suburb and city. This eliminates the transitive dependency that Postcode has on SSN. (b) The resulting relation is not BCNF, because City which is a prime attribute that is dependent on Postcode which is a non-prime attribute.

For the COMPANY database used in the lectures, suppose the tables have been created based on the following relational schema. (diagram 3) Consider the following query: Retrieve all projects controlled by departments located in "Houston". Show only Pname, Pnumber, and Dname in the query results. (a) Give the relational algebra statement.

(check figure 4) [make figure 4 (:]

Give an example of a single-valued attribute

(this is an attribute that is fully explained by a singular value.) Region = "Otago"

List three benefits to use View in database.

- Allows us to hide parts of the database from certain users and therefore provides access control/security. - Allows customization in the way users access data depending on their needs. - Allows renaming of attributes or to change the order of attributes. - Simplifying complex operations on base tables.

Describe the recovery process when the deferred update approach is used.

- Find the checkpoint. - Scan forward - each transaction is added to 'active list'. And entries are bundled with their operation. -the transaction is then committed and the bundle is moved to the 'competed' list. -scan complete -process each transaction in the completed list.

List three types of NoSQL databases

1) key-value 2) wide-column 3) document 4) graph

part B of the disk block

1778 blocks

How many disk accesses are required to find a record based on the primary key using the B+ -tree? (C part for Block storage)

4

How many disk blocks required? How much wasted space in that block storage question?

8000 blocks and 192,000 wasted space.

What is the process of performing a checkpoint in a database system?

A checkpoint is a point at which a record is written into the log when the system writes all modified buffers to the database. The process is as follows: - suspend all active transactions -force write all modified buffers to disk -force write a checkpoint entry to log -resume all active transactions.

Describe the difference between a commit point in a transaction, and a checkpoint in system log.

A commit point occurs when all operations were executed successfully and are recorded in a log file, from this point the transaction is then committed and recorded in the database. A checkpoint is a point at which a record is written into the log when the system writes all modified buffers to the database on disk. all further changes become relevant to this checkpoint.

What is dense index? Which types of indexes are dense?

A dense index is an index that stores a pointer for every record in a file. Types of indexes are: - Primary index that stores an index for every record in a file. - a secondary index that indexes a key field that the file is not ordered by. - a secondary index on a non-key that stores a pointer to every record in the file, even if they share the same index value.

What is a dirty read problem in concurrency control? give an example.

A dirty read problem occurs when one transaction updates a database item and then the transaction fails for whatever reason. The updated item is accessed by another transaction before it is changed back to its original value. T1 read(x) x = x-1 write(x) T2 read(x) x = x+1 write(x)

primary key

A field (or group of fields) that uniquely identifies a given entity in a table

Define a foreign key in the relational model (fig 7)

A foreign key is used to link two related tables together via a JOIN statement. In the figure given the foreign key is dept_id. This is because it is derived from the departments table where it is the primary key.

Scenarios where a foreign key is added when converting from ERD to relational schema

A foreign key needs to be added when mapping: Multi-value attributes To identify which tuple a given element in a collection "belongs" to Weak Entities To uniquely identify the tuples of a weak entity Foreign Key + Partial Key = Primary Key of Weak Entity. Relationships (1:N, M:N, etc.) So we can identify tuples on either side of the relationship

Distinguish between a full functional dependency and a partial functional dependency. give examples:

A full functional dependency is where a set of attributes (e.g. ABC) fully determines another attributes value (e.g. D). For example Total_price is determined by quantity*price. A partially functional attribute is where we can remove an attribute and the set still get the same result. E.g. ABC -> D and AB -> D.

Explain in which situation a mutating table error can occur during the execution of a trigger.

A mutating table error occurs when a trigger or procedure attempts to execute on a table which is currently being modified by another statement. This can occur if you write a trigger to read from a table.

Definition of "Entity type" give an example.

A named collection of attributes which describes a type of entity. e.g. an entity type "location" may have the attributes "street number", "street name", "town".

foreign key

A primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables

Distinguish between a trigger and a PL/SQL procedure.

A procedure can be called when required by the user, whereas triggers are called automatically by specific events determined when the trigger is written. Triggers cannot be called inside procedures, but procedures can be called inside triggers.

What is an attribute?

A property of an entity.

Relationship-type:

A relationship type to a relationship is what an entity type is to an entity. Each entity-type involved in the relationship participates in the relationship. The number of entity-types participating in the relationship determines the degree of the relationship. 2 entity-types means the relationship has a degree of 2 (A binary relationship). Through entity-types means a degree of 3 (a ternary relationship) etc.

What is an Entity. Give an example of each type of entity.

A representation of any "thing" in the real world with independent existence. This could be physical existence or conceptual existence. physical: location conceptual: job title.

distinguish between a superkey and a key in the relational data model.

A superkey is a set of attributes that can identify some tuple. A key is an attribute that can identify 1 tuple.

Distinguish between a trigger and a PL/SQL procedure.

A trigger is a stored subprogram associated with a table and is automatically invoked by the DBMS in response to specified changes to the database. PL/SQL is a block-structured language. Each logical block generally corresponds to a problem or a subproblem. PL/SQL Subprograms are NOT automatically invoked and are NOT associated with a table.

Domain:

A value set or range associated with each entity type, which specifies the set of values that may be assigned to the attributes of that entity.

Describe the advantages and disadvantages of using Sorted Files in a database system.

Advantages: - Reading in the order of ordering field is very efficient - Find the next record often needs no disk access since it's usually on the same block - Search on the ordering field results in fast access (binary search) - Search involving >, >=, <, <= on the ordering field is efficient Disadvantages: - Insertion and deletion are expensive - Modifying the ordering field may change its position, requires a delete followed by an insert. - No help for searches on non-ordering field

Candidate key

An attribute or set of attributes that can be used to uniquely identify any tuple in a table. It can be single attribute, or it can consist of multiple attributes but must be minimal.

Key attribute:

An attribute which must be a unique value. It can either be single or multi valued however it should always be minimal. e.g. StudentID.

Describe the differences between primary index, clustered index, and secondary indexes. Which are dense and which are non-dense?

An index is a number of files stored on a disk to provide alternative ways to access records, without having to alter the structure. A primary index are ones defined on ordered data files, whose ordering field is a key. (non-dense) Clustered indexes are defined on ordered data files where the ordering field isn't a key. (dense) Secondary indexes are 'secondary' to primary indexes. They are defined on unordered data files.(can be dense or non-dense)

List three characteristics of big data

Any of the V's: - Volume - Variety - Veracity - Velocity

List the four properties of transactions to deal with failures and concurrency problems

Atomic - transactions are all or nothing. everything occurs or nothing occurs Consistent- the database always moves between well defined states. Isolation- transactions are isolated, i.e transactions don't interfere with each other Durable- once the client has been notified of change, all change must be persistent.

Give an example of an atomic attribute:

Atomic: e.g. "gender = "male"

Describe the Bell-Lapadula Model used in the Mandatory approach for database security.

Bell-Lapadula Model enforces access control in government and military aspects. Each data object is given a classification level and each user is given a clearance level. Levels (in order): top secret > secret > confidential > none A subject S is not allowed to READ an object O UNLESS clearance(S) >= classification(O) A subject S is not allowed to WRITE an object O UNLESS clearance(S) <= classification(O)

Define Join ⋈ and given an example.

Combines attributes (columns) from one or more relations by common values. e.g. (PROJECT ⋈ Dnum = Dnumber DEPARTMENT ⋈ Dnumber = Dnumber DEPT_LOCATIONS) ? would join the tables by Dnum and display all the data.

define (Set) Union U

Combines the results of two or more queries and returns a single result containing all the tuples satisfied by the queries.

Distinguish between a composite attribute and multi-valued attribute. Give examples.

Composite attributes are made up of other values, e.g. 'address' is made up of 'number', 'street name', 'suburb', 'city' A multi-valued attribute is one with several values, e.g. spoken_languages could be 'english', 'french', 'spanish'

Give an example of a composite attribute.

Composite: e.g. full name is composed of first_name, middle_name, surname

The three-schema architecture enables data independence. Explain what data independence is. Distinguish between logical data independence and physical data independence. You can use examples to help explain them

Defined as the ability to change the schema at one level of a database system without having to change the schema at the next higher level Logical data: independence - The capacity to change the conceptual schema without having to change external schemas - e.g. add/remove record type, change constraints Physical data: independence - The capacity to change the internal schema without having to change the conceptual schemas - e.g. creating additional access structures

What is the discretionary and mandatory approach?

Discretionary is where each user is granted certain rights with respect to some entity in the database. E.g. User1 can read the customer table but not perform write statements. Mandatory is where each data object is tagged with a classification level. And depending on that level has certain permissions in the system.

Give four situations where trigger can be used.

Enforcement of complex business rules. Enforcement of some types of referential integrity. E Auditing purpose (creating audit log). Automatic generation the values for derived attributes Creation of replica tables and backup files

(Set) Intersection ∩

For two sets A & B the Intersection is the set that contains all the elements common to A & B.

(Set) Difference -

For two sets A & B written A-B, the Difference is the set of elements unique to A which are not elements of B.

List the scenarios in which a foreign key needs to be added into a relation when mapping from ERD to relational schema.

Foreign keys need to be added when mapping the following: - multi-value attributes - weak entities - relationships such as 1:N or M:N

Relationship attributes

Just as entity-types can have attributes. So can relationship types.

What is map reduce? explain how it works

MapReduce consists of two main parts (map and reduce). The Map part splits a task into smaller items, and the Reduce part combines or aggregates the results back together. An example is counting words in a book. The task is much easier if you split the task into smaller sub tasks. So you get say 10 computers, each doing a 10th of the book. Once each sub task is complete, you aggregate the totals together to get the main total of each word.

what is map reduce? explain how it works.

MapReduce consists of two main parts (map and reduce). The Map part splits a task into smaller items, and the Reduce part combines or aggregates the results back together. An example is counting words in a book. The task is much easier if you split the task into smaller sub tasks. So you get say 10 computers, each doing a 10th of the book. Once each sub task is complete, you aggregate the totals together to get the main total of each word.

What is NoSQL? give four features.

NoSQL is a category of non-relational database management systems that are only suited for huge volumes of data as they scale better than relational systems. Features include: -horizontal scaling -no strict schemes -ACID not guaranteed -allows fast queries on large datasets

If a view is updated, the underlying tables may not be updated properly. Give two types of views that are generally not updatable.

One type of view which may not be updated properly is one that is defined on multiple tables using joins. Another is a view which is defined by using grouping and aggregate functions.

what is a PL/SQL procedure?

PL/SQL is a block-structured language. Each logical block generally corresponds to a problem or a subproblem. PL/SQL Subprograms are NOT automatically invoked and are NOT associated with a table.

Participation constraints:

Participation constraints specify if an entity necessarily participates in a relationship. There are two constraints : total and partial.

Distinguish the two approaches for view implementation: query modification and view materialisation

Query modification is when we transform view into a temporary query, and then modify/transform the view query into a query on the underlying base tables. View materialisation is when we transform view into temporary tables. We physically create a temporary table when the view is first queried and then incrementally update to keep the tables updated with the underlying base tables.

Explain the referential integrity constraint in the relational data model.

Referential Integrity constraint: Specified between two relations. Maintains consistency among tuples in two relations. Informally, it states that a tuple in one relation that refers to another relation must refer to an existing tuple in that relation.

define Rename ρ

Results from relational algebra are themselves unnamed relations. The rename operation allows us to rename the output relation.

Give an example of a correlated sub-query:

SELECT * FROM employee outer WHERE 5 < ( SELECT SUM(hours) FROM works_on inner WHERE outer.ssn = inner.ssn)

For the COMPANY database used in the lectures, suppose the tables have been created based on the following relational schema. (diagram 3) Consider the following query: Retrieve all projects controlled by departments located in "Houston". Show only Pname, Pnumber, and Dname in the query results. Give the SQL statement based on the joined tables.

SELECT Pname, Pnumber, Dname FROM Project, Dept_locations, Department WHERE Dlocation = "Houston" AND Project.Dnum = Dept_Locations.Dnumber AND Dept_Locations.Dnumber = Department.Dnumber

For the COMPANY database used in the lectures, suppose the tables have been created based on the following relational schema. (diagram 3) Consider the following query: Retrieve all projects controlled by departments located in "Houston". Show only Pname, Pnumber, and Dname in the query results. Give the SQL statement based on correlated subquery, and explain how it is executed.

SELECT Pname,Pnumber,Dname FROM Project, Department WHERE Department.Dnum IN (SELECT Dnumber FROM Dept_Locations WHERE Dlocation="Houston") OR Project.Dnum IN (SELECT Dnumber FROM Dept_Locations WHERE Dlocation="Houston")

what are the heuristic rules used in query optimisation?

SELECT and PROJECT as early as possible to reduce tuples and attributes. JOIN as late as possible to reduce large intermediate data tables.

List two heuristic rules used in query optimisation and explain why they are efficient in execution.

SELECT early as it reduces the number of tuples in a query PROJECT early as it reduces the number of attributes involved in a query process.

Define what: Select (σ) does. Give an example

Selects tuples that satisfy the given predicate from a relation e.g. σ pokedex_num = 001(POKEMON) would return the tuple containing information on Bulbasaur

Define what: Project Π does: Give an example.

Shows project attributes that satisfy a given predicate. e.g. Π pokedex_num, name(POKEMON) Would return a table consisting of two columns: pokedex num, and name. The order of the columns will be the same as it is in the original POKEMON table

Describe the steal/no-force mechanism used for data recovery.

Steal allows writing an updated buffer to disk before the transaction commits. No-force means when a transaction commits, the changes made to actual objects are not required to be written to disk. The Steal/No-Force approach accepts the risk of writing possibly uncommitted data to memory to gain the speed of not forcing all commit effects to memory.

What is the 'lost update problem'? give an example and explain how it occurs

T1: - R(x) x+=10 T2: -R(x) x+=15 w(x) commit w2(x) commit T1 would be overwritten. The problem is when we have two seperate processes running at the same time on the same value which can then overwrite the final value or provide a wrong end value.

whats the difference between a candidate key and a key/minimal superkey?

Technically nothing, both are used in a minimal set of attributes that can identify a tuple. However a candidate key can be more than 1 identifiers.

in Oracle DBMS, what are two main issues that transactions need to deal with? What are ACID properties for transactions?

The ACID properties can be defined as: Atomicity: transactions fully occur or they dont. Consistency: transaction which causes the database to move between two consistent states Isolation: Transactions are not visible/affect each other. Each is isolated. Durability: Once a transaction has been committed successfully the changes must be persistent. The two main issues to deal with are failures of software/hardware (A&D) or concurrency which is simultaneous execution of multiple transactions (C&I).

Describe the locking algorithm for concurrency control and explain how the locking mechanism deals with the lost update problem.

The locking algorithm works by using read_lock and write_lock when accessing attribute values which eliminates the chance of a value being being overwritten by a concurrent transaction, as the second process wouldn't acquire permission while the value is being altered already.

Consider diagram (1) What problems does this ER design have? Explain how to revise the ER diagram and turn it into a good design. Give your revised design below (diagram 2 [make a model and update this])

The problems with this ER design is that on works_for the cardinality is the wrong way round. Every department should have N lecturers but every lecturer should have 1 department. The key-attributes should also have an underline, along with paper should be its own entity not a multi-value attribute.

The three-schema architecture for database systems separates the user applications from the physical database by defining schemas at three different levels. Give the schema defined at each level and explain what it represents. You can draw a diagram in the box provided below to help explain them.

Three-Schema Architecture: to separate the user applications from the physical database. External level: Describes the part of the database that a particular user group is interested in. Hides the rest of the database from that user group Conceptual level: Hides the details of the physical storage structures. Concentrates on describing entities, data types, relationships, user operations, and constraints Internal level: Uses a physical data model. Describes the physical storage structure of the database

Give an example of a total and partial participation constraint:

Trainer has total participation in "trained_by" because without at least one pokemon, they are not pokemon trainers. Pokemon has only total participation in "trained_by" because pokemon can exist in the wild without trainers.

Explain what transactional commit and Transactional Rollback are.

Transactional commit is a logical unit fo work successfully completed. From here the database is in a consistent state and all updates can be made permanent. Transactional rollback is when something has gone wrong and the database is in an inconsistent state and thus all updates must be undone or rolled back.

Why are transactions used in a database system? What are the four properties of transactions?

Transactions are used to guarantee data is stored properly. This also provides a mechanism to stop problems like deadlock, etc. The 4 properties of transactions are ACID: -Atomic -Consistent -Isolation -Durable

List three situations where triggers can be used

Triggers can be used to maintain referential integrity, to allow complex business logic to be executed and to allow auditing (see who modified a table when and how)

Explain the difference among USER, ALL and DBA used as the prefix of System Catalog in Oracle.

USER: user's view (what is in the user's schema) ALL: expanded user's view (what the user can access) DBA: database administrator's view (what is in all users' schemas)

List two reasons why view is needed. What is view materialisation?

View materialisation is a sort of optimisation. It is responsible for generating 'real tables' based on a view and making sure its kept consistent One reason this is needed is to provide an easy way to perform complex queries. Another reason is to allow for decoupling result sets from the query itself.

When would we use a foreign key?

We use a foreign key when we need to map two tables together which both share the primary key of one table.

How would you restrict a user access using a view?

We would do this by creating a view with classification levels. So sensitive data cannot be viewed by users without the right classification, to them they wouldn't know the data even exists.

Explain how the wound-wait technique can be used to prevent deadlock.

Wound-Wait scheme. It is a preemptive technique for deadlock prevention. It is a counterpart to the wait-die scheme. When Transaction Tn requests a data item currently held by Tk, Tn is allowed to wait only if it has a timestamp larger than that of Tk, otherwise Tk is killed (i.e. Tk is wounded by Tn).

suppose a DEPARTMENT relation has a derived attribute named "total_salary", and a trigger needs to be implemented to maintain the value of this derived attribute. Give two different designs for such a trigger.

You can either implement this with a row-level or statement-level trigger. A row-level trigger is for data-related activities and will execute once for each row in a transaction and is identified by a FOR EACH ROW clause. A statement-level trigger executes once for each transaction. A statement trigger therefore E.g. if a single transaction inserted 500 rows into the Employee table, then a statement-level trigger on that table would only be executed once.

Given a relation R { A, B, C, D, E} and a set of functional dependencies F = { AB → C, C → E, AE → B}, what is the closure of {A, B} under F? Is {A, B} a candidate key? Explain your answer.

a candidate key is an attribute or set of attributes that can be used to uniquely identify any tuple in a table. {A,B} is not a candidate key because it doesn't include D in the relation. = {A, B} (from itself) = {A,B,C} (From AB -> C) = {A,B,C,E} (From C -> E) = {A,B,C,E}

superkey:

a combination/set of columns that can uniquely identify any tuple in a table. The superkey does not need to be minimal, i.e. it can contain more columns than necessary.

Key:

a minimal subset of attributes that uniquely identifies a tuple. If you remove one more attribute, then the key can no longer uniquely identify the relation.

what is a query tree? what is the execution order of a query tree?

a query tree is a data structure representing a relational algebra expression. The tables of the query are represented as leaf nodes, and algebra operations as internal nodes. The order of execution starts from the leaf and makes its way up to the root node.

In a relational model describe the difference between a relation, tuple, and attribute:

a relation is a table where the attributes of the relation are the column headers. Tuples are the rows in the relation. An attribute is a column header in a table.

What is a stored attribute and derived attribute

a stored attribute is an attribute which has been stored in the database, whereas a derived attribute comes from other values in the database. e.g. totalPrice would be derived from every item's price combined.

What is a trigger?

a stored subprogram associated with a table that is automatically invoked by the DBMS in response to specified changes in the database.

What is the system catalog in Oracle?

a system catalog in oracle is a data dictionary which allows access through its views.

Explain the difference between a weak entity type and regular entity type.

a weak entity is one one that can only be identified by combining its partial key with the key of its owning entity. A regular entity is fully identifiable by its candidate key.

What is a weak entity type? Use an example to explain.

a weak entity type is one that cannot be identified independently from its owning entity. E.g. a weak entity could be a DEPENDENT of an Employee.

define a prime and non -prime attribute

any attribute which is a member of a candidate key is called a prime key, al other attributes are non-prime.

What is write-ahead logging?

changes to data files must be written or overwritten only after those changes have been logged and flushed to disk.

Give an example of a multi-valued attribute

classes = "COSC344", "COSC349"

Cardinality:

every relationship has a cardinality ratio which specifies the number of relationship instances an entity can participate in. e.g. a pokemon gym can only give out one type of badge. And a badge can only be associated with one gym. So we have a 1:1 relationship.

define hash files

hash files are of hashed records. A hashing function is used when records are inserted meaning a collision resolution strategy will be needed. It offers fast insertion and deletion and is efficient at storing files.

define a heap file

heap files are unordered records which offer fast insertion but slow searching and expensive sorting.

describe the difference between heap files, sorted files, and hash files in a database system.

heap files are unordered records which offer fast insertion but slow searching and expensive sorting. sorted files are records which have some kind of ordering field that allows them to be ordered relative to each other. This offers fast searching on the relative field but insertion/deletion is complex.

explain the referential integrity constraints in the relational data model.

referential integrity constraints ensure that there is a consistency maintained between two relations. it ensures the relationship between tables in a database remain accurate by applying constraints to prevent incorrect data.

Distinguish between statement level triggers and row-level triggers

row-level triggers are executed once per each row affected by the trigger statement, whereas statement level triggers execute only once, even if multiple rows are affected by the trigger statement.

define a sorted file

sorted files are records which have some kind of ordering field that allows them to be ordered relative to each other. This offers fast searching on the relative field but insertion/deletion is complex. hash files are of hashed records. A hashing function is used when records are inserted meaning a collision resolution strategy will be needed. It offers fast insertion and deletion and is efficient at storing files.

why is steal/no-force most commonly used?

steal is when cache buffers updated by a transaction may be flushed to disk before the transaction commits. no-force is when a transaction commits, the buffers are not required to be written to disk. both of these are primarily used as crash recovery.

weak entity-types:

weak entity-types do not have their own key attributes. They are identified by their relationship with specific entities from other entity-types.

Relationship:

when an attribute of an entity-type refers to a different entity-type. e.g. Customer and Sale.


Related study sets

ENGL1102: How to Read Poetry Quiz

View Set

Ch. 29 Care of Patients With Noninfectious Upper Respiratory Problems Review questions

View Set

Econ Unit 1 and 2 AP classroom questions

View Set

Unit 4 - intestinal and Rectal Disorders

View Set

Ch. 28 Care of Patients with Burns Evolve Ignatavicius Iggy

View Set

PrepU-Fluid & Electrolyte Balance

View Set

Chapter 10- Brain Damage and Neuroplasticity

View Set

OB: chapter 23: postpartum complications

View Set