COSC 304 Final Exam review
Return the employees who are either directly supervised by 'R. Davis' or directly supervised by 'M. Smith'
(SELECT E.ename FROM emp as E, emp as M WHERE E.supereno = M.eno and M.ename = 'M.Smith') UNION (SELECT E.ename FROM emp as E, emp as M WHERE E.supereno = M.eno and M.ename = 'R.Davis')
What does a typical OLAP query on a star schema involve?
- A star join connecting the dimensions to the fact table - Selection of the data using the dimensions - Group by one or more dimensions - Aggregate the numeric facts/values ex) For each customer in the city "New York" find the total number of products ordered from supplier "ABC": SELECT C.name, SUM(amount) FROM OrderItem OI, Customer C, Supplier S WHERE OI.custID = C.id and OI.suppID = S.id and S.name = 'ABC' and C.city = 'New York' GROUP BY C.id, C.name;
Transactions have the following properties:
- Atomicity: Either all operations of the transaction are properly reflected in the database or none are - Consistency: Execution of a transaction in isolation preserves the consistency of the database - Isolation: Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions - Durability: After a transaction successfully completes, the changes it has made to the database persist, even if there are system failures
What are the aggregate functions?
- COUNT - SUM - AVG - MIN - MAX
What are some integration problems?
- Different data models and conflicts within a model - Incompatible concept representations - Different user or view perspectives - naming conflicts
A view is only updatable if?
- Does not use DISTINCT - Every element in SELECT is a column name - contains all fields of base relation that are non-null - FROM clause contains only one table - WHERE does not contain any nested selects - No GROUP BY or HAVING in defining query
What are the main interfaces of the JDBC API?
- Driver - Connection - Statement - ResultSet - DatabaseMetaData
What factors do you consider when selecting a DBMS?
- Features - Compatibility - Performance - Price
What are some choices for database hosting?
- Local machine - On-premise physical or virtual machine - Cloud based hosting
FLWOR expressions ("for-let-where-order by-return") are similar to SQL and consist of:
- One or more for and/or let clauses (bind variables) - An optional where cluase (filters bound tuples) - An optional order by clause - A return clause (generates output) Variables begin with a dollar sign "$"
What does an XML document consist of?
- XML declaration line indicating the XML version - Elements (or tags) called markup. Each element may contain free text, attributes, or other nested elements. - Attributes - Comments
How do you add a new row to a ResultSet
//rs will be scrollable and updatable record set Statement stmt = con.createStatement( ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE); ResultSet rs = stmt.executeQuery("SELECT eno, ename FROM emp"); rs.moveToInsertRow(); // move cursor to insert row rs.updateString(1, "E9"); rs.updateString("ename", "Joe Smith"); rs.insertRow(); //Insert new row in DB rs.moveToCurrentRow(); //Move cursor to row you were on before insert
How do you update an existing row with an Updatable ResultSet
//rs will be scrollable and updatable result set Statement stmt = con.createStatement ( ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE); ResultSet rs = stmt.executeQuery("SELECT eno, ename FROM emp"); rs.absolute(2); // Go to 2nd row rs.updateString(2, "Joe Blow"); // Change name of employee rs.updateRow(); //Update data source
What two components does a namespace consist of?
1) A declaration of the namespace and its abbreviation 2) Prefixing tag names with the namespace name to exactly define the tag's origin
Once a document is parsed in XML what one of two interfaces are used to manipulate the document?
1) DOM (tree based) 2) SAX (event-based)
What are the properties of a relation?
1) Each relation name is unique 2) Each cell of the relation (value of a domain) contains exactly one atomic (single) value 3) Each attribute of a relation has a distinct name 4) The values of an attribute are all from the same domain 5) Each tuple is distinct 6) the order of attributes is not important 7) The order of tuples has no significance
A set of FDs F is minimal if it satisfies these conditions:
1) Every FD in F has a single attribute on its right hand side 2) We cannot replace any FD X -> A in F with Y -> A where Y ( X and still havea set of dependencies that is equivalent to F 3) We cannot remove any FD and still have a set that is equivalent to F
What are some data warehouse tools?
1) Extraction and cleansing tools 2) report design tools 3) analytical (OLAP) user tools 4) administration and management tools
What are the challenges with Relational Databses?
1) Fixed Schemas - The Schema must be defined ahead of tim, changes are difficult and lots of real world data is messy Solution: Get rid of schemas 2) Complicated Queries - SQL is declarative and powerful but may be overkill Solution: Simple query mechanisms and do a lot of work in code 3) Transaction overhead - Not all data and queries need to be perfect 4) Scalability - Relational databases may not scale sufficiently to handle high data and query loads or this scalability comes with a very high cost
What are the steps to building a data warehouse?
1) Identify the data available in operational systems that is necessary for analytical queries 2) Build a model for the data warehouse, usually as a star schema 3) write extraction and cleansing routines (often using design tools) for exporting data from operation systems and importing it into the data warehouse 4) develop pre-made analysis reports and queries on the data warehouse and provide access to ad hoc query facility 5) formalize procedures for maintenance of the data warehouse and associated metadata including backup, refresh procedure, user access, and how changes to operational systems will be supported
What are some view limitations?
1) If a view contains an aggregate function, it can only be used in SELECT and ORDER BY clauses of queries 2) A view whose defining query has a GROUP BY cannot be joined with another table or views
What are the three major types of update anomalies?
1) Insertion anomalies 2) Deletion anomalies 3) Modification anomalies
What is the general process for programming with a database
1) Load the database access library 2) create a connection to the database 3) execute a SQL command 4) retrieve database results produced by the command 5) Close the database connection
What are some not relational NoSQL systems
1) MapReduce - For large scale analysis 2) Key-value Stores - For specific data records from a large set of data 3) Document stores - similar to key value except value is a document in some form (ex JSON) 4) Graph databases - represent data as graphs
What are two challenges in preserving consistency
1) Must handle failures of various kinds 2) Must support concurrent execution of multiple transactions and guarantee that concurrency does not cause inconsistency
What are the two special types of statement objects?
1) Prepared statement 2) Callable statement
What are the two types of triggers?
1) Row Level 2) Statement Level
How do you convert a non-1NF relation to a 1NF relation?
1) Splitting method 2) Flattening method
Well designed relational schemas have several important properties:
1) The most basic property is that relations consist of attributes that are logically related 2) Lossless-Join-Property: ensures that the information decomposed across many relations can be reconstructed using natural joins 3) Dependency preservation property: ensures that constraints on the original relation can be maintained by enforcing constraints on the normalized relations
What are the two important variations of executing statements?
1) The statement executed is an INSERT, UPDATE or DELETE and no results are expected to be returned ex) rowcount = stmt.executeUpdate("UPDATE emp set salary = 0"); 2) The statement executed is an INSERT which is creating a new record in a table whose primary key field is an auto number field: ex) rowcount = stmt.executeUpdate("INSERT product VALUES ('Prod.Name')" statement.Return_gen_key"); ResultSet autoKeys = stmt.getGeneratedKeys();
What are the SQL basic rules?
1) There is a set of reserved words that cannot be used as names for database objects (SELECT, FROM, WHERE) 2) SQL is case insensitive 3) White space is ignored 4) semi-colon is used as a statement terminator 5) Data and time constraints have a defined format 6) Two single quotes '' are used to represent a single quote character in a character constant
What are the three types of joins?
1) Theta Join 2) Equijoin 3) Natural Join
Transitivity rules can be used for three purposes:
1) To determine if a given FD X -> Y follows from a set of FDs F 2) To determine if a set of attributes X is a superkey of R 3) To determine the set of all FDs (called the closure of F+) that can be inferred from a set of initial functional dependencies F
What are two ways of specifying a schema for XML?
1) XML Schema 2) Document Type Definition (DTD) (original, order)
What are the three types of outer joins?
1) left outer join 2) right outer join 3) full outer join
What are the two main areas of application design?
1) transactions 2) user interface
What is the general DTD form?
<!DOCTYPE myroot [ <elements> ]>
Set difference
A binary operation that takes two relations R and S as input and produces an output relation that contains all the tuples of R that are not in S
Intersection:
A binary operation that takes two relations R and S as input and produces an output relation which contains all tuples that are in both R and S
What is a database?
A collection of logically related data for a particular domain
Theta-Join
A derivative of the cartesian product instead of taking all combinations of tuples from R and S, we only take a subset of those tuples that match a given condition
Valid XML document
A document that is well-formed and it conforms to a document type definition (DTD) or an XML Schema Definition (XSD)
Trivial Functional Dependency
A functional dependency of an attribute on a subset of itself
Explicit join
A join condition directly in the FROM clause instead of the WHERE Ex) Return the employees who are assigned to the 'Management' department SELECT ename FROM emp JOIN dept ON emp.dno = dept.dno WHERE dname = 'Management'
full outer join
A join in which all rows from both tables will be included regardless of whether they match rows from the other table
Equijoin
A join operator that links tables based on an equality condition that compares specified columns of the tables.
Left outer join
A join that includes all of the rows from the first table in the query and only those records from the second table that match the join field in the first table.
Right outer join
A join that includes all of the rows from the second table in the query and only those records from the first table that match the join field in the second table.
Data definition language (DDL)
A language that allows users to add, modify, drop tables, create views, define and enforce integrity constraints, enforce security restrictions
Attributes
A property of an entity or a relationship type
Subquery
A query that is embedded (or nested) inside another query. Also known as a nested query or an inner query. ex) SELECT ename, salary, bdate FROM emp WHERE salary > (SELECT AVG(salary) FROM emp)
Key value stores
A simple pair of a key and an associated collection of values. Key is usually a string. Database has no knowledge of the structure or meaning of the values.
Boyce-Codd Normal Form (BCNF)
A special type of third normal form (3NF) in which every determinant is a candidate key. A table in BCNF must be in 3NF.
Composition
A stronger form of aggregation where the part cannot exist without its containing whole entity type and the part can only be part of one entity type
Fourth Normal Form (4NF)
A table is in 4NF if it is in 3NF and contains no multiple independent sets of multivalued dependencies.
Relation
A table with columns and rows
Rollback
Aborts current transaction and discards its updates. Failures may also cause a transaction to be aborted
What are the advantages and disadvantages of running a database on-premise
Advantages: - Data does not leave organization - Potential for higher performance Disadvantages: - Organization responsible for deploying, configuring, maintaining and securing both hardware and database software
What are the advantages and disadvantages of views?
Advantages: - Data independence - Security - Easier querying - Convenience/Customization Disadvantages: - Updatable views are not always supported and restrictive - Performance: views add increased overhead
What are the advantages and disadvantages of running a database on your machine?
Advantages: - Full control over database and install process Disadvantages: - May not be easy to connect to by clients depending on the machine - Must take time to install/configure database software
bitmap indexes
Allow for efficient lookup of data values. A bit vector is created for each possible value of the attribute being indexed/ The length of the bit vector is the number of tuples in the indexed table. The j-th bit of the vector is 1 if tuple j contains that value.
Namespaces
Allow tag names to be qualified to avoid naming conflicts. A naming conflict would occur when the same name is used by two different domains or vocabularies
Scrollable ResultSet
Allows you to navigate in any direction through it, and these methods can now be used: -> absolute(int, row) -> afterLast(), beforeFirst(), first(), last(), next(), previous()
Enhanced Entity-Relationship (ERR) modeling
An extension of ER modeling to include object-oriented concepts such as superclasses and subclasses, specialization and generalization, aggregation and composition
What is security in SQL based on?
Authorization identifiers, ownership, and privileges
What are the advantages and disadvantages of document stores?
Benefits: High scalability, availability, performance Limitations: same as key value, may cause redundancy
How do you rename an attribute in the final result?
By using the keyword AS ex) SELECT ename, salary/52 AS pay FROM emp, proj WHERE ename = 'A.lee'
Tuple constraints
CHECK can also be used on an entire tuple instead of a single attribute ex) CREATE TABLE student ( num CHAR(10) NOT NULL, honors CHAR(1), gpa DECIMAL(3,2) CHECK ((honors = 'Y' AND gpa > 3.50) OR honors = 'N')
Create user-defined domain for Emp.title
CREATE DOMAIN titleType AS CHAR(2) DEFAULT 'EE' CHECK (VALUE IN (NULL, 'EE', 'SA', 'PR', 'ME'));
how do you create a schema?
CREATE SCHEMA employeeSchema AUTHORIZATION Joe;
Create table
CREATE TABLE emp ( eno CHAR(5), ename VARCHAR(30) NOT NULL, bdate DATE, title CHAR(2), salary DECIMAL(9,2) supereno CHAR(5), dno CHAR(5), PRIMARY KEY (eno), FOREIGN KEY (supereno) REFERENCES emp (eno) ON DELETE SET NULL ON UPDATE CASCADE
The workson relation has a foreign key to Emp(eno) if a user inserts a record in workson and the employee does not exist, the insert fails
CREATE TRIGGER insert WorksOn BEFORE INSERT ON WorksOn REFERENCING NEW ROW AS NewWO FOR EACH ROW WHEN (NewWO.eno NOT IN SELECT eno FROM emp)) INSERT INTO Emp(eno) VALUES (NewWo.eno);
Create SSN Type
CREATE TYPE SSN AS VARCHAR(10) FINAL;
How do you create a type?
CREATE TYPE typeName AS builtInType [FINAL] -> Final means that you cannot create subtypes of the new type
Create a view that has only the employees of department 'D2':
CREATE VIEW empD2 AS SELECT * FROM emp WHERE dno = 'D2'
Create a view that shows the employee number, title, and name
CREATE VIEW staff (Number, Name, Title) AS SELECT eno, ename, title FROM emp
Database design stages
Conceptual -> Logical -> Physical
How do you call the connection interface?
Connection con = DriverManager.getConnection(url, uid, pw); Statement stmt = con.createStatement(); ... con.close();
Application Programming Interface (API)
Contains methods to connect a database, execute queries, and retrieve results
Flattening method
Create a new tuple for the repeating data combined with the data that does not repeat
Three-tier client server architecture
DBMS is bottom tier, second tier is an application server containing business logic, top tier is clients
How do you delete a row?
DELETE ex) Fire everyone in the company DELETE FROM workson DELETE FROM emp
SQL Ex: Delete a product from the database
DELETE FROM product WHERE sku = '1234';
How do you remove duplicates in SQL?
DISTINCT Ex) SELECT DISTINCT title FROM emp
How do you drop a schema?
DROP SCHEMA emplyeeSchema;
Hows are views removed?
DROP VIEW viewName [RESTRICT | CASCADE ]
hierarchy of database objects
Database -> Schema -> Table
Two-Tier Client-Server Architecture
Dedicated machine running DBMS accessed by clients
Referential Integrity constraint
Defines a foreign key that references the primary key of another table ex) Workson contains two foreign keys: - workson.eno references emp.eno - workson.pno references proj.pno
Deletion anomalies
Deletion of a tuple may lose information that is still required to be stored
What does a fact table consist of?
Dimension attributes: key attributes of a dimension table Dependent attributes: attributes determined by the dimension attributes
Splitting method
Divide the existing relation into two relations: non-repeating attributes and repeating attributes
Complex Types
Elements that contain other elements ex) <xsd:element name = "Dept"> <xsd:complexType> <xsd:sequence> // Children defined here </xsd:sequence> </xsd:complexType> </xsd:element>
Simple Types
Elements that have no subelements ex) <xsd:element name = "name" type = "xsd:string" /> <xsd:element name = "budget" type = "xsd:decimal" />
Row Level Triggers
Executed for each row that is updated, deleted, or inserted
Callable statement
Extends PreparedStatement and is used to execute stored procedures
Prepared statement
Extends statement and is used to execute precompiled SQL statements
What are the benefits of a 3-schema architecture?
External: Each user an access the data, but have their own view of the data independent of other users Conceptual: single shared data representation for all applications and users which is independent of physical data storage Internal: Provides standard facilities for interacting with operating system for space allocation and file manipluation
DatabaseMetaData
For retrieving metadata (schema) information from a database
Allow the director full access to proj and the ability to grant privileges to other users
GRANT ALL PRIVILEGES ON proj TO DIRECTOR WITH GRANT OPTION;
Allow all users to query the dept relation
GRANT SELECT ON dept TO PUBLIC;
Only allow users Manager and Director to access and change salary in Emp
GRANT SELECT, UPDATE(salary) ON Emp TO Manager, Director;
Mediator
Global level software that receives global queries and divides them into subqueries for execution by wrappers
User interface
How will the data be displayed on the screen and how can the user initiate transactions
What are the benefits of JSON?
Human readable Supports semi-structured data Supported by many languages
SQL Ex: Insert a new product into the database
INSERT INTO product values ('1234', 'soap', 'ivory', 100);
Referential integrity constraint
If a foreign key exists in a relation then the foreign key value must match a primary key value of a tuple in the referenced relation or be null
Restrict command
If domain is being used, drop fails
Second Normal Form (2NF)
If it is also in 1NF and every non-prime attribute is fully functionally dependent on a candidate key
Third Normal Form (3NF)
If it is in 2NF and there is no non-prime attribute that is transitively dependent on the primary key
Full functional dependency
Indicates that if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A.
Insertion anomalies
Insertion of a tuple into the relation either requires insertion of redundant information or cannot be performed without setting key values to NULL
Natural join
Links tables by selecting only the rows with common values in their common attributes
Driver
Main class of the entire driver
Graph Databases
Model data as nodes (with properties) and edges (with labels)
What is JSON used for?
One command usage is for the data to be provided from a server (either from a relational or NoSQL database) and sent to a web client. The web client then uses javascript to convert the JSON into objects and manipulate it as required
Statement Level Triggers
Only executed once per statement regardless of how many tuple are affected
How do you do a statement level trigger on an update?
REFERENCING OLD TABLE AS oldTbl NEW TABLE as newTbl
Remove all privileges from user Joe on Emp table
REVOKE ALL PRIVILEGES ON Emp FROM Joe
ROLAP
Relational OLAP that uses a standard relational DBMS and star schema to store warehouse data
Aggregation
Represents a 'HAS-A' or 'IS-PART-OF' relationship between entity types. One entity type is the whole, the other is the part
Scalar Subqueries
Return a single value which is often then used in a comparison
Return the number of employees and their average salary
SELECT COUNT(eno) AS numEmp, AVG(salary) AS avgSalary FROM emp
Return all employees who have the same name as another employee
SELECT E.ename FROM emp as E, emp as E2 WHERE E.ename = E2.ename AND E.eno <> E2.eno
Return the employee number, department number and hours the employee worked per department where the hours is >= 10
SELECT W.eno, D.dno, SUM(hours) FROM workson AS W, dept AS D, proj AS P WHERE W.pno = P.pno and P.dno = D.dno GROUP BY W.eno, P.dno HAVING SUM(hours) >= 10
Return the employee name and salary of employee 'J. Doe'
SELECT bdate, salary FROM emp WHERE ename = 'J.Doe'b
Return all departments who have a project with a budget greater than 300,000
SELECT dname FROM dept WHERE dno IN (SELECT dno FROM proj WHERE budget > 300000
For employees born after December 1, 1965, return the average salary by department where the average is > 40,000
SELECT dname, AVG(salary) AS avgSal FROM emp, dept WHERE emp.dno = dept.dno and emp.bdate > DATE '1965-12-01' GROUP BY dname HAVING AVG(salary) > 40000
Return all departments (with and without projects
SELECT dname, pname FROM dept LEFT OUTER JOIN proj ON dept.dno = proj.dno SELECT dname, pname FROM dept LEFT OUTER JOIN proj USING(dno) SELECT dname, pname FROM dept NATURAL JOIN LEFT proj
Return the department names and the projects in each department
SELECT dname, pname FROM dept, proj WHERE dept.dno = proj.dno
Return all employees who are in any of of the departments (D1, D2, D3)
SELECT ename FROM emp WHERE dno IN ('D1','D2','D3')
Return all employee names that start with 'A'
SELECT ename FROM emp WHERE ename LIKE 'A%'
Return the employees who make at least 20,000 and less than or equal to 45,000
SELECT ename FROM emp WHERE salary BETWEEN 20000 and 45000
Return the employees who make more than all the employees with title 'ME' make
SELECT ename FROM emp as E WHERE salary > ALL (SELECT salary FROM emp WHERE title = 'ME')
Return the employees who are assigned to the 'Management' department
SELECT ename FROM emp, dept WHERE dname = 'Management' and emp.dno = dept.dno
Return the employees that are in the 'Accounting' department
SELECT ename FROM emp WHERE dno = (SELECT dno FROM dept WHERE dname = 'Accounting')
Return the birth date and salary of all employees whose title is 'EE':
SELECT ename, salary FROM emp WHERE title = 'EE'
Return the employee number, project number, and number of hours worked where the hours worked is > 50
SELECT eno, pno, hours FROM workson WHERE hours > 50
SQL Ex: Retrieve all products where inventory < 10
SELECT name, inventory FROM product WHERE inventory < 10;
SQL Ex: Retrieve all products in the database
SELECT sku, name, desc, inventory FROM product;
Return the title and number of employees of that title where the number of employees of the title is at least 2
SELECT title, COUNT(eno) AS numEmp FROM emp GROUP BY title HAVING COUNT(eno) >= 2
The isolation level can be specified by
SET TRANSACTION ISOLATION LEVEL = X where X is
Database hosting
Selects the machine where the database software executes
What are the advantages of XML?
Simplicity, open standard, extensibility, interoperability, separtation of data and presentation
How do you call the statement interface?
Statement stmt = con.createStatement(); ResultSet rst = stmt.executeQuery("SELECT ename, salary FROM emp");
Consider the Emp relation. We wnt the user staff to have only query access but not be able to see users birthdate and salary. How do we accomplish this?
Step #1: Create a view on the Emp relation CREATE VIEW EmpView AS SELECT eno, ename, title, supereno, dno FROM emp; Step #2: Provide SELECT privilege to staff GRANT SELECT ON empView TO STAFF; Step #3: REVOKE privileges ON base relations (emp) REVOKE SELECT ON Emp FROM staff
How do you call a driver interface?
String url = "jdbc:mysql://cosc304.ok.ubc.ca/TestDB";
What are the disadvantages of using custom APIs?
The code is written for a specific DBMS and making changes is difficult
First Normal Form (1Nf)
The first stage in the normalization process. It describes a relation depicted in tabular format, with no repeating groups and a primary key identified. All nonkey attributes in the relation are dependent on the primary key. a restriction that all fields must be atomic, or single valued
Internal view
The physical representation of the database on the computer. How the data is stored
Entity Integrity constraint
The primary key of a table must contain a unique, non-null value for each row. The primary key is specified using the PRIMARY KEY clause. Ex) PRIMARY KEY(eno)
Data integration
The process of combining data at the entity level. It requires resolving representational conflicts and determining equivalent keys
Fifth Normal Form (5NF)
The relation has no join dependency
predicate expression
The set of objects returned can be filtered by putting selection conditions on the path. Specified inside square brackets [..] ex) /Depts/Dept/name[.="Management"]
external view
The users view of the database that provides the part of the database relevant to the user
What are the benefits of using custom APIs?
They have increased functionality and performance
OFFSET
To start from a row besides the first ex) SELECT eno, slary FROM emp ORDER BY eno DESC LIMIT 3 OFFSET 2
What are keys used for?
To uniquely identify a tuple in a relation
Query processor
Translates user/application queries into low-level data manipulation actions
Transaction to deposit $50 into a bank account
UPDATE Account WHERE num = 'S1' SET balance = balance + 50; COMMIT;
BETWEEN command
Used for a range of values
WITH CHECK OPTION
Used for updatable views. It is used to prevent updates through a view that would then remove rows from the view ex) CREATE VIEW staff (Number, Name, Job, DeptNum) AS SELECT eno, ename, title, 'D2' FROM em WHERE DeptNum = 'D2'; UPDATE staff SET DeptNum = 'D3' WHERE Number = ' E3'; This update deletes employee E3 because changing his department to D3 no longer matches. To prevent this you can specify the WITH CHECK OPTION CREATE VIEW staff (Number, Name, Jon, DeptNum) AS SELECT eno, ename, title FROM emp WHERE DeptNum = 'D2' WITH CHECK OPTION
Drop table
Used to delete the table definition and all data from the database ex) DROP TABLE emp;
dimensionality modeling
Used to design data warehouses where: - A main table exists called a fact table with a composite primary key and foreign keys to one or more dimension table
HAVING
Used to filter out entire groups that do not match certain criteria and can contain any condition that references aggregate functions
Indentifiers
Used to identify objects in the database such as tables, views and columns It is also the name of the database object
What are the two types of views?
Virtual: Produced by a SQL query on demand Materialized: Stored as a derived table that is updated when the tables that is refers to are updated
Transaction
What queries/updates will the application perform
When do you use a custom API?
When JDBC standard does not provide sufficient functionality
Specialization/Type Hierarchy
When each class only has one superclass
How do you use the rename operation?
With the result operation: result(EmployeeNum, ProjectNum, Duration)
Transactions
a atomic program that executes on the database and preserves the consistency of the database
Union
a binary operation that takes two relations R and S as input and produces an output relation that includes all tuples that are either in R or in S or in both. Duplicate tuples are eliminated
data model
a collection of concepts that can be used to describe the structure of a database
relational datyabase
a collection of normalized relations with distinct relation names
schema
a description of the structure of the database that contains structures, names and types of data stored
XQuery
a full query language that uses XPath for path expressions
superclass
a general class that is extended by one more more subclasses
Entity type
a group of objects with the same properties which are identified as having an independent existence
data warehouse
a historical database that summarizes integrates, and organizes data from one or more operational databases in a format that is more efficient for analytical queries
semi join
a join between tables that returns rows from the first table where one or more matches are found in the second table used in EXISTS and IN constructs in SQL
Anti join
a join between two tables that returns rows from the first table where NO matches are found in the second table Used with NOT, EXISTS, NOT, IN, FOR ALL
data definition language (DDL)
a language used to define data structures and modify data
project champion
a manager or senior IT person who is the projects promoter and backer
Extensible Markup Language (XML)
a markup language that allows for the description of data semantics
Javascript object notation (JSON)
a method for specializing data objects into text form
candidate key
a minimal set of attributes that uniquely identifies a tuple in a relation
subclass
a more specific class that extends a superclass by inheriting its methods and attributes and then adding its own methods and attributes
attribute
a named column of a relation
View
a named query that is defined in the database
Entity instance
a particular example or occurrence of an entity type
relationship instance
a particular occurrence of a relationship type that relates entity instances
attribute defined subclasses
a particular type of predicate-defined constraint where the value of an attribute(s) determines if an object is a member of a subclass
Object-relational DBMS
a relational DBMS extended to support object-oriented features such as inheritance, user defined types, and polymorphism
Recursive relationship
a relationship type where the same entity type participates more than once in different roles
tuple
a row of a relation
relational algebra expression
a sequence of relational algebra operators
domain
a set of allowable values for one or more attributes
File based system
a set of applications that use files to store their data
relationship type
a set of associations among entity types, each relationship type has a name that describes its functions
foreign key
a set of attributes in one relation referring to the primary key of another relation
superkey
a set of attributes that uniquely identifies a tuple in a relation
Entity set
a set of entity instances
relational database schema
a set of relation schemas
relational algebra
a set of relational operations for retrieving data
relationship set
a set of relationship instances
Mission statements
a specification of the major aims and objects of the product
Functional dependencies (FB)
a statement about the relationship between attributes in a relation and represent constraints on the values of attributes in a relation and are used for normalization ex) eno -> ename eno, pno -> hours
XSLT (eXtensible stylesheet language for transformations)
a subet of XSL that provides a method for transforming XML. (or other text documents) into other documents (XML, HTML)
correlated subquery
a subquery (a query nested inside another query) that uses values from the outer query
data mart
a subset of a data warehouse that supports the business requirements of one department
normalization
a technique for producing relations with desirable properties
Entity-relationship modeling
a top-down approach to database design that models the data as entities, attributes, and relationships
projection operation
a unary operation that takes in a relation as input and returns a new relation as output that contains a subset of the attributes of the input relation and all non-duplicate tuples
selection operation
a unary operation that takes in a relation as input and returns a new relation as output that contains a subset of the tuples of the input relation
Commit
accepts updates of current transaction
ALL
all values returned must satisfy the condition
Updatable ResultSet
allow you to update fields in the query result and update entire rows
data manipulation language
allows for the insertion, modification, retrieval and deletion of data
data abstraction
allows the internal definition of an object to change without affecting programs that use the object through an external definition
WITH GRANT OPTION
allows users to grant privileges to other users. Can only grant privileges that they them self hold
CREATE DOMAIN command
allows you to define your own types that are subsets of built in types
XPath
allows you to specify path expressions to navigate the tree-structured XML document
Interface
an abstract class consisting of a set of methods with no implementation
Prime attribute
an attribute in any candidate key
derived attribute
an attribute whose value is calculated from other attributes but is not physically stored.
weak entity type
an entity type whose existence is dependent on another entity type
strong entity type
an entity type whose existence is not dependent on another entity type
ANY
any value returned by the subquery can satisfy the condition
What are the advantages and disadvantages of Key-Value stores
benefits: high scalability, availability, performance limitations: single record transactions, eventual consistency, simple query interface
How are attributes referenced in XML?
by putting a "@" in front of their name ex) /Depts/Dept/@dno - dno attribute of Dept element
How can the number of occurrences of elements be controlled in an XML schema?
cardinality constraints minOccurs and maxOccurs ex) <xsd:element name = "Emp" minOccurs="0" maxOccurs="unbounded"> <xsd:element name = "budget" type = "xsd:decimal" minOccurs="1" maxOccur="1" />
Modification anomalies
changing an attribute of a tuple may require changing multiple attribute values in other tuples
single valued attribute
consists of a single value for each entity instance
Composite attribute
contains a single component with an independent existence
Simple attribute
contains a single component with an independent existence
Statement interface
contains abstract methods for executing a single static SQL statement and returning the results it produces
Connection interface
contains abstract methods for managing a connection or session. Used to create statements for execution on the database
JSON parser
converts a JSON file (or string) into program objects assuming no syntactic errors
Metadata
data about data, or schema information
Cloud databases
databases hosted by a service provider that allow for easy setup, administration and scaling
Online transaction processing (OLTP) systems
databases that are designed to handle many small queries and updates
XSL (eXtensible stylesheet language)
defines how XML data is displayed
Document Type definition (DTD)
defines the grammatical rules for the document. It is not required but it provides a mechanism for checking a document's validility
relation schema
definition of a single relation (intension of the relation)
online analytical processing (OLAP) systems
designed for decision support applications where large amounts of data is analyzed often with ad hoc queries
disjoint constraint
determines if a member of a superclass can be a member of one or more than one of its subclasses
participation constraint
determines if every member in a superclass must participate as a member of one of its subclasses
Uniqueness constraints
dictate that the value of an element (with a specified path) must be unique ex) <xsd:unique name = "UniqueDno"> <xsd:selctor xpath = "Dept" /> <xsd:field xpath = "@dno" /> </xsd:unique>
Well formed XML documents
documents that obeys the syntax of the XML standard including: - having a single root element - All elements must be properly closed and nested
One-to-One relationship
each instance of an entity class E1 can be associated with at most one instance of another entity class E2 and vice versa
Many-to-Many relationship
each instance of an entity class E1 can be associated with more than once instance of another entity class E2 and vice versa
One-to-Many relationship
each instance of an entity class E1 can be associated with more than one instance of another entity class E2. However, E2 can only be associated with at most one instance of entity class E1
Domain constraint
every value for an attribute must be an element of the attributes domain or be null
File-server (embedded) architecture
files are shared but DBMS processing occurs at the clients
Connection
for connecting to the DB using the driver
statement
for executing a query using a connection
ResultSet
for storing and manipulating results returned by a statement
Reference constraints
forces the value of a reference to be constrained to specified keys ex) <xsd:keyref name = "DeptMgrFK" refer="EmpKey"> <xsd:selector xpath = "Dept" /> <xsd:field xpath = "@mgr" /> </xsd:keyref>
privileges
gives user the right to perform operations on database objects
PUBLIC
grants access to all users (present and future) of the database
ALL PRIVILEGES
grants all privileges to the user except the ability to grant privileges to other users
functional dependency transitivity rule
if A -> B and B -> C then A -> C
Type lattice
if a class has more than one superclass
total participation
if an entity's participation in a relationship is mandatory
cascade
if domain is used domain is dropped and fields using domain defaulted to base type
Entity integrity constraint
in a base relation, no attribute of a primary key can be null
Database updates
inserting rows, deleting rows, and updating rows are performed using their own statements Ex) INSERT INTO emp VALUES ('E9', 'S.Smith')
Conceptual database design
involves modeling the collected information at a high-level of abstraction without using a particular data model or DBMS
storage manager
maintains storage information including memory allocation, buffer management and file storage
multi-valued attribute
may have multiple values for a single entity instance
Tuning
monitoring and maintenance systems that assist the DBA in detecting performance issues
MOLAP
multidimensional OLAP that uses a specialized DBMS with a data cube model
cardinality
number of tuples a relation contains
chasm trap
occurs when a model suggests that a relationship between entity types should be present, but the relationship does not actually exist (missing relationship)
Redundancy
occurs when the same data value is stored more than once in a relation
degree
part of a relation it is the number of attributes it contains
Trigger Syntax: Action
performed when an event occurs and the condition is satisfied
transaction manager
performs scheduling of operations and implements concurrency control algorithms
Stored procedures
precompiled SQL code stored at the database that take in parameters for their execution
update anomalies
problems that arise when trying to insert, delete, or update tuples and are often caused by redundancy
XML parser
processes the XML document and determines if it is well-formed and valid (if a schema is provided)
ResultSet Interface
provides methods for manipulating the result returned by the SQL statement
Isolation level
reflects how a transaction perceives the results of other transactions
DROP command
remove domains from the system ex) DROP DOMAIN mgrType
Database administrator (DBA)
responsible for installing, maintaining and configuring the DBMS software
Data administrator (DA)
responsible for organizational polices on data creation, security and planning
Table subqueries
returns on or more columns and multiple rows
AVG
returns the average of the values in a column
MAX
returns the largest value in a column
COUNT
returns the number of values in a column
MIN
returns the smallest value in a column
SUM
returns the sum in a column
contraints
rules or restrictions that apply to the database and limit the data values it may store
program data independence
separating the application from the data it is manipulating
Path descriptors
sequences of tags separated by slashes / - If the descriptor begins with a /, then the path starts at the root - If the descriptor begins with //, the path can start anywhere - You may also start the path by giving the document name such as doc(depts.xml)/. Also denotes a sequence of nodes. These nodes may themselves contain other nodes (elements) ex) /Depts/Dept/name
Document stores
similar to key value stores but the value is stored in the structured document (JSON, XML)
Database management system (DBMS)
software designed for the creation and management of databases
Wrappers
software that converts global level queries into queries that the local database can handle.
predicate-defined constraints
specify when an object participates in a subclass using a certain rule
top-down process
start with a general class and then subdivide it into more specialized classes
bottom-up process
start with specialized classes and try to determine a general class that contains the attributes common to all of them
primary key
the candidate key designated as the distinguishing key of a relation
Conceptual view
the logical structure of the database that describes what data is stored and its relationships
Driver interface
the main interface that must be implemented by a DBMS vendor when writing a JDBC driver
participation
the minimum number of relationship instances for an entity participating in a relationship type Can be optional(zero) or mandatory(1 or more)
Degree of a relationship type
the number of entity types participating in the relationship Relationship of degree two = Binary Relationship of degree three = Ternary Relationship of degree four = Quaternary
multiplicity
the number of possible occurrences of an entity type that may relate to a single occurrence of an associated entity type through a particular relationship
inheritance
the process of a subclass inheriting all the methods and attributes of a superclass
roll-up
the process of aggregating along a particular dimension
Schema integration
the process of combining local schemas into a global, integrated view by resolving conflicts present between the schemas
Logical database design
the process of constructing a model of the information in the domain using a particular data model, but independent of the DBMS
Physical database design
the process of constructing a physical model of information on the secondary storage in a given data model for a particular DBMS
generalization
the process of creating a more general superclass from existing subclasses
specialization
the process of creating more specialized subclasses of an existing superclass
Drill-down
the process of dividing an aggregated value into its component parts
Testing
the process of executing the application programs with the intent of finding errors
SELECT INTO
the result of a selected statement that can be stored into a temporary table ex) SELECT E.ename INTO davisMgr FROM emp as E, emp as M WHERE E.supereno = M.eno and M.ename = 'M.smith' davisMgr UNION smithMgr
extension
the set of tuples currently in the relation
intension
the structure of the relation including its domains
predicate
to determine which tuples are in the output, the selection operation has a specified condition
required data
to specify that a column must always have a data value (cannot be null) specify NOT NULL after the column definition. Ex) eno CHAR(5) NOT NULL
View resolution
translates the query into a query over only the base relations
Key constraints
uniqueness constraints where the value cannot be null ex) <xsd: key name = "EmpKey"> <xsd:selector xpath = "Dept/Emp" /> <xsd:field xpath = "@eno" /> </xsd:key>
UPDATE
updating existing rows Ex) Increase all employees salaries by 10% UPDATE emp SET salary = salary*1.10;
LIMIT
used if you want the first N rows ex) SELECT ename, salary FROM emp ORDER BY salary DESC LIMIT 5
CHECK clause
used in a nested statement to retrieve values from the database
Alter Table
used to change an existing table ex) Add column location to dept relation: ALTER TABLE dept ADD location VARCHAR(50); ex) Add field SSN to Emp relation ALTER TABLE emp ADD SSN CHAR(10);
EXISTS
used to check whether the result of a nest query is empty or not. It will return true if the nested query has 1 or more tuples Ex) SELECT ename FROM emp as E WHERE EXISTS (SELECT * FROM emp as E2 WHERE E.ename = E2.ename AND E.eno <> E2.eno)
Set operations (UNION, INTERSECTION, DIFFERENCE)
used to combine the results of two SQL queries
Wild car operator "*"
used to denote any single tag ex) /*/*/name - match any name that is nested 3 levels deep
What is the attribute closure algorithm used for?
used to determine if a particular FD X -> Y holds based on the set of given FDs F
authorization identifier
used to determine which database objects the user has access to
Development lifecycle
used to develop database and information systems using a series of steps
Trigger Syntax: Condition
used to filter out when the trigger action is performed
Grant
used to give privileges on database objects to users ex) GRANT {privilegeList | ALL [PRIVILEGES] ON ObjectName TO {AuthorizationIdList | PUBLIC } [WITH GRANT OPTION]
ORDER BY
used to order the data ex) SELECT ename, salary, bdate FROM emp WHERE salary > 30,000 ORDER BY salary DESC, ename ASC
Triggers
used to perform actions when a defined event occurs. They are standardized but also have been present in various database systems for some time
XML schema
used to provide a standard XML schema language written in XML with better support for data modeling including: - Data types - Constraints - Explicit Primary Keys and Foreign Key References
REVOKE
used to remove privileges on database objects from users ex) Allow no users to query the dept relation REVOKE SELECT ON dept FROM PUBLIC;
relationship cardinalities
used to restrict how entity types participate in relationships in order to model real-world constraints
LIKE command
used to search for partial matches. partial string matches are specified by using "%" that replaces an arbitrary number of characters or underscore "_" that replaces a single character
IN command
used to see if a value is in a given set of values
indexes
used to speed up access to the rows of the tables based on the values of certain attributes Ex) creating an index on eno and pno in workson is useful as it will speed up joins with Emp and proj tables respectively CREATE INDEX idxEno ON workson(eno) CREATE INDEX idxPno ON workson(pno)
Query language
used to update and retrieve data that is stored in a data model
JSON validator
validates according to a schema and then performs the parsing
domain constraints
verify that the value of a column is in a given domain using CHECK ex) title CHAR(2) CHECK (title IN (NULL, 'EE', 'SA', 'PR', 'ME'));
Materialized views
views that are physically stored in the database. They speed up OLAP queries by pre-computing large joins
fan trap
when a model represents a relationship between entity types, but the pathway between certain entity instances in ambiguous
star schemas
when drawn in a diagram, the schema looks like a star because the fact table is in the middle with multiple dimension tables linked to it
Running "On-Premise" hosting
when the database is deployed on a (virtual) machine controlled by the organization
ownership
whenever a user creates an object, the user is the owner of the object and initially the only one that can access it
existence dependency
where the entity's existence depends on the relationship
How do you call the ResultSet Interface?
while (rst.next()) { System.out.println(rst.getString("ename") + "," + rst.getDouble(2));