CS 450 databases final (1-9 DB and DB users) (10-26 DB Systems concepts and architecture) (27-47 Relational Data Model and Relational Database Constraints) (47-65 Basic SQL)
For each department whose average employee salary is more than $30,000, retrieve the department name and the number of employees working for that department.
(a) SELECT DNAME, COUNT (*) FROM DEPARTMENT, EMPLOYEE WHERE DNUMBER=DNO GROUP BY DNAME HAVING AVG (SALARY) > 30000
Schema Construct
A component of the schema or an object within the schema, e.g., STUDENT, COURSE.
Domain
A domain has a logical definition: Example: "USA_phone_numbers" are the set of 10 digit phone numbers valid in the U.S.
Data model
A set of concepts to describe the structure of a database, the operations for manipulating these structures, and certain constraints that the database should obey.
valid state
A state that satisfies the structure and constraints of the database (at a given point in time)
Tuples
A tuple is an ordered set of values (enclosed in angled brackets '< ... >') The data elements in each row represent certain facts that correspond to a real-world entity or relationship In the formal model, rows are called tuples
Database queries
Access different parts of data and formulate the result of a request
Schema Diagram
An illustrative display of (most aspects of) a database schema.
Domain constraint
Another schema-based constraint is the domain constraint Every value in a tuple must be from the domain of its attribute (or it could be null, if allowed, for that attribute)
Create table types
Base tables (base relations) Relation and its tuples are actually created and stored as a file by the DBMS Virtual relations (views) Created through the CREATE VIEW statement. Do not correspond to any physical file.
Data Model Categories
Conceptual (high-level, semantic): provide concepts that are close to the way many users perceive data and objects around the world. Physical (low-level, internal): Provide concepts that describe details of how data is stored in the computer. Implementation (rep) data models: Provide concepts that fall between the above two, used by many commercial DBMS implementations (e.g. relational model and commerical systems). Self-describing data models: Combine the description of data (called meta-data) with the data values. Examples include XML, key-value stores and NoSQL systems
relational integrity constraints
Constraints are conditions that must hold on all valid relation states. There are three main types of (explicit schema-based) constraints that can be expressed in the relational model: Key constraints Entity integrity constraints Referential integrity constraints (Foreign key constraints) Schema-based: Domain constraints
Constraints
Constraints determine which values are permissible and which are not permissible in the database.
Insert Violations
INSERT may violate any of the constraints: Domain constraint: if one of the attribute values provided for the new tuple is not of the specified attribute domain Key constraint: if the value of a key attribute in the new tuple already exists in another tuple in the relation Referential integrity: if a foreign key value in the new tuple references a primary key value that does not exist in the referenced relation Entity integrity: if the primary key value is null in the new tuple
Numeric types of data
INTEGER, INT, SMALLINT FLOAT or REAL, and DOUBLE PRECISION
SQL Schema
Identified by a schema name Includes an authorization identifier and descriptors for each element
Delete violations
If the primary key value of the tuple being deleted is referenced from other tuples in the database Can be remedied by several actions: RESTRICT, CASCADE, SET NULL (see Chapter 6 for more details) RESTRICT option: reject the deletion CASCADE option: propagate the new primary key value into the foreign keys of the referencing tuples SET NULL option: set the foreign keys of the referencing tuples to NULL
Database modification
Insertions, deletions and updates to its content
Database state example
It is when the database has actual data in it at the given time., so those schema variables will have tuples in them
specifying constraints in SQL
Key constraint: A primary key value cannot be duplicated Entity integrity constraints: A primary key value cannot be null Referential integrity constraints: The "foreign key" must have a value that is already present as a primary key, or may be null.
Data Model Operations
Operations for specifying database retrievals and updates by referring to the constructs of the data model
Database retrieval
Querying, generating reports
The schema (description of a relation)
R(A1,A2,...An) relation with attributes setup Example CUSTOMER(Cust-id, Cust-name, Address, Phone#)
Initial Database State
Refers to the database state when it is initially loaded into the system.
Database Administrators
Responsible for authorizing access to the database, for coordinating and monitoring its use, acquiring software and hardware resources, controlling its use and monitoring efficiency of operations.
database designers
Responsible to define the content, the structure, the constraints, and functions or transactions against the database. They must communicate with the end-users and understand their needs.
(b) Suppose we want the number of male employees in each department rather than all employees (as in Exercise 5.4a). Can we specify this query in SQL? Why or why not?
SELECT DNAME, COUNT (*) FROM DEPARTMENT, EMPLOYEE WHERE DNUMBER=DNO AND SEX='M' AND DNO IN ( SELECT DNO FROM EMPLOYEE GROUP BY DNO HAVING AVG (SALARY) > 30000 ) GROUP BY DNAME
Retrieve the names of all employees who work in the department that has the employee with the highest salary among all employees.
SELECT LNAME FROM EMPLOYEE WHERE DNO = ( SELECT DNO FROM EMPLOYEE WHERE SALARY =( SELECT MAX(SALARY) FROM EMPLOYEE) )
Retrieve the names of employees who make at least $10,000 more than the employee who is paid the least in the company.
SELECT LNAME FROM EMPLOYEE WHERE SALARY >= 10000 + ( SELECT MIN(SALARY) FROM EMPLOYEE)
Retrieve the names of all employees whose supervisor's supervisor has '888665555' for Ssn.
SELECT LNAME FROM EMPLOYEE WHERE SUPERSSN IN ( SELECT SSN FROM EMPLOYEE WHERE SUPERSSN = '888665555' )
SQL
SQL (Structured query language) The origin of SQL is relational predicate calculus called tuple calculus (see Ch.8) which was proposed initially as the language SQUARE. SQL Actually comes from the word "SEQUEL" which was the original term used in the paper: "SEQUEL TO SQUARE" by Chamberlin and Boyce. IBM could not copyright that term, so they abbreviated to SQL and copyrighted the term SQL. Now popularly known as "Structured Query language". SQL is an informal or practical rendering of the relational data model with syntax.
Database schema example
STUDENT [Name][Student_number][Class][Major] <-- a one line table COURSE [course_name][course_number][credit_hours][department] . . .
Key constraints
Superkey of R: Is a set of attributes SK of R with the following condition: No two tuples in any valid relation state r(R) will have the same value for SK That is, for any distinct tuples t1 and t2 in r(R), t1[SK] != t2[SK] This condition must hold in any valid state r(R) Key of R: A "minimal" superkey That is, a key is a superkey K such that removal of any attribute from K results in a set of attributes that is not a superkey (does not possess the superkey uniqueness property) A Key is a Superkey but not vice versa, because the superkey need not be minimal.
Schema elements
Tables, constraints, views, domains, and other constructs
Date data type
Ten positions Components are YEAR, MONTH, and DAY in the form YYYY-MM-DD
Database state
The actual data stored in a database at a particular moment in time. This includes the collection of all the data in the database. Also called database instance (or occurrence or snapshot).
Database Schema
The description of a database, includes descriptions of DB structure, data types and the constraints on the database
Entity integrity
The primary key attributes PK of each relation schema R in S cannot have null values in any tuple of r(R).
Formal def - state
The relation state is a subset of the Cartesian product of the domains of its attributes each domain contains the set of all possible values the attribute can take.
relational model
The relational Model of Data is based on the concept of a Relation Proposed in 1970 by E.F. Codd (IBM), first commercial system in 1981-82.
Data Abstraction
The separation of the logical view of data from its implementation, kinda like a table, CS 332 abstraction states
referential integrity constraint
The value in the foreign key column (or columns) FK of the the referencing relation R1 can be either: (1) a value of an existing primary key value of a corresponding primary key PK in the referenced relation R2, or (2) a null.
Inherent or Implicit Constraints
These are based on the data model itself. (E.g., relational model does not allow a list as a value for any attribute)
Application based or semantic constraints
These are beyond the expressive power of the model and must be specified and enforced by the application programs.
Schema-based or Explicit Constraints
They are expressed in the schema by using the facilities provided by the model. (Key constraint and foreign key constraints in Relational model).
use mini-word relationships in order to build databases, ex: part of a university
University has: STUDENTs COURSEs SECTIONs (of COURSEs) (academic) DEPARTMENTs INSTRUCTORs SECTIONs are of specific COURSEs STUDENTs take SECTIONs INSTRUCTORs teach SECTIONs COURSEs have prerequisite COURSEs COURSEs are offered by DEPARTMENTs STUDENTs major in DEPARTMENTs
DDL (Data Definition Language)
Used by the DBA and database designers to specify the conceptual schema of a database. In many DBMSs, the DDL is also used to define internal and external schemas (views). In some DBMSs, separate storage definition language (SDL) and view definition language (VDL) are used to define internal and external schemas.
DML (Data Manipulation Language)
Used to specify database retrievals and updates DML commands (data sublanguage) can be embedded in a general-purpose programming language (host language), such as COBOL, C, C++, or Java.
Three-Schema Architecture
User level (external schema) -> conceptual level (conceptual schema) -> internal level (physical schema)
Boolean data type
Values to true or false or null
Conceptual Schema
at the conceptual level to describe the structure and constraints for the whole (entire) database for a community of users.
External Schema
at the external level to describe the various user views.
Internal schema
at the internal level to describe physical storage structures and access paths
Database end users
casual: access database occasionally when needed Naive or parametric: They make up a large section of the end-user population Sophisticated: include business analysts, scientists, engineers, others thoroughly familiar with the system capabilities Stand-alone: mostly maintain personal databases using ready-to-use packaged applications
definition summary
informal terms to formal terms Table -------------- Relation Column header ---- Attribute All poss col vals ---- Domain Row ------------- Tuple Table def -------Schema of a relation Populated table--State of the relation
Database transactions
may read some data and "update" certain values or generate new data and store that in the database
Timestamp data types Includes date and time
optional WITH TIME ZONE qualifier
how to account for duplicates in query
use the distinct keyword SELECT DISTINCT SALARY FROM EMPLOYEE; Lists all the salaries without distinct values
Key of a relation
Each row has a value of a data item (or set of items) that uniquely identifies that row in the table Called the key In the STUDENT table, SSN is the key
Bit string data types
Fixed length: BIT(n) Varying length: BIT VARYING(n)
Character-string data types
Fixed length: CHAR(n), CHARACTER(n) Varying length: VARCHAR(n), CHAR, VARYING(n), CHARACTER VARYING (n)
Formal def - summary
Formally, Given R(A1, A2, .........., An) r(R) is a subset of dom(A1) X dom(A2) X ....X dom(An) (relation state)
Attributes
Each column has a column header that gives an indication of the meaning of the data items in that column In the formal model, the column header is called an attribute name (or just attribute)
Update violations
UPDATE may violate domain constraint and NOT NULL constraint on an attribute being modified Any of the other constraints may also be violated, depending on the attribute being updated: Updating the primary key (PK): Similar to a DELETE followed by an INSERT Need to specify similar options to DELETE Updating a foreign key (FK): May violate referential integrity Updating an ordinary attribute (neither PK nor FK): Can only violate domain constraints