DBMS Exam 1 (Chapters 1 - 4)
basic functions of application programs
- Create and process forms - Process user queries - Create and process reports - Execute application logic - Control the application itself
Functions of a DBMS
- Create database - Create tables - Create supporting structures (e.g., indexes) - Modify (insert, update, or delete) database data - Read database data - Maintain database structures - -- Enforce rules - Control concurrency - Perform backup and recovery
DBMS least to most powerful
- Microsoft Access (ADE) - Oracle MySQL - Microsoft SQL Server - IBM DB2 - Oracle's Oracle Database
Practical Problems in Designing Databases from Existing Data
- The multivalue, multicolumn problem - Inconsistent values - Missing values - General-purpose remarks column
two common types of database redesign
- database migration - integrating two or more databases
x --> y. What is "x -->y"? What is x?
- functional dependency (from x we can conclude what y is) - x is called the determinant
Relation Requirements
- the rows of the table must store data about an entity - the columns of the table must store data about the characteristics of those entities. - the names of the columns are unique; no two columns in the same relation may have the same name. - all of the values in a column are of the same kind The order of the columns is unimportant. The order of the rows is unimportant. No two rows may be identical
A null value can indicate one of three conditions:
- the value is inappropriate - the value is appropriate but unknown - the value is appropriate and known, but no one has entered it into the database.
SQL AS keyword (example usage)
/* *** SQL-Query-CH02-33 *** */ SELECT SUM(OrderTotal) AS OrderSum FROM RETAIL_ORDER;
HAVING clause case usage
/* *** SQL-Query-CH02-49 *** */ SELECT Department, COUNT(SKU) AS NumberOfCatalogItems FROM CATALOG_SKU_2017 WHERE CatalogPage IS NOT NULL GROUP BY Department HAVING COUNT(SKU) > 2;
Advantages of normalization
1. Eliminate modification anomalies 2. Reduce duplicated data - Eliminate data integrity problems - Save file space 3. Single table queries will run faster
Process for Putting a Relation into BCNF
1. Identify every functional dependency. Process for Putting a Relation into BCNF 2. Identify every candidate key. 3. If there is a functional dependency that has a determinant that is not a candidate key (Note: if there is more than one such functional dependency, start with the one with the most columns).: A. Move the columns of that functional dependency into a new relation. B. Make the determinant of that functional dependency the primary key of the new relation. C. Leave a copy of the determinant as a foreign key in the original relation. D. Create a referential integrity constraint between the original relation and the new relation. 4. Repeat step 3 until every determinant of every relation is a candidate key.
disadvantages of normalization
1. More complicated SQL required for multi-table subqueries and joins 2. Extra work for DBMS can mean slower applications
concurrency
A DBMS controls concurrency by ensuring that one user's work does not inappropriately interfere with another user's work.
transitive dependency
A condition in which an attribute is dependent on another attribute that is not part of the primary key
database migration
A database is adapted to new or changing requirements. In the migration process, tables may be created, modified, or removed; relationships may be altered; data constraints may be changed; and so forth.
Why are databases self discribing?
A database is self-describing because it contains a description of itself. Thus, databases contain not only tables of user data, but also tables of data that describe that user data. Such descriptive data is called metadata because it is data about data.
relational database
A database that represents data as a collection of tables in which all data relationships are represented by common values in related tables
client-server architecture
A network design in which client computers use a centrally administered server to share data, data storage space, and devices. USERS use DIVICES on which they have CLIENT applications to access SERVICES. These services are processed by SERVERS, which hold the DATABASES with the data needed (for the client apps).
foreign key
A primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables
3NF
A relation is in 3NF if and only if it is in 2NF and there are no non-key attributes determined by another non-key attribute.
first normal form (1NF)
A relation that has a primary key and in which there are no repeating groups.
recursive relationship (definition and example)
A relationship found within a single entity type (a relationship between columns in the same table). For example, an EMPLOYEE is married to an EMPLOYEE or a PART is a component of another PART.
referential integrity constraint
A rule that states that either each foreign key value must match a primary key value in another relation or the foreign key value must be null. Ex. SKU in ORDER_ITEM must exist in SKU in SKU_DATA
null value
A special field value, distinct from zero, blank, or any other value, that indicates that the value for the field is missing or otherwise unknown.
surrogate key
A system-assigned primary key, generally numeric and auto-incremented. If the numbers used in primary key columns such as StudentNumber and ClassNumber are automatically generated and assigned in the database itself
SQL subquery
An SQL subquery is an SQL query statement used to determine a set of values that are provided (or returned) to the SQL query (often referred to as the top level query) that used (or called) the subquery. A subquery is often described as a nested query or a query within a query
a table
An arrangement of data made up of horizontal rows and vertical columns.
we can eliminate anomalies due to functional dependencies by placing all tables in _____
BCNF
Index
Because this is a large database, searching through the table to find, for example, all employees in the accounting department would take a long time. To improve performance, we can create an index (akin to the index at the back of a book) for DepartmentName to show which employees are in which departments. Such an index is an example of a supporting structure that is created and maintained by a DBMS.
BCNF
Boyce-Codd Normal Form A relation is in BCNF if and only if it is in 3NF and every determinant is a candidate key The only way a relation in 3NF can have problems actually requiring further normalization work to get it into BCNF is if it has overlapping composite candidate keys. If the relation (1) does not have composite candidate keys or (2) has non-overlapping composite candidate keys, then it is already in BCNF once it is in 3NF.
POS system
Businesses use point of sale (POS) systems to record every purchase in a database, to monitor inventory, and, if you have a sales promotion card from the store (the one you use to get those special prices for "card holders only"), to keep track of everything you buy for marketing purposes. All the data POS systems gather is stored in, of course, a database.
CHAR vs VARCHAR
CHAR columns use a fixed number of bytes to store the data, whereas VARCHAR columns adjust the number of bytes used to fit the actual length of the data
AVG
Calculate the average of all values (numeric columns only)
MAX
Calculate the maximum value of all values
MIN
Calculate the minimum value of all values
SUM
Calculate the sum of all values (numeric columns only)
COUNT({Name})
Count the number of rows in the table where column {Name} IS NOT NULL
If we want to count the unique values of Department, we need to use the SQL ______keyword as follows: (example for this table)
DISTINCT keyword; SELECT COUNT(DISTINCT Department) AS DeptCount FROM SKU_DATA;
Open Database Connectivity (ODBC)
Database middleware developed by Microsoft to provide a database access API to Windows applications.
e-commerce database application
Databases are a key component of e-commerce order entry, billing, shipping, and customer support. Surprisingly, however, the largest databases at an e-commerce site are not order-processing databases. The largest databases are those that track customer browser behavior
Digital dashboards (function)
Digital dashboards and other reporting systems assess past and current performance.
NOT LIKE
Does NOT match a sequence of characters
columns (def and alternative name)
Each column of a table stores a characteristic common to all rows Columns are also known as fields
row (def and alternative name)
Each row of a table has data about a particular occurrence, or instance of the thing of interest also called records
In 1970, a then little-known IBM engineer named __________- (better known as just _____) published a paper in the _______________ in which he applied the concepts of a branch of mathematics called relational algebra to the problem of "shared data banks," as databases were then known. The results of this work are now the relation_______ for databases, and all ______ DBMS products are built on this model.
Edgar Frank Codd; E. F. Codd; Communications of the ACM relational model; relational database;
referential integrity constraints
Every value in a foreign key must have a corresponding, valid value in the referenced primary key
RTRIM (meaning plus example)
For example, if we store the value "four" in a CHAR(8) column named Number, the DBMS will actually store "four " ("four" plus four spaces). To deal with these extra spaces, we use the RTRIM function to remove the trailing blanks before the comparison is performed: WHERE RTRIM(Number) LIKE 'four';
Funny widely known phrase to help with 1NF, 2NF, 3NF and BCNF. Explain
I swear to construct my tables so that all non-key columns are dependent on ■ the key, [This is 1NF] ■ the whole key, [This is 2NF] ■ and nothing but the key, [This is 3NF and BCNF] so help me Codd!
Over time, Codd convinced IBM management to develop relational model DBMS products. The result was ________ and its variants, which are still very popular today
IBM's DB2
SQL ORDER BY clause
If you want the DBMS to display the rows in a particular (default is ascending); ASC; DESC
entity integrity constraint
In order to function properly, a primary key, whether it is a single column or a composite key, must have unique data values inserted into every row of the table or in other words ... in a relational table, no primary key column can have null (empty) values
composite keys.
Keys that have two or more columns
if (A, B) --> C, then can we say that A-->C?
No. Neither A nor B will determine C by itself
fifth normal form (5NF), also known as
Project-Join Normal Form (PJ/NF)
SQL JOIN ON syntax
SELECT * FROM RETAIL_ORDER JOIN ORDER_ITEM ON RETAIL_ORDER.OrderNumber = ORDER_ITEM.OrderNumber ORDER BY RETAIL_ORDER.OrderNumber, ORDER_ITEM.SKU;
SQL LEFT OUTER JOIN syntax
SELECT StudentPK, StudentName, LockerFK, LockerPK, LockerType FROM STUDENT LEFT OUTER JOIN LOCKER ON STUDENT.LockerFK = LOCKER.LockerPK ORDER BY StudentPK;
SQL RIGHT OUTER JOIN syntax
SELECT StudentPK, StudentName, LockerFK, LockerPK, LockerType FROM STUDENT RIGHT OUTER JOIN LOCKER ON STUDENT.LockerFK = LOCKER.LockerPK ORDER BY LockerPK;
The basic form of SQL queries uses the SQL __________ framework
SELECT/FROM/WHERE
In the SQL SELECT statement, the _____ clause and the _____ clause are the only required clauses in the statement.
SELECT; FROM
One way to limit the results of this query is to use the _____ function. + ex.
SQL TOP {NumberOfRows} SELECT TOP 5 * FROM SKU_DATA;
There are five standard SQL built-in aggregate functions for performing arithmetic on table columns: (name them)
SUM, AVG, MIN, MAX, and COUNT
normalization theory can be divided into three major categories: ....
Source of anomaly: - functional dependencies, (1,2,3NF and BCNF) - multivalued dependencies (4NF) - data constraints and odd conditions (5NF and DK/NF)
SQL (What does it stand for? What is it?)
Structured Query Language (SQL), a database processing language, to query database data
Why do we say reduce data duplication rather than eliminate data duplication?
The answer is that we cannot eliminate all duplicated data because we must duplicate data in foreign keys
Rules for referencing foreign keys
The foreign key does not need to have the same name as the primary key it references—it only has to contain the same type of data!
data sublanguage
The language that is used to describe a database to a DBMS. A data sublanguage consists of at least two parts: Data Definition Language (DDL) and a Data Manipulation Language (DML) it has only those statements needed for creating and processing database data and metadata
INTERSECT
The result is all the row values common to both tables
UNION
The result is all the row values in one or both tables
EXCEPT
The result is all the row values in the first table but not the second
domain
The set of possible data values for a column
Explain digital dashboards and data mining applications
These applications use the data generated by order processing and other operational systems to produce information to help manage the enterprise. Such applications do not generate new data, but instead summarize existing data to provide insights to management.
composite key
Two or more fields that collectively define the primary key by unique combinations of their values. When more than one column in a table must be combined to form the primary key
always place the _____ clause before the _____ clause WHERE, GROUP BY; or GROUP BY, WHERE?
WHERE, GROUP BY;
equijoin
When the tables are joined using an inner join with an is equal to condition (like the one on OrderNumber)
A database schema
a "map" of data tables and their relationships to one another A complete logical view of the database, containing all the tables, all the columns in each table, the primary key of each table (indicated by underlining the column names of the primary key columns), and the foreign keys that link the tables together (indicated by italicizing the column names of the foreign key columns)
a key
a combination of one or more columns that is used to identify particular rows in a relation.
database management system (DBMS)
a computer program used to create, process, and administer the database. The DBMS receives requests encoded in SQL and translates those requests into actions on the database. The DBMS is a large, complicated program that is licensed from a software vendor; companies almost never write their own DBMS programs.
BI systems typically store their associated data in _______
a data warehouse
When Are Determinant Values Unique?
a determinant is unique in a relation only if it determines every other column in the relation
A candidate key
a determinant that determines all of the other columns in a relation
a set
a group of distinct items
In database structures, a recursive relationship is ...
a relationship between two columns in the same table
Integrated table
a table that stores both data and the relationships among the data
Unlike functional dependencies, the anomalies from multivalued dependencies are so serious that multivalued dependencies should....
always be eliminated.
A surrogate key
an artificial column that is added to a table to serve as the primary key.
A non-key attribute
an attribute that is neither (1) a candidate key itself nor (2) part of a candidate key
Looking at the statement syntax in SQLQuery-CH02-60, note that the SQL JOIN keyword is not used anywhere in the SQL statement—therefore, this is _________
an implicit inner join
customer relationship management (CRM) system
an information system that manages customer contacts from initial solicitation through acceptance, purchase, continuing purchase, support, and so forth.
An enterprise resource planning (ERP) system
an information system that touches every department in a manufacturing company. It includes sales, inventory, production planning, purchasing, and other business functions
indexes
are structures that speed the sorting and searching of database data
Venn diagrams
are the standard method of visualizing sets and their relationships.
SQL wildcard character
asterisk (*) - to indicate that we want all the columns to be displayed
Column equivalent names
attribute, field
Read-only databases are used in
business intelligence (BI) systems data warehouse
SQL SELECT clause specifies which _____ are to be listed in the query results
columns
SQL DML CRUD
create, read, update, and delete
SQL is not a complete programming language, like Java or C#. Instead, it is called a ______
data sublanguage
data integrity problems
deletion anomaly, insertion anomaly, update anomaly produce incorrect and inconsistent information users lose confidence in information the system gets a poor reputation can only occur if data are duplicated.
SQL/Persistent Stored Modules (SQL/PSM) statements
extend SQL by adding procedural programming capabilities, such as variables and flow-of-control statements, that provide some programmability within the SQL framework.
ETL system
extract, transform and load
decomposition rule
if A --> (B, C), then it is true that A --> B and A -->C
the union rule
if A --> B and A --> C, then it is true that A --> (B, C)
2NF
if and only if it is in 1NF and all non-key attributes are determined by the entire primary key the only way a non-key attribute can be dependent on part of the primary key is if there is a composite primary key. This means that relations with single-attribute primary keys are automatically in 2NF
business intelligence (BI) systems
information systems used to support management decisions by producing information for assessment, analysis, planning, and control
<>
is NOT equal to
IS NOT NULL
is NOT equal to NULL
NOT IN
is NOT equal to one of a set of values
NOT BETWEEN
is NOT within a set of range of numbers (includes the end points)
A foreign key
is a column or composite of columns that is the primary key of a table other than the one in which it appears.
database application
is a set of one or more computer programs that serves as an intermediary between the user and the DBMS. Application programs read or modify database data by sending SQL statements to the DBMS. Application programs also present data to users in the format of forms and reports. Application programs can be acquired from software vendors, and they are also frequently written in-house. The knowledge you gain from this text will help you write database applications.
=
is equal to
IS NULL
is equal to NULL
IN
is equal to one of a set of values
Database design (as a process)
is the creation of the proper structure of database tables, the proper relationships between tables, appropriate data constraints, and other structural components of the database.
database system
is typically defined to consist of four components: - users - the database application - the database management system (DBMS) - the database + SQL (in the textbook)
BETWEEN
is within a range of numbers (includes the end points)
LIKE
matches a sequence of characters
If you are constructing an updatable database, then you need to be concerned about __________ and _______
modification anomalies and inconsistent data
Relations are categorized into ________ based on the kinds of problems that they have
normal forms
Database professionals use a set of principles, collectively called _______, or ____________, to guide and assess database designs.
normalization; normal forms
Inconsistent values
occur when different users, or different data sources, use slightly different forms of the same data value Inconsistent data values are particularly problematic for primary and foreign key columns. Relationships will be missing or wrong when foreign key data are coded inconsistently or misspelled.
multivalued dependency
occurs when a determinant is matched with a particular set of values
We can divide database systems and DBMS products into two classes:
personal database systems and enterprise-class database systems.
Data mining applications (function)
predict future performance.
Each row in a table is uniquely identified by a ______
primary key
Access uses a GUI style called ________ to simplify ad-hoc queries.
query by example (QBE)
This SQL statement is a _____ statement, which asks _________ to obtain specific data from a database
query; the DBMS;
ad-hoc queries
questions that can be answered using database data
data
recorded facts and numbers.
Table equivalent names
relation, file
SQL underscore (_) wildcard character +ex.
represents a single, unspecified character in a specific position in the character string SELECT * FROM SKU_DATA WHERE Buyer LIKE 'Pete%'; or WHERE SKU_Description LIKE '%Tent%';
SQL percent sign (%) wildcard character +ex.
represents any sequence of contiguous, unspecified characters (including spaces) in a specific position in the character string. SELECT * FROM SKU_DATA WHERE Phone LIKE '330 388 999_';
domain integrity constraint
requirement that all of the values in a column are of the same kind Ex. all first columns have the ID, all second columns have FirstName and so on
What is considered when creating a database design for new system development
requirements for a new system, such as desired data entry forms and reports, user requirements statements, use cases (scenarios of users interacting with an information system to obtain a desired result), and other requirements, are analyzed to create the database design
COUNT(*)
returns the number of cells (rows) in a range that contains numbers
The SQL WHERE clause specifies which ______ are to be listed in the query result
rows
■ The SQL WHERE clause specifies which _____ will be used to determine the groups. ■ The SQL HAVING clause specifies which ____ will be used in the final result
rows; groups;
A primary key should be
short and (if possible) numeric
the only reason for having relations is to...
store instances of functional dependencies.
online transaction processing (OLTP)
system and is used to record all sales transactions of the company (whether in a store, on the Web, or from mail order or phone order sales).
The SQL FROM clause specifies which _____ are to be used in the query
tables
At Microsoft, the current DBMS engine within Microsoft Access is called _________
the Access Database Engine (ADE)
SQL join operation (def + implicit/ explicit)
the SQL JOIN operator is used to combine two or more tables by concatenating (sticking together) the rows of one table with the rows of another table. If the JOIN operator is actually used as part of the SQL statement syntax, we refer to the join operation as an explicit join. If the JOIN operator itself does not appear in the SQL statement, we refer to the join operation as an implicit join.
database integrity
the data in the database is accurate and consistent at all times.
Users
the final component of a database system, employ a database application to keep track of things. They use forms to read, enter, and query data, and they produce reports to convey information.
The Internet (origin info)
the global computer network of networks—was created as the ARPANET in 1969 and then grew and was used to connect all the LANs (and other types of networks).
CROSS JOIN (def, alt name, result)
the result is what is mathematically known as the Cartesian product of the rows in the tables, which means that this statement will just stick every row of one table together with every row of the second table
Subqueries are very powerful, but as noted, they do have a serious limitation: ...
the selected data can come only from the top-level table. Therefore, we cannot use a subquery to display data obtained from more than one table. To do so, we must use an SQL join instead.
what is used to create the relationships between the tables in databases?
the value of primary keys
Row equivalent names
tuple, record
Data definition language (DDL) statements
used for creating tables, relationships, and other structures
Data manipulation language (DML) statements
used for querying, inserting, modifying, and deleting data.
SQL INSERT statement + ex
used to add rows of data to a table INSERT INTO EQUIPMENT_ITEM SELECT DISTINCT ItemNumber, EquipmentType, AcquisitionCost FROM EQUIPMENT_REPAIR;
Data control language (DCL) statements
used to grant database permissions (or to revoke those permissions) to users and groups so that the users or groups can perform various operations on the data in the database.
Transaction control language (TCL) statements
used to mark transaction boundaries and control transaction behavior.
functional dependency
when the value of one or more attributes determines the value of another attribute
data warehouse
which is a database system that has data, programs, and personnel that specialize in the preparation of data for BI processing
Guidelines for Assessing Table Structure
• Count rows and examine columns • Examine data values and interview users to determine: - Multivalued dependencies - Functional dependencies - Candidate keys - Primary keys - Foreign keys • Assess validity of assumed referential integrity constraints
SQL statement categories
■ Data definition language (DDL) statements ■ Data manipulation language (DML) statements ■ SQL/Persistent Stored Modules (SQL/PSM) statements ■ Transaction control language (TCL) statements ■ Data control language (DCL) statements
Three types of database design
■ Database design from existing data ■ Database design for new systems development ■ Database redesign of an existing database
information
■ Knowledge derived from data ■ Data presented in a meaningful context ■ Data processed by summing, ordering, averaging, grouping, comparing, or other similar operations
SQL provides two different techniques for querying data from multiple tables: (name them)
■ The SQL subquery ■ The SQL join
three constraints, taken as a whole, is to create database integrity:
■ The domain integrity constraint ■ The entity integrity constraint ■ The referential integrity constraint