Data Warehousing & Data Mining

Pataasin ang iyong marka sa homework at exams ngayon gamit ang Quizwiz!

Transaction

(1) A group of actions that is performed on the database atomically; either all actions are committed to the database, or none of them are. (2) In the business world, the record of an event.

Key

(1) A group of one or more attributes that identify a unique row in a relation. Because relations cannot have duplicate rows, every relation must have at least one key that is the composite of all the attributes in the relation. A key is sometimes called a logical key. (2) With some relational DBMS products, an index on a column used to improve access and sorting speed. It is sometimes called a physical key. See also nonunique key, unique key, and physical key.

Field

(1) A logical group of bytes in a record used with file processing. (2) In the context of the relational model, a synonym for attribute.

Data model

(1) A model of users' data requirements, usually expressed in terms of the entity-relationship model. It is sometimes called a users' data model. (2) A language for describing the structure and processing of a database.

Maximum cardinality

(1) The maximum number of values that an attribute can have within a semantic object. (2) In a relationship between tables, the maximum number of rows to which a row of one table can relate in the other table.

Primary Key

A candidate key selected to be the key of a relation. Uniquely identifies the records (rows) in a table (relation).

Commit

A command issued to a DBMS to make database modifications permanent. After the command has been processed, the changes are written to the database and to a log in such a way that they will survive system crashes and other failures. A commit is usually used at the end of an atomic transaction. Contrast this with rollback.

Concurrent transactions

A condition in which two or more transactions are processed against a database at the same time. In a single CPU system, the changes are interleaved; in a multi-CPU system, the transactions can be processed simultaneously, and the changes on the database server are interleaved.

Deadlock

A condition that can occur during concurrent processing in which each of two (or more) transactions is waiting to access data that the other transaction has locked. It also is called the deadly embrace.

Data mart

A facility similar to a data warehouse, but with a restricted domain. Often, the data are restricted to particular types, business functions, or business units.

Composite key

A key of a relation that consists of two or more columns.

Structured Query Language (SQL)

A language for defining the structure and processing of a relational database. It can be used as a stand-alone query language, or it can be embedded in application programs. SQL was developed by IBM and is accepted as a national standard by the American National Standards Institute.

Data manipulation language (DML)

A language used to describe the processing of a database.

Data definition language (DDL)

A language used to describe the structure of a database.

Column

A logical group of bytes in a row of a relation or a table. The meaning of a column is the same for every row of the relation.

Fifth normal form (5NF)

A normal form necessary to eliminate an anomaly where a table can be split apart but not correctly joined back together. Also known as Project-Join Normal Form (PJ/NF).

Entity instance

A particular occurrence of an entity; for example, Employee 100 (an EMPLOYEE) and Accounting Department (a DEPARTMENT).

Cascading deletion

A property of a relationship that indicates that when one row is deleted, related rows should be deleted as well.

Cascading update

A referential integrity action specifying that when the key of a parent row is updated, the foreign keys of matching child rows should be updated as well.

Fourth normal form (4NF)

A relation in BCNF in which every multivalued dependency is a functional dependency.

Second normal form (2NF)

A relation in first normal form in which all non-key attributes are dependent on all the keys.

Third normal form (3NF)

A relation in second normal form that has no transitive dependencies.

Join operation

A relational algebra operation on two relations, A and B, that produces a third relation, C. A row of A is concatenated with a row of B to form a new row in C if the rows in A and B meet restrictions concerning their values. For example, A1 is an attribute in A, and B1 is an attribute in B. The join of A with B in which (A1 = B1) will result in a relation, C, having the concatenation of rows in A and B in which the value of A1 is equal to the value of B1. In theory, restrictions other than equality are allowed—a join could be made in which A1 < B1. However, such non-equal joins are not used in practice. Also known as inner join. See also natural join,compare to outer join.

Functional dependency

A relationship between attributes in which one attribute or group of attributes determines the value of another. The expressions X → Y, "X determines Y," and "Y is functionally dependent on X" mean that given a value of X, we can determine the value of Y.

Identifying relationship

A relationship that is used when the child entity is ID-dependent upon the parent entity.

Child

A row, record, or node on the many side of a one-to-many relationship. See also parent.

Database

A self-describing collection of related records, or, for relational databases, of related tables.

Entity class

A set of entities of the same type; for example, several EMPLOYEE entity instances make up the EMPLOYEE class.

Database management system (DBMS)

A set of programs used to define, administer, and process a database and its applications.

Modification anomaly

A situation that exists when the storing of one row in a table records facts about two themes or the deletion of a row removes facts about two themes, or when a data change must be made in multiple rows for consistency.

Business rule

A statement of a policy in a business that restricts the ways in which data can be inserted, updated, or deleted in the database.

Data warehouse

A store of enterprise data that is designed to facilitate management decision making. A data warehouse includes not only data, but also metadata, tools, procedures, training, personnel information, and other resources that make access to the data easier and more relevant to decision makers.

Intersection table

A table (also called a relation) used to represent a many-to-many relationship. It contains the keys of the relations in the relationship. If it contains nonkey columns, it is called an association table. See association entity.

SQL: ALTER TABLE

Adding Column ALTER TABLE Employee ADD EmpID Char(10) NOT NULL; Changing Column (data type, null constraint, unique) ALTER TABLE Employee ALTER EmpID Char(10) NOT NULL UNIQUE; Deleting Column ALTER TABLE Employee DROP EmpID;

Candidate key

An attribute or a group of attributes that identifies a unique row in a relation. One of the candidate keys is chosen to be the primary key.

Foreign Key

An attribute that is a key of one or more relations other than the one in which it appears.

ID-dependent entity

An entity that cannot logically exist without the existence of another entity. APPOINTMENT, for example, cannot exist without CLIENT to make the appointment. To be an ID-dependent entity, the identifier of the entity must contain the identifier of the entity on which it depends. Such entities are a subset of a weak entity. See also strong entity and weak entity.

Association entity

An entity that represents the combination of at least two other objects and that contains data about that combination. It is often used in contracting and assignment applications.

Concurrent update problem

An error condition in which one user's data changes are overwritten by another user's data changes. Also called lost update problem.

Composite identifier

An identifier of an entity that consists of two or more attributes.

First normal form (1NF)

Any table that fits the definition of a relation.

SQL: Aggregation Functions

COUNT - Counts the number of rows that match the specified criteria MIN - Finds the minimum value for a specific column for those rows matching the criteria MAX - Finds the maximum value for a specific column for those rows matching the criteria SUM - Calculates the sum for a specific column for those rows matching the criteria AVG - Calculates the numerical average of a specific column for those rows matching the criteria

SQL: CREATE TABLE

CREATE TABLE EMPLOYEE ( EmpID char(10) PRIMARY KEY, LastName varchar(20) NOT NULL, FirstName varchar(20) NOT NULL, MiddleName varchar(20) NULL, SSN char(9) UNIQUE NOT NULL );

SQL: CREATE TABLE (PK Constraint)

CREATE TABLE TABLE_X( colA char(10) NOT NULL, colB varchar(10) NOT NULL, CONSTRAINT TblX_PK PRIMARY KEY (colA, colB) );

SQL: DELETE

DELETE FROM TableName WHERE <match criteria>;

SQL: DROP TABLE

DROP TABLE EMPLOYEE;

Metadata

Data concerning the structure of data in a database stored in the data dictionary. Metadata are used to describe tables, columns, constraints, indexes, and so forth.

SQL: ALTER TABLE (Constraints)

Example: adding primary key constraints to an existing table ALTER TABLE Employee ADD CONSTRAINT Emp_PK PRIMARY KEY(EmpID); Example: adding foreign key constraint to an existing table ALTER TABLE Employee ADD CONSTRAINT Emp_Dept_FK FOREIGN KEY(DeptID) REFERENCES DEPT(DeptID); Example: Dropping constraint in an existing table ALTER TABLE Employee DROP CONSTRAINT Emp_Dept_FK;

SQL: INSERT

INSERT INTO TableName VALUES(value1, value2, value3, ....); INSERT INTO TableName (colName1, colName2, colName3, ...) VALUES(value1, value2, value3, ....);

Cardinality

In a binary relationship, the maximum or minimum number of elements allowed on each side of the relationship. The maximum cardinality can be 1:1, 1:N, N:1, or N:M. The minimum cardinality can be optional/optional, optional/mandatory, mandatory/optional, or mandatory/mandatory.

Insertion anomaly

In a relation, a condition that exists when, to add a complete row to a table, one must add facts about two or more logically different themes.

Minimum cardinality

In a relationship between tables, the minimum number of rows to which a row of one table can relate in the other table.

Identifier

In an entity, a group of one or more attributes that determine entity instances. See also nonunique identifier and unique identifier.

Discriminator

In the entity-relationship model, an attribute of a supertype entity that determines which subtype pertains to the supertype.

Degree

In the entity-relationship model, the number of entities participating in a relationship.

Business intelligence (BI) systems

Information systems that assist managers and other professionals in analyzing current and past activities and in predicting future events. Two major categories of BI systems are reporting systems and data mining systems.

Determinant

One or more attributes that functionally determine another attribute or attributes. In the functional dependency (A, B) → D, C, the attributes (A, B) are the determinant.

SQL: JOIN

SELECT <columns> FROM <Table1> <JOIN TYPE> <Table2> ON <Table1.columnA = Table2.columnB> [AND Table1.columnY = Table2.columnZ] [AND Table1.columnX = Table2.columnY] WHERE <record match conditions> GROUP BY <columns> HAVING <group match conditions>;

Entity

Something of importance to a user that needs to be represented in a database. In the entity-relationship model, entities are restricted to things that can be represented by a single table. See also strong entity and weak entity.

SQL: SELECT

Syntax: SELECT <list of column expressions> FROM <list of tables and join operations> WHERE <list of logical expressions for rows> GROUP BY <list of grouping columns> HAVING <list of logical expressions for groups> ORDER BY <list of sorting specifications>

SQL: Numerical Operators

The WHERE/HAVING clause match criteria may include Equals "=" Not Equals "<>" Greater than ">" Less than "<" Greater than or Equal to ">=" Less than or Equal to "<=" BETWEEN value1 and value2 / NOT BETWEEN value1 and value2

Denormalization

The process of intentionally designing a relation that is not normalized. Denormalization is done to improve performance or security.

Data mining application

The use of statistical and mathematical techniques to find patterns in database data.

SQL: UPDATE

UPDATE TableName SET column1 = value1, column2 = value2, .... WHERE <match criteria>;

Drill down

User-directed disaggregation of data used to break higher-level totals into components.

SQL: Logical Operators

WHERE/HAVING: AND - an intersection of the data sets OR - a union of the data sets WHERE only: LIKE 'text pattern' / NOT LIKE 'text pattern' IN (list of values) / NOT IN (list of values) IS NULL / IS NOT NULL


Kaugnay na mga set ng pag-aaral

Ms. Bermudez Unit 11 Quiz Water Cycle & Ocean Currents E.S.S

View Set

Anatomy Test Chapter 7 Multiple Choice

View Set

Heritage Studies 6 Ancient India

View Set

Unit 3 Vocabulary: Fractions and Decimals

View Set

Health Portion Chapter 6 Health Insurance Policy Provisions

View Set