ITP 487

Ace your homework & exams now with Quizwiz!

Subquery Limitation

The selected data can only come from the top-level table. We cannot use a subquery to obtain data that arise from more that one table.

IN/NOT IN

Creating conditions to list values within the list

Subqueries and joins are used...

to query multiple tables

HAVING

(Condition) The HAVING operator restricts the groups that are presented in the result.

WHERE

(Condition)

Generic Algorithms

"Survival of the fittest". Create initial population of individuals. Evaluate fitness of individuals. Select the individuals. Apply genetic operations: Crossover and Mutation. Repeat until complete.

Evolution Phase/Enabling Tech (1980s)

1. Data Access 2. Online Transaction Processing (Relational Databases)

Evolution Phase/Enabling Tech (1960s)

1. Data Collection 2. Data Processing Applications

Evolution Phase/Enabling Tech (today)

1. Data Mining 2. Advanced Algorithms/Increased computing power

Evolution Phase/Enabling Tech (1990s)

1. Data Warehousing & Decision Support 2. Online Analytical Processing (OLAP)

Data Mining Process

1. Data model needs to be defined 2. Mining data store needs to be filled 3. Mining runs need to be defined 4. The results need to be interpreted 5. Results need to be applied

Less than or equal

<=

Not equal

<>

Greater than or equal

>=

Transitive Dependency

A --> B, B --> C, then A --> C transitively.

Primary Key

A candidate key selected as the primary means of identifying rows in a relation: -There is one and only one primary key per relation -The primary key may be a composite key -The ideal primary key is short, numeric and never changes

Keys

A combination of one or more columns that is used to identify rows in a relation

Regression

A data mining technique used to predict a range of numeric values (also called continuous values), given a particular dataset. For example, regression might be used to predict the cost of a product or service, given other variables. Regression is used across multiple industries for business and marketing planning, financial forecasting, environmental modeling and analysis of trends.

SQL

A data sublanguage for creating and processing database data and metadata. Structured Query Language

Composite Determinant

A determinant of a functional dependency that consists of more than one attribute

Composite Key

A key that consists of two or more columns

Candidate Key

A key that determines all of the other columns in a relation

Neural Networks

A neural network consists of an interconnected group of artificial neurons, and it processes information using a connectionist approach to computation. In most cases a neural network is an adaptive system that changes its structure during a learning phase. Neural networks are used to model complex relationships between inputs and outputs or to find patterns in data.

Data Warehouse

A process and architecture that requires robust planning to implement a platform, which consists of the selection, conversion, transformation, consolidation, integration, cleansing and mapping of data (i.e., recent and historical) from multiple operational data sources to a target DBMS that supports an enterprise's decision-making processes and BI systems

Decision Trees

A structure that includes a root node, branches, and leaf nodes. Each internal node denotes a test on an attribute, each branch denotes the outcome of a test, and each leaf node holds a class label. The topmost node in the tree is the root node.

Relation

A two-dimensional table that has the following characteristics: -Rows contain data about an entity -Columns contain data about attributes of the entity -All entries in a column are of the same kind -Each column has a unique name -Cells of the table hold a single value -The order of the columns is unimportant -The order of the rows is unimportant -No two rows may be identical

Logical Operators

AND, OR, NOT

Ascending values

ASC (default setting)

Steps to create business intelligence

Access data --> analyze data --> derive insights --> draw conclusions --> make decisions

% or *

Any set of one or more characters

Application Server

Application Layer

Special Operators

BETWEEN, NULL, LIKE, IN, EXISTS

Data Warehouse Objective

Become the single source of truth. Single point of access to all enterprise information

Synonyms for field

Column, Attribute

AND/OR

Creating two different conditions

Descending values

DESC

SQL Subcategories

DML, DDL

Value Chain

Data --> Information --> Knowledge --> Wisdom --> Decision

DDL

Data Definition Language Used for creating tables, relationships, and other structures.

Database Server

Data Layer

DML

Data Manipulation Language Used for queries and data modification (We will concentrate on DML for this course!!!)

NULL

Doesn't show up or has a value of 0

__

Exactly one character

E (ETL)

Extract data from individual systems

ETL

Extract, Transform, and Load. Used to standardize data across systems, that allow it to be queried.

In general, place WHERE before _______

GROUP BY

Drivers of Business Intelligence

Globalization Decentralization Smaller margins Faster reaction to market Increasing importance of service

GROUP BY

Group BY is used to group the data based upon a certain characteristic or trait

SQL always places WHERE before ______

HAVING

Data Collection

Hard to access

Knowledge Management

How do you capture and share knowledge within employees of a company

How to make Data Mining model more intelligent

Increasing the amount of factors to incorporate/consider

Join is performed...

Join is performed when data are retrieved from more than one table at a time

DISTINCT

Limits values to unique values (only if they are different from one another)

Review Slide on ER Notation

Line with one dash - one Line with three branches - Many Line with two dashes - one and only one Line with circle and + - Zero or one Line with dash + branches - One or many Line with circle and branches - Zero or many

*

Lists everything

ORDER BY

Lists output in a specific order

L (ETL)

Load all data into the warehouse

Cardinality

Max number of times an instance in one entity can be associated with instances in the related entity

Ordinality

Min number of times an instance in one entity can be associated with instances in the related entity

INNER JOIN

Most common join! Will return all records in the left table (A) that have a matching recording in the right table (B) SELECT "column name" FROM left table a INNER JOIN right table b ON a.key = b.key

Functional Dependency

Occurs when the value of one (a set of) attribute(s) determines the value of a second (set of) attribute(s) A, B --> C, but A alone can't --> C by itself

Operational Data Source

One or more sources of data that will be used to populate the data warehouse

OLAP

Online Analytical Processing Supports managerial decision making (historical point-in-time snapshots and predictions)

OLTP

Online Transactional Processing Runs the business on a current basis (current representation of the state of business)

Data Warehousing Components

Operational data source Data conversion and extraction (ETL) Datawarehouse DBMS Datawarehouse administration BI tools

Client

Presentation Layer

Three-Tier Architecture

Presentation layer, application layer, data layer

Data Conversion and extraction

Provide the capabilities to perform the complex task of integrating data from multiple sources to create a consolidated view of the data, as well as the transformation of data for use by BI applications.

BI Tools

Queries Reports Graphs and charts Dashboard and cockpits

Words are put into...

Quotes (but numbers aren't)

Synonym for row

Record

Regression v. Classification

Regression is used to predict a numeric or continuous value while classification assigns data into discrete categories.

Data Mining Techniques

Regression, Decision Trees, Neural Networks, Association Analysis, Clustering/Classification, Generic Algorithms

Synonym for table

Relation

Natural JOIN

Returns all rows that have matching values in matching columns (eliminates duplicates). Used when tables share one or more common attributes with common names

JOIN USING

Returns only rows with matching values in the column indicated in the USING clause

RIGHT OUTER JOINS

Returns rows matching join condition Returns rows in right side table with unmatched values

LEFT OUTER JOINS

Returns rows matching the join condition Returns rows in left side table with unmatched values

Data Access Platforms

SAP, Oracle Summarizing sales, adding sales

Order of Syntax in statement

SELECT columnlist FROM tablelist [WHERE conditionlist] [GROUP BY columnlist] [HAVING conditionlist] [ORDER BY columnlist [ASC | DESC] ] ;

RIGHT JOIN

Same as left join, but will show all values regardless from (B) but will only have matching values from (A) SELECT "column name" FROM left table a RIGHT JOIN right table b ON a.key = b.key

LIKE

Search (similar to)

Data Warehouse/Decision Support

Slice and dice data, mobility to analyze by dimension

Entity

Some identifiable thing that users want to track (customers, products, sales orders)

Determinant

The attribute on the left side of the functional dependency. The determinant is the attribute that provides information about the attributes in the reset of the row

DBMS

System software for creating and managing databases

Self JOIN

Table is joined within itself

Datawarehouse DBMS

The RDBMS vendors (e.g., Oracle, Teradata, and Microsoft) have significantly increased the amount of research and development to improve support for data warehousing

Foreign Key

The primary key of one relation that is placed in another relation to form a link between the relations: -A foreign key can be a single column or a composite key -The term refers to the fact that key values are foreign to the relation in which they appear as foreign key values

Data Mining

The process of discovering meaningful correlations, patterns, and trends by sifting through large amounts of data stored in repositories.

Association Analysis

To analyze the strength and correction of relationships. Example: likelihood to survive Titanic sinking based on sex, age, and class.

T (ETL)

Transform all data into the same format

SELECT

Used to retrieve data from tables, generating information

Business Intelligence

User-centered process of exploring data, data relationships and trends - thereby helping to improve overall decision making for enterprises.

FULL JOIN

Will return all records from both tables, joining records from (A) that match from (B)

LEFT JOIN

Will return all records in the left table (A) no matter what, but will also return any values that have matching records from the right table (B) SELECT "column name" FROM left table a LEFT JOIN right table b ON a.key = b.key

Datawarehouse Administration

With the increasing number of subject areas and the increasing amount of historical data (i.e., more than five years) typically maintained, a data warehouse requires significant amounts of disk storage, memory and processing power. Someone makes sure the processes run every night

Alias

You add it to identify table source without typing out the entire thing. Add alias after table name in FROM clause

End SQL statements with a...

semi-colon (;)


Related study sets

Chapter 13 - Labor and Birth Processes

View Set

Chapter 7 Health Insurance Underwriting

View Set

Sociology Chapter 12: Life at Home Families and Relationships

View Set

EMT chapter 13- BLS Resuscitation

View Set