ITP 487
Subquery Limitation
The selected data can only come from the top-level table. We cannot use a subquery to obtain data that arise from more that one table.
IN/NOT IN
Creating conditions to list values within the list
Subqueries and joins are used...
to query multiple tables
HAVING
(Condition) The HAVING operator restricts the groups that are presented in the result.
WHERE
(Condition)
Generic Algorithms
"Survival of the fittest". Create initial population of individuals. Evaluate fitness of individuals. Select the individuals. Apply genetic operations: Crossover and Mutation. Repeat until complete.
Evolution Phase/Enabling Tech (1980s)
1. Data Access 2. Online Transaction Processing (Relational Databases)
Evolution Phase/Enabling Tech (1960s)
1. Data Collection 2. Data Processing Applications
Evolution Phase/Enabling Tech (today)
1. Data Mining 2. Advanced Algorithms/Increased computing power
Evolution Phase/Enabling Tech (1990s)
1. Data Warehousing & Decision Support 2. Online Analytical Processing (OLAP)
Data Mining Process
1. Data model needs to be defined 2. Mining data store needs to be filled 3. Mining runs need to be defined 4. The results need to be interpreted 5. Results need to be applied
Less than or equal
<=
Not equal
<>
Greater than or equal
>=
Transitive Dependency
A --> B, B --> C, then A --> C transitively.
Primary Key
A candidate key selected as the primary means of identifying rows in a relation: -There is one and only one primary key per relation -The primary key may be a composite key -The ideal primary key is short, numeric and never changes
Keys
A combination of one or more columns that is used to identify rows in a relation
Regression
A data mining technique used to predict a range of numeric values (also called continuous values), given a particular dataset. For example, regression might be used to predict the cost of a product or service, given other variables. Regression is used across multiple industries for business and marketing planning, financial forecasting, environmental modeling and analysis of trends.
SQL
A data sublanguage for creating and processing database data and metadata. Structured Query Language
Composite Determinant
A determinant of a functional dependency that consists of more than one attribute
Composite Key
A key that consists of two or more columns
Candidate Key
A key that determines all of the other columns in a relation
Neural Networks
A neural network consists of an interconnected group of artificial neurons, and it processes information using a connectionist approach to computation. In most cases a neural network is an adaptive system that changes its structure during a learning phase. Neural networks are used to model complex relationships between inputs and outputs or to find patterns in data.
Data Warehouse
A process and architecture that requires robust planning to implement a platform, which consists of the selection, conversion, transformation, consolidation, integration, cleansing and mapping of data (i.e., recent and historical) from multiple operational data sources to a target DBMS that supports an enterprise's decision-making processes and BI systems
Decision Trees
A structure that includes a root node, branches, and leaf nodes. Each internal node denotes a test on an attribute, each branch denotes the outcome of a test, and each leaf node holds a class label. The topmost node in the tree is the root node.
Relation
A two-dimensional table that has the following characteristics: -Rows contain data about an entity -Columns contain data about attributes of the entity -All entries in a column are of the same kind -Each column has a unique name -Cells of the table hold a single value -The order of the columns is unimportant -The order of the rows is unimportant -No two rows may be identical
Logical Operators
AND, OR, NOT
Ascending values
ASC (default setting)
Steps to create business intelligence
Access data --> analyze data --> derive insights --> draw conclusions --> make decisions
% or *
Any set of one or more characters
Application Server
Application Layer
Special Operators
BETWEEN, NULL, LIKE, IN, EXISTS
Data Warehouse Objective
Become the single source of truth. Single point of access to all enterprise information
Synonyms for field
Column, Attribute
AND/OR
Creating two different conditions
Descending values
DESC
SQL Subcategories
DML, DDL
Value Chain
Data --> Information --> Knowledge --> Wisdom --> Decision
DDL
Data Definition Language Used for creating tables, relationships, and other structures.
Database Server
Data Layer
DML
Data Manipulation Language Used for queries and data modification (We will concentrate on DML for this course!!!)
NULL
Doesn't show up or has a value of 0
__
Exactly one character
E (ETL)
Extract data from individual systems
ETL
Extract, Transform, and Load. Used to standardize data across systems, that allow it to be queried.
In general, place WHERE before _______
GROUP BY
Drivers of Business Intelligence
Globalization Decentralization Smaller margins Faster reaction to market Increasing importance of service
GROUP BY
Group BY is used to group the data based upon a certain characteristic or trait
SQL always places WHERE before ______
HAVING
Data Collection
Hard to access
Knowledge Management
How do you capture and share knowledge within employees of a company
How to make Data Mining model more intelligent
Increasing the amount of factors to incorporate/consider
Join is performed...
Join is performed when data are retrieved from more than one table at a time
DISTINCT
Limits values to unique values (only if they are different from one another)
Review Slide on ER Notation
Line with one dash - one Line with three branches - Many Line with two dashes - one and only one Line with circle and + - Zero or one Line with dash + branches - One or many Line with circle and branches - Zero or many
*
Lists everything
ORDER BY
Lists output in a specific order
L (ETL)
Load all data into the warehouse
Cardinality
Max number of times an instance in one entity can be associated with instances in the related entity
Ordinality
Min number of times an instance in one entity can be associated with instances in the related entity
INNER JOIN
Most common join! Will return all records in the left table (A) that have a matching recording in the right table (B) SELECT "column name" FROM left table a INNER JOIN right table b ON a.key = b.key
Functional Dependency
Occurs when the value of one (a set of) attribute(s) determines the value of a second (set of) attribute(s) A, B --> C, but A alone can't --> C by itself
Operational Data Source
One or more sources of data that will be used to populate the data warehouse
OLAP
Online Analytical Processing Supports managerial decision making (historical point-in-time snapshots and predictions)
OLTP
Online Transactional Processing Runs the business on a current basis (current representation of the state of business)
Data Warehousing Components
Operational data source Data conversion and extraction (ETL) Datawarehouse DBMS Datawarehouse administration BI tools
Client
Presentation Layer
Three-Tier Architecture
Presentation layer, application layer, data layer
Data Conversion and extraction
Provide the capabilities to perform the complex task of integrating data from multiple sources to create a consolidated view of the data, as well as the transformation of data for use by BI applications.
BI Tools
Queries Reports Graphs and charts Dashboard and cockpits
Words are put into...
Quotes (but numbers aren't)
Synonym for row
Record
Regression v. Classification
Regression is used to predict a numeric or continuous value while classification assigns data into discrete categories.
Data Mining Techniques
Regression, Decision Trees, Neural Networks, Association Analysis, Clustering/Classification, Generic Algorithms
Synonym for table
Relation
Natural JOIN
Returns all rows that have matching values in matching columns (eliminates duplicates). Used when tables share one or more common attributes with common names
JOIN USING
Returns only rows with matching values in the column indicated in the USING clause
RIGHT OUTER JOINS
Returns rows matching join condition Returns rows in right side table with unmatched values
LEFT OUTER JOINS
Returns rows matching the join condition Returns rows in left side table with unmatched values
Data Access Platforms
SAP, Oracle Summarizing sales, adding sales
Order of Syntax in statement
SELECT columnlist FROM tablelist [WHERE conditionlist] [GROUP BY columnlist] [HAVING conditionlist] [ORDER BY columnlist [ASC | DESC] ] ;
RIGHT JOIN
Same as left join, but will show all values regardless from (B) but will only have matching values from (A) SELECT "column name" FROM left table a RIGHT JOIN right table b ON a.key = b.key
LIKE
Search (similar to)
Data Warehouse/Decision Support
Slice and dice data, mobility to analyze by dimension
Entity
Some identifiable thing that users want to track (customers, products, sales orders)
Determinant
The attribute on the left side of the functional dependency. The determinant is the attribute that provides information about the attributes in the reset of the row
DBMS
System software for creating and managing databases
Self JOIN
Table is joined within itself
Datawarehouse DBMS
The RDBMS vendors (e.g., Oracle, Teradata, and Microsoft) have significantly increased the amount of research and development to improve support for data warehousing
Foreign Key
The primary key of one relation that is placed in another relation to form a link between the relations: -A foreign key can be a single column or a composite key -The term refers to the fact that key values are foreign to the relation in which they appear as foreign key values
Data Mining
The process of discovering meaningful correlations, patterns, and trends by sifting through large amounts of data stored in repositories.
Association Analysis
To analyze the strength and correction of relationships. Example: likelihood to survive Titanic sinking based on sex, age, and class.
T (ETL)
Transform all data into the same format
SELECT
Used to retrieve data from tables, generating information
Business Intelligence
User-centered process of exploring data, data relationships and trends - thereby helping to improve overall decision making for enterprises.
FULL JOIN
Will return all records from both tables, joining records from (A) that match from (B)
LEFT JOIN
Will return all records in the left table (A) no matter what, but will also return any values that have matching records from the right table (B) SELECT "column name" FROM left table a LEFT JOIN right table b ON a.key = b.key
Datawarehouse Administration
With the increasing number of subject areas and the increasing amount of historical data (i.e., more than five years) typically maintained, a data warehouse requires significant amounts of disk storage, memory and processing power. Someone makes sure the processes run every night
Alias
You add it to identify table source without typing out the entire thing. Add alias after table name in FROM clause
End SQL statements with a...
semi-colon (;)