Database Exam 4 Ch. 13
Performance tuning the Star Schema: Facts tables
*Denormalize* to improve data access performance and save data storage space
Performance tuning the Star Schema: Dimension tables
*Normalized* to achieve semantic simplicity and facilitate end-user navigation through the dimensions
Data-mining tools functions
1. Analyze data 2. Uncover problems or opportunities hidden in data relationships 3. Form computer models based on their findings 4. Use models to predict business behavior
Data-mining phases
1. Data preparation (identify, clean, and integrate data set) 2. Data analysis and classification (Classification analysis, clustering and sequence analysis, link analysis, trend and deviation analysis) 3. Knowledge acquisition (Select and apply algorithms, decision trees, clustering, neural networks, etc.) 4. Prognosis (modeling, forecasting, and prediction)
Decision Support Database Requirements
1. Database schema - Complex data representations -Aggregated and summarized data -Queries extract multidimensional time slices 2. Data extraction and loading -Supports different data sources (flat files, hierarchical, network, and relational databases) -Checks for inconsistent data 3. Database size -DBMS must support very large databases (VLDBs)
The Data Warehouse
1. Integrated 2. Subject-oriented 3. Time-variant 4. Nonvolate collection of data (never deleted) for improved decision making *Usually only a read-only DB optimized for data analysis and query processing
4 Techniques to optimize data warehouse design
1. Normalizing dimensional tables (achieves semantic simplicity and allows end-user navigation through the dimensions) 2. Maintaining multiple fact tables to represent different aggregation levels 3. Denormalizing fact tables (improves data access performance and saves data storage space) 4. Partitioning and replicating tables
*Focus of Business Intelligence Architecture
1. Strategic, 2. Tactical use of information Example: Using KPI's *MULTIPLE tools from different vendors can be integrated into a SINGLE BI framework
Benefit of Data Mining
Automates the analysis of operational data to find previously unknown data characteristics, relationships, dependencies, and trends
How does Business Intelligence transform data, information, and knowledge?
BI Transforms: Data -> information Information -> knowledge Knowledge -> wisdom
Decision Support Data
Cannot rely on operational data for decision support tasks, must reformat into decision support data. **BI effectiveness depends on QUALITY of data gathered at OPERATIONAL level
*Star Schemas components
Components: 1. Facts 2. Dimensions 3. Attributes 4. Attribute hierarchies
Primary functions of Business Intelligence
Comprehensive, cohesive, integrated tools and processes which: Capture, collect, integrate, store, and analyze data
Fact table
Contains facts linked through dimensions. Found in the center of the star *Primary key of fact table is a composite primary key; many fact rows relate directly to each dimension row.
Areas of use for Predictive Analytics Models
Customer relationships, customer service, customer retention, fraud detection, targeted marketing, and optimized pricing (security and marketing things)
Storage location of attribute hierarchies
DBMS's data dictionary.
**Decision Support Data vs. operational data differences
DSS Differs from operational data in: 1. time span 2. granularity 3. dimensionality
*Data Warehouse vs. data mart
Data marts: 1. Contain more manageable data set than data warehouses 2. Typically lower cost and lower implementation time
Business Intelligence Architecture composed of:
Data, people, processes, technology, and management of components
*Star Schemas
Data-modeling technique that maps multidimensional decision support data into relational database
Predictive Analytics
Employs mathematical and statistical algorithms, neural networks, artificial intelligence, and other advanced modeling tools *Create actionable predictive models based on available data generated in the data-mining phase
Metrics
Facts computed at run time, opposite of stored facts
True/False: There is a mathematical limit to the number of dimensions
False: No mathematical limit exists to the number of dimensions
Slice and dice
Focus on slices of the data cube for more (diced) detailed analysis
*OLAP: Online Analytical Processing
HISTORICAL data for DAY TO DAY operations; accurate up to one point in time. Three Main characteristics: 1. Multidimensional data analysis techniques (Most distinctive feature) 2. Advanced database support 3. Easy-to-use interfaces
Data Analysis provides:
Information about short-term tactical evaluations and strategies
Benefit of Business Intelligence
Main goal: Improved decision making Other benefits: Improved organizational performance, a common user interface for data reporting and analysis, and a common data repository
Replication
Makes COPY of table and places it in different location
KPI: Key Performance Indicators
Measurements that assess company's effectiveness or success in reaching goals *Must be quantifiable!
*OLTP: Online Transaction Processing
Must be BLAZINGLY FAST and 100% reliable. REAL TIME, current data, always up-to-date. Example: Storage of credit card sales
*Star Schema FACT component
Numerical measurements that represent specific business aspects or activities (quantifiable information) *Normally stored in fact table Fact table = center of star Ex: Sales figures, units, costs, prices, and revenues
Strategies should be developed to meet:
Organizational goals using OPERATIONAL databases
Primary function of Decision Support Database
Primary function of DSD: Specialized DBMS tailored to provide fast answers to complex queries over a long historical period of time
Two main purposes of attribute hierarchies
Provide top-down data organization for: 1. Aggregation 2. Drill-down/roll-up data analysis Determines how the data is extracted and represented Used by OLAP tool to access warehouses properly
Benefit of data analytics
Provides advanced data analysis tools to extract knowledge from business data
*Star Schema DIMENSIONS component
Qualifying characteristics/categories that provide additional perspectives to given facts *Study facts through dimensions *Dimensions are stored in dimension tables which are smaller than fact tables. **Links to the fact table Ex: Sales can be compared by product from region to region from one time period to the next. Fact = sales #'s. Dimensions = product, location, and/or time.
Partitioning
SPLITS TABLE into subsets of rows or columns *Note the difference from replication
Data Marts
Small, single-subject data warehouses. Provides decision support to SMALL GROUP of people
Operational data- location and primary use?
Stored in relational database Optimized to support transactions representing daily operations, constantly updated
Requirements of the Data Warehouse
To create a Data Warehouse, requires: 1. Time 2. Money 3. Considerable managerial effort
True/False: Decision support data can be viewed in relation to other data
True: Decision support data is ALMOST ALWAYS viewed in relation to other data *This is why dimensions are of significance
True/False: Fact tables are bigger than dimension tables
True: Dimension tables are smaller than fact tables
True/False: Operational data is not suited for decision support
True: Operational data cannot be used for decision support
*Star Schema ATTRIBUTE component
Used to search, filter, and classify facts. *Dimensions provide descriptive characteristics about the facts through their attributes. Example attributes for location: Region, state, city, store, etc. *Must be able to slice and dice