Data Warehouse Chapters 1-3
KPI Object: trend value
used to evaluate the current trend of the value expression compared to the goal expression - Helps the user to quickly determine whether the value expression is becoming better or worse relative to the goal expression.
Processing
will load data from the source Data Warehouse into the cube structure
Conformed Dimension
- Can exists as a single dimension table that relates to multiple fact tables within the same data warehouse, or as identical dimension tables in separate data marts. - Example. Date is a conformed dimension because its attributes have the same meaning when joined to any fact table
The Factless fact table - Describing conditions
- Also called coverage table - Used to support negative analysis reports
Key Performance Indicator (KPI) (Business Terminology)
A quantifiable measurement for gauging business success.
User Hierarchies
A user hierarchy is a predefined multilevel hierarchy - Provide easier access for the end user in the frontend
The Factless fact table - Capturing events
An event establishes the relationship among the dimension members from various dimensions. - No measured value - The existence of the relationship itself is the fact - Useful for generating reports as you are able to count the number of occurences with the various criteria
Dimensional modeling
Another way of calling the designing of the data warehouse
Example of kimbell modelling steps
Building a data warehouse for the sales part of a business that contains a chain of supermarkets. 1. Choose the business process - The business process is sales 2. Identify the Grain - Grain is to have one record per sales transaction 3. Design the dimension - Per customer, per date, per store, per product 4. Designing facts - Fact Sales
BISM
Business Intelligence semantic model Data model split into two technologies: 1. OLAP cube (file based data storage) 2. In-memory tabular model. (Both come under the SSAS)
HOLAP
Combines technology advantages of MOLAP and ROLAP. - Able to drill through from the cube into underlying relational data
Snowflake schema
Contains dimenstions that relates to the fact through another dimension. - Requires more joins to respond to an analytical query - Slower than Start
Data Warehouse
DATABASE designed for the purpose of DATA ANALYSIS and REPORTING. This indicates that designing a data warehouse is different from modeling a transactional database.
Calculated members
Help perform calculation based on measures, dimension, hierarchies, and MDX functions. - Helpful for covering business requirements with the help of a cube
OLAP
Online Analystical Processing - Meanings loading data from a data warehouse into a file structure. - Allows faster access to measures in different granular levels - Provide better navigation through hierarchial structures - Easier slicing and dicing
SCD Type 1
Replace attribute value - Eg. A name that was wrongly typed may need to be changed and replaced with the change
SCD Type 2
Replace attribute value but also maintain old/historical values - Eg. FromDate and ToDate. Refer to slide 24 for image
The most important factor in designing an OLAP system
Reporting requirements
Extract Transform Load (ETL)
Requirement for data consolidation that has a process that extracts data from different sources and transforms it into the shape that fits into the data warehouse, and finally, loads it into the data warehouse. ETL is this process.
SSAS
SQL Server Analysis Service Two different types of modeling: 1. Multi-dimensional modeling: Based on the OLAP cube and is fitted with measures and dimensions (file-based data storage) 2. Tabular model: Based on a new in-memory engine for tables. The in-memory engine loads all data rows from tables into the memory and responds to queries directly from memory. (Memory based, efficient in terms of response)
A microsoft solution/example of ETL
SQL Server Integration Service (SSIS)
Star Schema
Single-level relationship between fact and dimensions. - Preferred design for data warehouse
The Fact Table
Table that contains a list of related facts and measures with foreign keys pointing to surrogate keys of the dimension tables. - Fact tables usually store a large number of recors. - Most of the data warehouse space is filled by them (80%)
Visualization
The frontend of a BI system that the users can see.
What changes the design of the database?
The purpose of the data warehouse
Slow changing dimensions (SCD)
Types : 0 , 1 , 2
Deployment
Will load metadata from an existing Analysis Services project into the SSAS Server, but it won't load data.
In analysis services KPI is
a collection of calculations that are associated with a measure group in a cube that are used to evaluate business success.
Calculations in KPI for analysis services
- A combo of multidim expressions (MDX) expressions or calculated members - Additional metatdata that provides information about how client applications should display the results of the KPI's calculations - A KPI handles information about a goal set, the actual formula of the performance recorded in the cube, and measurement to show the trend and the status of the performance
A group of properties includes
- A display folder - a parent KPI IF the KPI is computed from other KPIs - The current member IF there is one - the weight of the KPI IF it has one - Description of the KPI
OLAP tools
- BOARD - SSAS - SAP (Business Intelligence Software) - SAge Live - Domo
A simple KPI is composed of:
- Basic information(name and description of the KPI) - the goal - the actual value achieved - a status value - a trend value - a folder where the KPI is viewed
Data ware house PROS & CONS
- De-normalization and flattening data are applied in order to make it easier to query data as much as possible - Simple, fast queries. One select statement.
Kimbell dimension of modelling steps
1. Choose the business process 2. Identify the Grain 3. Design the dimension 4. Designing facts
Types of Dimensions
1. Conformed dimension 2. Role playing dimension 3. Junk dimension 4. Degenerate dimension 5. Slow changing dimensions (SCD)
How OLAP cube is names
1. Dimensions are the axes of cube 2. The measures are connection points of it. - Real world structure of cues will have more than 3 dimensions
3 Storage modes in tabular
1. In memory 2. DirectQuery 3. DirectQuery
Methodologies used to build data warehouses
1. Inmon - Top-up 2. Kimball - Bottom-up
Two main kinds of DAX
1. Row Context ( current row , writing expressions) 2. Filter Context (set of records filtered by criteria)
Schemas for creating a relationship between fact and dimensions.
1. Star Schema 2. Snowflake Schema
Types of Fact tables
1. Transaction Fact Table 2. Snapshot Fact Table 3. Factless Fact table - The bridge table
Types of Hierarchies in SSAS Multidimensional
1. User Hierarchies 2. Attribute Hierarchies
Building the datawarehouse
1. What is the goal of designing a data warehouse? 2. What kind of analytical reports would be required as the result of the BI system. Decide on the Grain ** Important
DAX
Data analysis expression - functions and operators that can be combined to build formulas and expressions
MOLAP
Data is stored in Multidimensional cube - Advantage: -Can perform complex calculations -Optimal for slicing and dicing
What is the core of the BI system?
Data warehouse
Grain
Defines the level of detail that stores the Fact table. Eg. A data warehouse for sales can be built in which grain is the most detailed level of transactions in the retail shop. one record per each transaction in the specific date and time for customer and sales person.
Dimension
Dimension tables are tables that contain descriptive information such as a customers name, job title etc. - Each table contains a list of columns (attributes). - Each dimension has a primary key (surrogate key) - Surrogate key is usually an increment integer value - Primary key of the source system is stored in the dimension talbe as Business key.
SCD Type 0
Do not accept any change for this attribute - Eg. Employee Number. A business key importing attribute for ETL because ETL distinguishes new employees or existing.
Attribute hierarchies
Each attribute creates a single-level hierarchy that can be used in designer/browswer with a combination of other attributes to create a multi-level hierarchy
Business Intelligence
Includes: 1. Applications 2. Infrastructure and tools 3. Best practices that enable access to and analysis of information to improve and optimize decisions and performance.
KPI Object: The actual value
MDX expression that evaluates to a number
KPI Object: The goal
MDX expression that evaluates to a number - Normalized value in the range of -1 to +1 where -1 is very bad and +1 is very good
KPI Object: The status
MDX expressions that evaluate to a number - Analysis services uses to evaluate the current status of the value expression compared to the goal expression.
ROLAP
Manipulating data stored in relational database to give appearance of traditional OLAP slice and dice. - SQL -Advantage: - Handle large amounts of data - leverage functionalities Disadvantage - Slow performance , large queries
Example of ETL
Many tables thata provide sale transaction data. It can be demoralized and build 1 or 2 tables. The ETL process will extract the data and transform( combine, match....) into the model of data warehouse tables
MDM
Master Data Management - The process of maintaining the single version of truth for master data entities throught multiple systems. E.g. Organization may receive customer info from online web app form or retail spreadsheets.
MDX
Multi Dimensional eXpression is the query and calculation language of SSAS 1 . The client tool queries data from the SSAS cube using MDX queries 2. MDX will be compiled, parsed, and executed in the SSAS engine
What is required in tabular project?
SSDT
Snapshot Fact Table
- Each record will be an aggregation of some transactional records for a snapshot period of time. - Eg. Financial periods, create a snapshot fact table with one record for each period and details of the transactions will be aggregated into that record. - Provide as very fast response for dashboards and aggregated queries. - Dont cover detailed transactional records
The Factless Fact Table - Bridge table
- Fact table without any measure. - Usually used to show the non-existence of a fact. - Has FKs for dimensions - Intersection of dimensions - 2 types: 1. Capturing an event 2. Describing conditions
Fact or Measure
- Facts are numeric and additive values in the business process. - Additive means you can add values of same records together and it provides meaning Eg. Sales amount, discount amount or quantity of items sold.
OLAP Cube
- File based storage that loads data from a data warehouse into a cube model. - Contains descriptive information such as dimensions(eg. customer and product) and cells (eg.facts and measures such as sales and discount)
Transaction Fact Table
- Has one record per transaction. - Usually has the most detailed Grain - Transactional facts are a good source for detailed and atomic reports - Good for aggregations and dashboards
Transactional/Operational Database PROS & CONS
- Normalization is applied to reduce the redundancy and dependency - Easier maintenance efforts - Complex, slow queries. Need to join two tables
Degenerate dimension
- Not a separate dimension table - Does not have a table and it sits directly inside the fact table. -Eg. Transaction number, to be stored as degenerate dimension instead of assigning surrogate key and use that key in fact table. That will create a dimension that is the same grain as the fact table (# of records for sales transaction = fact table, not good)
Processing time depends on
- Structure - Quality - Amount of data
Junk Dimension
- Used for dimensions with very narrow member values (records) that will be in use for almost one data mart (not conformed) - Eg. Status dimensions - Solution to combine narrow dimensions together and create a bigger dimension
Role playing dimension
- When the same dimension is used more than once with different names in the cube - Solution: Create one DB dimension when designing cube and the others can use the same dimension but with different name
Methods of Visualizing Information
- strategic and tactical dashboards - Key Performance Indications (KPIs) - detailed or consolidated reports
Storage Types/Mode
-OLAP -MOLAP -HOLAP
SSRS
SQL Server Reporting Service
Microsoft solution for data Visualization
SQL Server Reporting Services (SSRS)
Where will metadata and structure of cubes reside?
SSAS Server