Data Warehouse Chapters 1-3

Réussis tes devoirs et examens dès maintenant avec Quizwiz!

KPI Object: trend value

used to evaluate the current trend of the value expression compared to the goal expression - Helps the user to quickly determine whether the value expression is becoming better or worse relative to the goal expression.

Processing

will load data from the source Data Warehouse into the cube structure

Conformed Dimension

- Can exists as a single dimension table that relates to multiple fact tables within the same data warehouse, or as identical dimension tables in separate data marts. - Example. Date is a conformed dimension because its attributes have the same meaning when joined to any fact table

The Factless fact table - Describing conditions

- Also called coverage table - Used to support negative analysis reports

Key Performance Indicator (KPI) (Business Terminology)

A quantifiable measurement for gauging business success.

User Hierarchies

A user hierarchy is a predefined multilevel hierarchy - Provide easier access for the end user in the frontend

The Factless fact table - Capturing events

An event establishes the relationship among the dimension members from various dimensions. - No measured value - The existence of the relationship itself is the fact - Useful for generating reports as you are able to count the number of occurences with the various criteria

Dimensional modeling

Another way of calling the designing of the data warehouse

Example of kimbell modelling steps

Building a data warehouse for the sales part of a business that contains a chain of supermarkets. 1. Choose the business process - The business process is sales 2. Identify the Grain - Grain is to have one record per sales transaction 3. Design the dimension - Per customer, per date, per store, per product 4. Designing facts - Fact Sales

BISM

Business Intelligence semantic model Data model split into two technologies: 1. OLAP cube (file based data storage) 2. In-memory tabular model. (Both come under the SSAS)

HOLAP

Combines technology advantages of MOLAP and ROLAP. - Able to drill through from the cube into underlying relational data

Snowflake schema

Contains dimenstions that relates to the fact through another dimension. - Requires more joins to respond to an analytical query - Slower than Start

Data Warehouse

DATABASE designed for the purpose of DATA ANALYSIS and REPORTING. This indicates that designing a data warehouse is different from modeling a transactional database.

Calculated members

Help perform calculation based on measures, dimension, hierarchies, and MDX functions. - Helpful for covering business requirements with the help of a cube

OLAP

Online Analystical Processing - Meanings loading data from a data warehouse into a file structure. - Allows faster access to measures in different granular levels - Provide better navigation through hierarchial structures - Easier slicing and dicing

SCD Type 1

Replace attribute value - Eg. A name that was wrongly typed may need to be changed and replaced with the change

SCD Type 2

Replace attribute value but also maintain old/historical values - Eg. FromDate and ToDate. Refer to slide 24 for image

The most important factor in designing an OLAP system

Reporting requirements

Extract Transform Load (ETL)

Requirement for data consolidation that has a process that extracts data from different sources and transforms it into the shape that fits into the data warehouse, and finally, loads it into the data warehouse. ETL is this process.

SSAS

SQL Server Analysis Service Two different types of modeling: 1. Multi-dimensional modeling: Based on the OLAP cube and is fitted with measures and dimensions (file-based data storage) 2. Tabular model: Based on a new in-memory engine for tables. The in-memory engine loads all data rows from tables into the memory and responds to queries directly from memory. (Memory based, efficient in terms of response)

A microsoft solution/example of ETL

SQL Server Integration Service (SSIS)

Star Schema

Single-level relationship between fact and dimensions. - Preferred design for data warehouse

The Fact Table

Table that contains a list of related facts and measures with foreign keys pointing to surrogate keys of the dimension tables. - Fact tables usually store a large number of recors. - Most of the data warehouse space is filled by them (80%)

Visualization

The frontend of a BI system that the users can see.

What changes the design of the database?

The purpose of the data warehouse

Slow changing dimensions (SCD)

Types : 0 , 1 , 2

Deployment

Will load metadata from an existing Analysis Services project into the SSAS Server, but it won't load data.

In analysis services KPI is

a collection of calculations that are associated with a measure group in a cube that are used to evaluate business success.

Calculations in KPI for analysis services

- A combo of multidim expressions (MDX) expressions or calculated members - Additional metatdata that provides information about how client applications should display the results of the KPI's calculations - A KPI handles information about a goal set, the actual formula of the performance recorded in the cube, and measurement to show the trend and the status of the performance

A group of properties includes

- A display folder - a parent KPI IF the KPI is computed from other KPIs - The current member IF there is one - the weight of the KPI IF it has one - Description of the KPI

OLAP tools

- BOARD - SSAS - SAP (Business Intelligence Software) - SAge Live - Domo

A simple KPI is composed of:

- Basic information(name and description of the KPI) - the goal - the actual value achieved - a status value - a trend value - a folder where the KPI is viewed

Data ware house PROS & CONS

- De-normalization and flattening data are applied in order to make it easier to query data as much as possible - Simple, fast queries. One select statement.

Kimbell dimension of modelling steps

1. Choose the business process 2. Identify the Grain 3. Design the dimension 4. Designing facts

Types of Dimensions

1. Conformed dimension 2. Role playing dimension 3. Junk dimension 4. Degenerate dimension 5. Slow changing dimensions (SCD)

How OLAP cube is names

1. Dimensions are the axes of cube 2. The measures are connection points of it. - Real world structure of cues will have more than 3 dimensions

3 Storage modes in tabular

1. In memory 2. DirectQuery 3. DirectQuery

Methodologies used to build data warehouses

1. Inmon - Top-up 2. Kimball - Bottom-up

Two main kinds of DAX

1. Row Context ( current row , writing expressions) 2. Filter Context (set of records filtered by criteria)

Schemas for creating a relationship between fact and dimensions.

1. Star Schema 2. Snowflake Schema

Types of Fact tables

1. Transaction Fact Table 2. Snapshot Fact Table 3. Factless Fact table - The bridge table

Types of Hierarchies in SSAS Multidimensional

1. User Hierarchies 2. Attribute Hierarchies

Building the datawarehouse

1. What is the goal of designing a data warehouse? 2. What kind of analytical reports would be required as the result of the BI system. Decide on the Grain ** Important

DAX

Data analysis expression - functions and operators that can be combined to build formulas and expressions

MOLAP

Data is stored in Multidimensional cube - Advantage: -Can perform complex calculations -Optimal for slicing and dicing

What is the core of the BI system?

Data warehouse

Grain

Defines the level of detail that stores the Fact table. Eg. A data warehouse for sales can be built in which grain is the most detailed level of transactions in the retail shop. one record per each transaction in the specific date and time for customer and sales person.

Dimension

Dimension tables are tables that contain descriptive information such as a customers name, job title etc. - Each table contains a list of columns (attributes). - Each dimension has a primary key (surrogate key) - Surrogate key is usually an increment integer value - Primary key of the source system is stored in the dimension talbe as Business key.

SCD Type 0

Do not accept any change for this attribute - Eg. Employee Number. A business key importing attribute for ETL because ETL distinguishes new employees or existing.

Attribute hierarchies

Each attribute creates a single-level hierarchy that can be used in designer/browswer with a combination of other attributes to create a multi-level hierarchy

Business Intelligence

Includes: 1. Applications 2. Infrastructure and tools 3. Best practices that enable access to and analysis of information to improve and optimize decisions and performance.

KPI Object: The actual value

MDX expression that evaluates to a number

KPI Object: The goal

MDX expression that evaluates to a number - Normalized value in the range of -1 to +1 where -1 is very bad and +1 is very good

KPI Object: The status

MDX expressions that evaluate to a number - Analysis services uses to evaluate the current status of the value expression compared to the goal expression.

ROLAP

Manipulating data stored in relational database to give appearance of traditional OLAP slice and dice. - SQL -Advantage: - Handle large amounts of data - leverage functionalities Disadvantage - Slow performance , large queries

Example of ETL

Many tables thata provide sale transaction data. It can be demoralized and build 1 or 2 tables. The ETL process will extract the data and transform( combine, match....) into the model of data warehouse tables

MDM

Master Data Management - The process of maintaining the single version of truth for master data entities throught multiple systems. E.g. Organization may receive customer info from online web app form or retail spreadsheets.

MDX

Multi Dimensional eXpression is the query and calculation language of SSAS 1 . The client tool queries data from the SSAS cube using MDX queries 2. MDX will be compiled, parsed, and executed in the SSAS engine

What is required in tabular project?

SSDT

Snapshot Fact Table

- Each record will be an aggregation of some transactional records for a snapshot period of time. - Eg. Financial periods, create a snapshot fact table with one record for each period and details of the transactions will be aggregated into that record. - Provide as very fast response for dashboards and aggregated queries. - Dont cover detailed transactional records

The Factless Fact Table - Bridge table

- Fact table without any measure. - Usually used to show the non-existence of a fact. - Has FKs for dimensions - Intersection of dimensions - 2 types: 1. Capturing an event 2. Describing conditions

Fact or Measure

- Facts are numeric and additive values in the business process. - Additive means you can add values of same records together and it provides meaning Eg. Sales amount, discount amount or quantity of items sold.

OLAP Cube

- File based storage that loads data from a data warehouse into a cube model. - Contains descriptive information such as dimensions(eg. customer and product) and cells (eg.facts and measures such as sales and discount)

Transaction Fact Table

- Has one record per transaction. - Usually has the most detailed Grain - Transactional facts are a good source for detailed and atomic reports - Good for aggregations and dashboards

Transactional/Operational Database PROS & CONS

- Normalization is applied to reduce the redundancy and dependency - Easier maintenance efforts - Complex, slow queries. Need to join two tables

Degenerate dimension

- Not a separate dimension table - Does not have a table and it sits directly inside the fact table. -Eg. Transaction number, to be stored as degenerate dimension instead of assigning surrogate key and use that key in fact table. That will create a dimension that is the same grain as the fact table (# of records for sales transaction = fact table, not good)

Processing time depends on

- Structure - Quality - Amount of data

Junk Dimension

- Used for dimensions with very narrow member values (records) that will be in use for almost one data mart (not conformed) - Eg. Status dimensions - Solution to combine narrow dimensions together and create a bigger dimension

Role playing dimension

- When the same dimension is used more than once with different names in the cube - Solution: Create one DB dimension when designing cube and the others can use the same dimension but with different name

Methods of Visualizing Information

- strategic and tactical dashboards - Key Performance Indications (KPIs) - detailed or consolidated reports

Storage Types/Mode

-OLAP -MOLAP -HOLAP

SSRS

SQL Server Reporting Service

Microsoft solution for data Visualization

SQL Server Reporting Services (SSRS)

Where will metadata and structure of cubes reside?

SSAS Server


Ensembles d'études connexes

advanced investments exam2 review

View Set

Laboratory Review 3- Chemical Composition of Cells

View Set