ITP 487 - Exam II
Dimension
- a group of characteristics which belong to the same business object (semantically) - each characteristics may contain additional attributes which are relevant for data analysis - or not!! - display attributes are InfoObjects for which data analysis does not make sense - navigational attribute are InfoObjects which are used to perform data analysis - time-dependent attributes - time-independent attributes
Transformation Rules
- algebraic rules - + - * / - string rules - relate to alphanumeric data - data and time rules - change time and date formats - programmatic or algorithm-based rules - use transformation programming languages
InfoProviders
- are data storage meta objects inside SAP BW - May store data records physically (persistent) - may also be used to access data stored outside SAP BW virtually (transparent views) - May be accessed with front end tools such as SAP BusinessObjects Analysis
Data load errors
- common cause is mismatched data - data package key and data package dimension assist in recovery when errors occur - roll back to data state prior to load - delete data load, repair rogue data, and reload
InfoCubes
- consists of a number of relational tables arranged together - fact table (max 233 key figures) - dimension tables - dimension (max 16 dimensions & 248 characteristics per dimension)
fact table
- consists of dimensional keys and key figures (facts) - values that can be added, subtracted, divided, multiplied, etc.
Data harmonization
- data are from multiple sources they need to be made consistent - involves mapping fields from source to target systems
Extraction
- data source is a collection of fields from the source system - extractors are programs that pull the data from the source system and bring them into the DW system - extracted data are staged in a PSA
Loading
- delta loads vs. full loads - data transfer process (DTP)
SAP BW Schema
- developed as an extended (snowflake) star schema as a response to problems experienced with the classic star schema - the enhancement comes from the fact that the dimension tables do not contain master data information - master data is stored in separate tables - master data tables can be used in multiple InfoCubes
Shortcomings of star schemas
- duplication of dimensional data - no support for multiple languages - slow joins due to alphanumeric keys - no historization (changing dimensional data) - hierarchies can cause anomalies
snowflake instead of star
- further normalization and expansion of the dimension tables in star schema results in a snowflake design - this reduces data redundancy which storage space but increases the number of joins - low-cardinality columns in the dimension have been removed to separate normalized tables that then link back into the original dimension table
Types of InfoObjects
- key figures = numeric measures - characteristics = master data - units = units of measure for key figures
Virtual cubes
- only represent logical views of a dataset - data resides in data source
Standard cube
- physically contains data in the database - they are data target - a type of InfoProvider - BW objects are called InfoProviders when queries can be executed based on them - Technically optimized for read access
Queries
- prefilters - presorts - restricted key figures - calculated key figures - conditions - predefined rule
dimension tables
- represent master data - provides a more detailed view of the fact
InfoObjects
- represent the structure that allows data to be stored in a BW system - used to describe business processes and information requirements (fields): asset, quantity on hand - either characteristics or key figures
Master data
- represents dimensional data - independent of any fact table - reusable in multiple fact tables - designed to supports multi-lingual systems - designed to support time dependent data - designed to improve query
fact table
- stores measures/ key figures - represents transactional data
InfoCube
- the central objects of the multi-dimensional model in SAP BW - reports and analyses are based on these - self-enclosed dataset for a business area from a reporting viewpoint - queries can be defined and executed on these
When to snowflake
- the dimension table consists of two or more sets of attributes which define information at different grains (detail) - the sets of attributes of the same dimension table are being populated by different source systems (very common in DW)
Snowflake schema
- the dimension tables are split into more tabes - reduce data redundancy - supports multiple languages, currencies & UoM - quicker joins via conversion of text keys to numbers - support historization - support cube administration
text tables
- whereas non-textual master data are stored in attribute tables, textual master data are stored in text tables - time-dependent - time-independent - language-dependent - language-independent - hierarchy
Fact Table with Currency Key
1,000 EUR 1,000 USD 1,000 ZWD EUR Euro USD US Dollar ZWD Zimbabwe Dollar
Dashboarding Process
1. identify end-user needs 2. specify the function of the dashboard 3. identify data sources 4. formulate the logic 5. create the user interface 6. bind the data 7. apply themes 8. add interactivity 9. test and deploy
Authoring Reports
1. identifying the needs of the report user 2. identifying data sources 3. building the layout for readability 4. binding analytical components to data sources 5. report structure 6. adding prompts for end-users 7. deployment
Fact Table with Unit of Measure Key
10 Kg 15 Dz 30 Gal 12 Ea Kg Kilogram Dz Dozen Gal Gallon Ea Each
InfoPackage
A "container" for selection criteria/condition Contains info about: - when - from which datasource - with which selection - full or delta
Data Store Object
A DSO is a two-dimensional storage unit which mainly stores consolidated and cleansed transaction data or master data on a lower granularity
Data Source
A collection of fields that have been extracted from the source system
Step 6
Activate data in DSO
conditional formatting
An Excel feature that enables you to specify how cells that meet one or more given conditions should be displayed.
Analysis for Microsoft office
Conduct multi-dimensional analysis of OLAP sources
Step 3
Create InfoPackage and load data into PSA
Step 5
Create and start Data Transfer Process to DSO
Step 8
Create and start data transfer process to cube
Step 2
Create dataSource
Step 4
Create transformation to DSO
Step 7
Create transformation to cube
Data transfer process
DTP is an object that determines how data is transferred between two persistent objects in SAP Netweaver BW
Transformation
Data Mapping & Data Harmonization
Data Harmonization
Data consolidation, data cleansing, data reformatting
ETL Process
Draw it!
Extraction Transformation Loading
ETL
Key figures
Fact
Historization
High End Bike (1/1/00 - 12/31/09) Deluxe Touring Bike (1/1/10-12/31/14) D-lux Touring Bike (1/1/15-12/31/9999)
Data Mapping
Mapping source fields to target fields
Persistant Storage Area
PSA
Interval-dependent hierarchy
Product DXTR1000- PRTR1000- DXTR2999 PRTR9999 1000-1999 1000-4999 2000-2999 5000-9999
SAP Crystal Reports
Provides users with a number of alternatives for report creation including wizards. These wizards guide the user through a number of input screens to create a report.
time-dependent hierarchies
Sales Territory US Spain Smith Patel Smith (1/1/12- (1/1/12- (1/1/15- 12/31/14) 12/31/99) 12/31/99)
Version-dependent hierarchy
Sales Territory (Version 1) North South Canada US Latin America Sales Territory (Version 2) North South Canada US S.America Mexico
Step 1
Select source system
Data flow in reporting
Source Systems - ERP - Data Warehouses - Databases - Flat Files - Legacy Systems - Web services - Senior data Semantic Layer - SAP Business Objects Universe Authoring - Report-authoring tool - Report design tool Report server - Report deployment End-user - Web - Mobile - Print
ETL Process
Source Systems => (Extraction) => Data Source/Staging => (Transformation) => (Load) => Cube
SAP Bex Query Designer
Tools in multidimensional analysis allow us to pre filter both key figures and characteristics for our cross tabs so that the data brought in for analysis is already targeted. Also smaller data sets are easier to analyze.
Dimension table
acts as a link between master data and fact table records
data reformatting
changing the format of data from data source to target requirements
Data consolidation
combining and integrating fata from data sources - realigning mismatched data - harmonize keys and records - combine or split data
InfoCube
consists of fact and dimensional tables
characteristic
dimension attributes
dimension
dimension table
Fact table
fact table
Report wizard
guides the user through choosing a data source and linking database tables. It also helps the user add fields and specify the grouping, summarization (totals), and sorting criteria to be used. The wizard also assists in the creation of a chart and record selection.
OLAP cube report wizard
guides the user to create a report in which OLAP data is displayed as a grid object which includes dimensions from the OLAP data source
mailing label report wizard
guides the user to create a report that is formatted to print on any size mailing label.
Cross-tab report wizard
guides users through the creation of a report in which the data is displayed as a cross-tab object
Star Schema
includes 1 fact table and multiple dimension tables
data cleansing
removing noisy data, replacing missing values, realigning formats - identification of signal and noise - getting rid of noise and "dirty" or missing data to avoid GIGO - null values can be misleading - identification and elimination of outliers - elimination of redundant data - reformatting of data
master data tables
text, attributes, hierarchy
Loading
the movement of data from source systems to their intermediate or final destinations
Extraction
the process of identifying data sources and source fields and acquiring, or sourcing, the data required for analysis
Transformation
the process of mapping and harmonizing - that is, making certain the data are consistent, cleansed and reliable - from their sources to the targets
Surrogate ID
used to convert textual keys to numeric keys to create links and provide for faster queries