Week 4

Pataasin ang iyong marka sa homework at exams ngayon gamit ang Quizwiz!

Data Load Errors

- Common cause is mismatched data - text field going into numeric field -Data package key and data package dimension assist in recovery when errors occur -No update, No reporting -Update valid records, no reporting -Update valid records, reporting possible -Roll back to date state prior to loads -Delete data load, repair rogue data, reload

Rank

-Combines sorting and filtering -Top N, Top %, Bottom N, Bottom %

Shortcomings of Star Schema

-Duplication of dimensional Data -No support for multiple languages -Slow join due to alphanumeric keys -No historization - changing dimensional data -Hierarchies can cause anomalies Example: Sales_territory & Country

Filter

-Filter by label for textual fields -filter by value

(1) Multidimensional Modeling - Star Schema

-Includes 1 fact table and multiple dimension table

Data Warehouse Structure

-Information System -Takes large quantities of data and stores them so that we can easily access the data -Relational Database that has been denormalized. Meaning some redundant data is added back into the database to reduce the number of tables. This speeds up the queries. -Because data warehouse are "read only", modification anomalies are not relevant -Provides a "single source of truth" for analytics

InfoCubes - Virtual Cube (Virtual Provider)

-Only represent logical views of a dataset -Data resides in data source (Not the virtual cube)

InfoCubes - Standard Cube

-Physically contains data in the database -They are data targets (data can be loaded into the cubes)

Queries

-Prefilters -Presorts -Restricted key figures -Calculated key figures

Star Schema - Fact Table

-Stores measures/key figures -Represents Transactional Data

InfoCubes

A self-enclosed dataset for a business area from a reporting viewpoint -Queries can be defined and executed on the InfoCube

Calculation

Adding fields based on other fields

Techniques - Sort

Ascending, descending, alphabetical, numerical, chronological, nested

Data Reformatting

Changing the format of the data from data source to target requirements Transformation Rules -Algebraic rules - converting weights or currencies -String rules - relates to alphanumeric data such as combining or separating a name -Date and time rules - change time and date formats -Programmatic or Algorithm-Based Rules - Use transformation programming languages such as stripping off XML tags

Chapter 3: Dimensional Data Modeling

Chapter 3: Dimensional Data Modeling

Chapter 4: Extraction, Transformation, and Loading (ETL)

Chapter 4: Extraction, Transformation, and Loading (ETL)

Chapter 5: Slicing & Dicing

Chapter 5: Slicing & Dicing

Data Consolidation

Combining and integrating data from data sources Done by data wranglers who: -Realign mismatched data -Harmonize keys and records -Combine or split data

Dynamic Data

Data that are updated -prospective customer becoming buying customer Data that have different values depending on the user -Languages -Currency -Unit

Hierarchies

Drill down and roll up

Slice and Dice Mistakes

Incorrect Aggregations -Adding together ending inventory for reach month -Adding apples and oranges such as different currencies Treating Null as Zero Errors in data acquisition Errors in calculations

Slicing and Dicing Tools

Microsoft excel tables and pivot tables IBM Cognos SAP Analytics Oracle Hyperion Tableau Microsoft PowerPivot

Exceptional Aggregations

Occur when it doesn't make sense to aggregate. Instead we want results as a point in time such as the number of employees at the end of the year. We can't add employees at the end of all years to get a total. it won't work. Think of these as balance sheet accounts in financials

Conditions (Conditional Formatting)

Predefined rules

Data Cleansing

Removing noisy data, replacing missing values, realigning formats -Identification of signal and noise -Getting rid of the noise and "dirty" or missing data -Null values can be misleading - nulls can be ignored or replaced if we have the knowledge to do so -Identification and elimination of outliers -Elimination of redundant data

Star Schema - Dimension Tables

Represents master data and provides a more detailed view of the facts Examples: -Time dimension -Customer dimension -Product dimension -Sales Org dimension -Region dimension

Standard Aggregations

Standard aggregations are SUM, MIN, MAX, AVG, COUNT.

Aggregations

Sum, Avg, Min, Max, Count

Slicing and Dicing

The Manipulation of data to be able to see it from many different angles Slicing: A way to filter a large data set to smaller data sets of interest Dicing: the sliced data set creates even more granularly defined dataset -Multi-dimensional Data examined from many angles -Larger data sets filtered down to area of interest -Trends and outliers identified via sorting

Pivot Charts

The chart changes when the data is pivoted Field controls allow for filters and sorts

(2) Multidimensional Modeling - Snowflake Schema

The dimensional tables are split into more tables -Reduces data redundancy -Supports multiple languages, currencies, and UoM -Quicker joins via conversion of text keys to numbers -Supports historization - changing dimensional data -Support cube administration -Reduces data redundancy which reduces storage space but increases the number of joins -A dimension is said to be snowflaked when the low-cardinality columns in the dimension have been removed to separate normalized tables that then link back into the original dimension table.

Loading

The movement of data from source systems to their intermediate or final destinations -Delta Loads and Full Loads -Data transfer process (DTP)

Extraction

The process of identifying data sources and source fields and acquiring, or sourcing, the data required for analysis -The data source is a collection of fields from the source system -Extractors are programs that pull the data from the source system and bring them into the data warehousing system -Extracted data are staged in a persistent storage area (PSA

Transformation

The process of mapping and harmonizing - that is making certain the data are consistent, cleansed, and reliable - from their sources to the targets

ETL

The process of populating data structures such as InfoCubes, data marts, and data warehouses from one or many sources

Transformation - Data Harmonization

When data are from multiple sources they need to be made more consistent (harmonized) -Harmonization involves mapping fields from source to target systems -IE infocube 1. Data Consolidation 2. Data Cleansing 3. Data reformatting

Pivot Table

creates what is called a cross tabulated structure or crosstab The crosstab displays aggregated and summarized data based on which way you sort the columns and rows


Kaugnay na mga set ng pag-aaral

1.0 Coast Guard Authority and Jurisdiction

View Set

Introduction to Unix/Linux 801 - Test 1 Review

View Set

HESI MED SURG - Cardiovascular, Hematologic, and lymphatic

View Set

NU473 Week 5: Evolve Elsevier EAQ Diabetes - 26 Questions

View Set

Space and National Security Exam 2 Study

View Set

Chapter 1 Strategic competitiveness

View Set

Professionalism and Professionalization

View Set

Intro to Valuation: The Time Value of Money

View Set