DRM TEST 2 Study Guide

Lakukan tugas rumah & ujian kamu dengan baik sekarang menggunakan Quizwiz!

Updating DW 2 Stratedgies

1. Audit columns in the transactional tables like date inserted or date updated so that we know which rows to extract in the next update. 2. Using Triggers on inserts and updates.

Staging Server Reasoning (2 reasons)

1. Reduce the connection time for ETL packages to the source database. This can be a very important issue for large data volumes, creating significant pressure on the source databases while the package is running. 2. Provide a solid restart point in the event of a failure. Not only we avoid accessing the source database again, but we also have a consistent image as of the extract date and time!!

Star Schema

A dimensional model is made up of a central fact table (or tables) and its associated dimensions. The dimensional model is also called a star schema because it looks like a star with the fact table in the middle and the dimensions serving as the points on the star.

Fact Table - Facts or measures

A record in a fact table is a series of measurements, or numeric values such as quantity ordered or sale amount. These numbers are called facts (or measures in Analysis Services).

Dimensional model vs the TPS model

Because of the de-normalization involved in creating the dimensions. Queries against the SQL Server relational database generally perform better — often far better — against a dimensional structure than against a fully normalized structure.

ETL 3 uses

Business Analytics ETL is the reason why more than 50% of corporate business intelligence initiatives fail. BIG Data No big data processing without knowledge of ETL Operational Systems No efficiency without ETL

Data Cleaning Process (transforming Data) 3 ways

Column screens: Test the data within a single column, for nulls, data type violations, or value violations. Structure screens: Test data relationships, for example referential integrity failures. Business rule screens: Test business logic, for example requiring that a Platinum customer is at least five years with the company and has made a purchase in the last two years.

Data Latency

Data latency describes how quickly the transactional data must be delivered to the DW/BI system. Most DW/BI systems process data on a daily basis

Dimensions

Dimensions are implemented as tables in the dimensional model. You can spot dimensions in conversation with the operational people because they are often the "by" words in a report request. For example, a user wants to see sales by month by product by date By promotion

Data warehouse bus matrix (4)

Each row in the matrix is a business process. Each column is a dimension that participates in multiple business processes. The processes down the left side of the matrix follow the organization's value chain. The bus matrix is essentially the enterprise dimensional data architecture.

Operational work: job scheduler

Every enterprise data warehouse needs a robust ETL scheduler. ETL processes should run automatically and independent of human intervention especially if we work with multiple updates a day

ETL

Extraction Transformation loading It involves accessing data from multiple heterogeneous and geographically dispersed resources, manipulating this data, and moving it to the destination system.

Fact Table - 95% of Data

For most transaction-driven organizations, fact tables are the largest tables in the data warehouse database, often making up 95 percent or more of the total relational database size.

Calculations documentation

Fully documenting calculations for functions, allocations, adjustments, and derivations.

many to many between dimension tables

In a banking example, a given account can have one or more customers as signatories, and any given customer can have one or more accounts. The bank might choose to analyze data by customer or account.

Designing the Dimensional Model (4)

It constitutes the blueprint for the development of the DW database. It is the plan on which we reflect for changes and improvements. We use it in our communication with the operational users. We go through multiple iterations of its desing

Error Tables (transforming data)

It might be useful to create error tables to hold rows that have failed a critical data screening.

Transforming Data

Manipulating data through a series of ETL packages to improve quality and merging data from multiple heterogeneous sources

slowly changing dimension (scd)

Most attribute values like date of birth are fixed. However, other attributes like an employee's title might change over time. Changing attributes allow us to understand the dynamics of the business and constitute one of the major reasons for the existence of the DW/BI system. Type 1 and type 2 SCDs. Type 1, we just replace values. Type 2, we keep historical records. It is a business decision to decide which SCD is type 1 or type 2.

working with duplicates(transforming data)

Often records for the dimension tables are derived from several heterogeneous sources. This is common for organizations that have many customer-facing TPS systems that create and manage separate customer data. We should have a strategy in place to identify duplicate records

Loading Data

Physically loading the data into the DW target dimensional models.

ETL 3 KEY points

Politics within the organization. Cooperation with the transactional DBAs. Access to data sources. 80% of the production cost of a DW. Analysts estimate that almost 80% of the capital required for a data warehouse project will be spent on data quality and data transfer projects. 55% of the operational cost for a DW. 55% of the total warehouse cost at run time is due to ETL costs alone.

Data quality at TPS level

Push back hard at the transaction systems operations to clean data at the source before it enters the ETL stream

operational work: ETL Security

The ETL system and staging tables should be off limits to all business users

Proposal for a Staging Server

The extract system should be decoupled (separate - independent) from the transactional (source) and destination systems. We should create separate ETL packages for the extract that simply pull the appropriate data from the source systems and stage it, untransformed, in a staging database

Fact Table - Level of detail in the fact table

The level of detail contained in the fact table is called the grain. It is recommended to build fact tables with the lowest level of detail possible. Detailed fact tables provide complete flexibility to roll up the data to any level of summary needed across any dimension

Fact Table - Primary Key

The primary key to the fact table is usually a composite key made up of a subset of the foreign keys from each dimension table

Prioritizing Business Requirements (3)

The prioritization process is a planning meeting involving the DW/BI team, the DW/BI project business sponsors, and other senior managers from across the organization. There needs to be a conclusive consensus of what is needed by the organization. Prioritizing requirements is one of the most important tasks you have.

operational work: recovery and restart system

The role of staged data After your ETL system is in production, failures can occur for countless reasons. You need staged data sets as a basis for restarting the system.

Many to many between fact and dimension tables

The standard relationship between a dimension table and fact table is one-to-many. This means one row in the dimension table will join to many rows in the fact table. For example a product has many sales. However, we can have a many-to-many relationship between the fact table and a dimension. For example when multiple sales reps handle the same order.

Surrogate keys (pro and con)

They protect the DW/BI system from changes in the keys coming from the source system. They allow the DW/BI system to integrate the same data, such as customer, from multiple source systems where they have different keys. The real cost of using surrogate keys is the burden it places on the ETL system.

4 Assumptions of DW

We have a transactional/operational system in place. We understand that a data warehouse is not an archiving system. We have specific goals and we are looking for specific information from the data warehouse initiative. There is not a standard road to follow when building the DW. It will depend on the corporate reporting needs.

Developer Skills:

We need to get the best. It is imperative to have experienced ETL developers as part of a team and know how to communicate in business terms with the operational and DW/BI users SQL and Scripting Excellent SQL skills are required as well!! Every production quality SSIS ETL system will require knowledge of SQL.

Lineage Documentation

We should be able to look at a number in a report and know exactly how and when it entered the data warehouse (or other data destination), and what transformations occurred to it along the way

Data profiling

auditing from the operational system conclusions from operational data

determining important Processes and business requirements

conduct a set of interviews to identify Processes and BI needs

Data Warehouse

it is a database we develop and use for business analytics of a corp.

Data Profiling

the analysis of the source (mostly transactional) data to describe its content and structure.

3 Bullets (operational people)

we must listen and understand the operational people. communication is a top priority we first need to understand the data in the operations/transnational system. we can achieve this by using sql the next step is to define the business requirements


Set pelajaran terkait

Multiple Choice Questions Financial Management CH4

View Set

Chapter 8 Enzyme & Collagen Disorders (Inborn orders of metabolism)

View Set

ECON Exam 1, Exam 3, ECON Exam 2

View Set

Module 07- Incident Response in Cloud - Part 2

View Set

Midterm 2 review questions (6-11)

View Set