DM - Data Warehouse and Data Mart Modeling

¡Supera tus tareas y exámenes ahora con Quizwiz!

comparing dimensional modeling and ER modeling as data warehouse/data mart design techniques

-ER modeling can be used as a conceptual data warehouse/data mart design technique, followed by relational modeling as logical data warehouse/ data mart design technique -dimensional modeling can be used both for conceptual data warehouse/data mart design and logical data warehouse/data mart design

surrogate key

-in a star schema, all dimension tables are given a simple, non-composite system-generated key -values for these are typically simple auto-increment integer values -values have no meaning or purpose except to give each dimension a new column that serves as a primary key within the dimensional model instead of the operational key

dimensionally modeled data warehouse development

1. a set of commonly used dimensions known as conformed dimensions is designed first 2. fact tables corresponding to the subjects of analysis are then subsequently added 3. a set of dimensional models is created where each fact table is connected to multiple dimensions, and some of the dimensions are shared by more than one fact table 4. in addition to the originally created set of conformed dimensions, additional dimensions are included as needed 5. the result is a data warehouse that is a collection of intertwined dimensionally modeled data marts, i.e. a constellation of stars

for every dimensions under consideration, two must questions must be answered

1. can the dimension table be useful for the analysis of the chosen subject? 2. can the dimension table be created based on the existing data sources?

three major components of the data warehouse

1. source systems 2. extraction-transformation-load (ETL) infrastructure 3. the data warehouse itself and the front-end applications

Azure

Microsoft azure is a cloud computing service created by Microsoft for building, testing, deploying, and managing applications and services through Microsoft-managed data centers

snowflake model

star schema that contains the dimensions that are normalized -usually not used in dimensional modeling: non-normalized (not snowflaked) dimensions provide for simpler analysis, normalization is usually not necessary for analytical databases

in a star schema

the chosen subject of analysis is represented by a fact table -designing this involves considering which dimensions to use with the fact table representing the chosen subject

typical dimension in a star schema contains

-attributes whose values do not change such as store size and customer gender -attributes whose values change occasionally and sporadically over time, such as customer zip and employee salary

data warehouse/data mart modeling approaches

-normalized data warehouse -dimensionally modeled data warehouse -independent data marts

characteristics of dimensions and facts

-typical dimension contains relatively static data, while in a typical fact table, records are added continually, and the table rapidly grows in size -in a typical dimensionally modeled analytical database, dimension tables have orders of magnitude few records than in fact tables

operational data

typically reflects detailed data -typically represents the current state of affairs in the real world

dimensional modeling

a modeling technique tailored specifically for analytical database design purposes -regularly used in practice for modeling data warehouses and data marts

ER modeling

a predominant technique for visualizing database requirements, used extensively for conceptual modeling of operational databases -can be used during the development of data warehouses and data marts

analytical data

both detailed and summarized -can represent both the current situation and snapshots from the past

dimensionally modeled data warehouse as a source

can be used as a source for dependent data marts and other views, subsets, and/or extracts

type 1

changes the value in the dimension's record - new value replaces the old -no history is preserved -simplest approach, used most often when a change in a dimension is the result of an error

row indicator

column that provides a quick indicator of whether the record is currently valid

timestamps

columns that indicate the time interval for which the values in the records are applicable

fact tables

contain measures relating to the subject of analysis and the foreign keys (associating fact tables with dimension tables) -measures in this are typically numeric and are intended for mathematical computation and quantitative analysis

dimension tables (dimensions)

contains descriptions of the business, organization, or enterprise to which the subject of analysis belongs -columns in dimension tables contain descriptive information that is often textual but can also be numeric -provides a basis for analysis of the subject

type 2

creates a new additional dimension records using a new value for the surrogate key every time a value in a dimension record changes -used in cases where history should be preserved -can be combined with the use of timestamps and row indicators

primary purpose of a data warehouse

data analysis

dimensional modeling use

data design methodology used for designing subject-oriented analytical databases, such as data warehouses or data marts -in addition to using the regular relational concepts (PKs, FKs, integrity constraints, etc) this distinguishes between two types of tables -dimensions -facts

data mart

data store based on the same principles as a data warehouse, but with a more limited scope

dimensionally modeled data warehouse

data warehouse as a collection of dimensionally modeled intertwined data marts (i.e. constellation of dimensional models) that integrates analytically useful information from the operational data sources -same as the normalized data warehouse approach when it comes to the utilization of operational data sources and the ETL process

granularity of the fact tables

describes what is depicted by one row in the fact table -coarser granularity aggregated fact tables are quicker to query than detailed fact talbes -coarser granularity tables are limited in terms of what information can be retrieved from them

slowly changing dimension

dimension that contains attributes whose values can change -three types

dependent data mart

does not have its own source systems -data comes from the data warehouse

detailed fact table

each record refers to a single fact

transaction-level detailed fact table

each row represents a particular transaction

multiple facts in a dimensional model

when multiple subjects of analysis can share the same dimensions, a dimensional model contains more than one fact table -referred to as a constellation or galaxy of starts -enables quicker development of analytical databases for multiple subjects of analysis, because dimensions are re-used instead of duplicated -straightforward cross-fact analysis

aggregated fact table

each record summarizes multiple facts -has less rows than a detailed fact table

line-item detailed fact table

each row represents a line item of a particular transaction

normalized data warehouse

envisions a data warehouse as an integrated analytical database modeled by using the traditional database modeling techniques of ER modeling and relational modeling, resulting in a normalized relational database schema -populated with the analytically useful data from the operational data sources via the ETL process -serves as a source of data for dimensionally modeled data marts and for any other non-dimensional analytically useful data sets

additional possible fact attributes

fact table contains: -FKs connecting the fact table to the dimension tables -the measures related to the subject of analysis -in certain cases, fact tables can contain other attributes that are not measures -two of the most typical additional attributes are: 1. transaction identifier 2. transaction time

detailed vs aggregated fact tables

fact tables in a dimensional model can contain either detailed data or aggregated data

detailed fact tables granularity

have a fine level of granularity because each record represents a single fact

Relational modeling

standard method for logical modeling of operational databases -can be used during the development of data warehouses and data marts

aggregated fact tables granularity

have a coarser level of granularity than detailed fact tables as records in aggregated fact tables always represent summarizations of multiple facts

operational systems

have a shorter time horizon of data than analytical systems

type 3

involves creating a "previous" and "current" column in the dimension table for each column where changes are anticipated -applicable in cases in which there is a fixed number of changes possible per column of a dimension, or in cases when only a limited history is recorded -can be combined with the use of timestamps

end users of analytical databases

only able to retrieve data, and updates of the data by the end users are not allowed

data in operational systems

regularly updated by the users

star schema

result of dimensional modeling is a dimensional schema containing facts and dimensions

independent data mart

stand-alone data mart, created in the same fashion as the data warehouse. -has its own source systems and ETL infrastructure

independent data marts

stand-alone marts are created by various groups within the organization, independent of other stand-alone data marts in the organization -consequently, multiple ETL systems are created and maintained


Conjuntos de estudio relacionados

ExamFX NC Chapter North Carolina Statues and Regulations Pertinent to Personal Lines Quiz

View Set

Study Module 21: Electric Charge - Force - Electric Fields

View Set

Chapter 5: How Sociologists Do Research

View Set

BP CH 2 - Intro to Java Application

View Set

CompTIA PTO-002 PenTest+ WGU D153 Penetration Testing and Vulnerability Analysis

View Set

Combo with "Ch. 9 Class & Global Inequality" and 3 others

View Set

Ch. 36: Mgmt of Pts w/ Immune Deficiency Disorders

View Set

SB Chapter 19: Advertising, Public Relations, and Sales Promotions

View Set