CHAPTER 8 - Data Warehouse and Data Mart Modeling

¡Supera tus tareas y exámenes ahora con Quizwiz!

STAR SCHEMA In the star schema, the chosen subject of analysis is represented by a fact table Designing the star schema involves considering which dimensions to use with the fact table representing the chosen subject

-

CHARACTERISTICS OF DIMENSIONS AND FACTS •A typical dimension contains relatively static data, while in a typical fact table, records are added continually, and the table rapidly grows in size. •In a typical dimensionally modeled analytical database, dimension tables have orders of magnitude fewer records than fact tables

-

DETAILED VERSUS AGGREGATED FACT TABLES •Fact tables in a dimensional model can contain either detailed data or aggregated data •In detailed fact tables each record refers to a single fact •In aggregated fact tables each record summarizes multiple facts

-

SNOWFLAKE MODEL •A star schema that contains the dimensions that are normalized

Snowflaking is usually not used in dimensional modeling oNot-normalized (not snowflaked) dimensions provide for simpler analysis oNormalization is usually not necessary for analytical databases

ADDITIONAL POSSIBLE FACT ATTRIBUTES •A fact table contains oForeign keys connecting the fact table to the dimension tablesoThe measures related to the subject of analysis •In addition to the measures related to the subject of analysis, in certain cases fact tables can contain other attributes that are not measures

Two of the most typical additional attributes that can appear in the fact table are: oTransaction identifier oTransaction time

Dimensional modeling

•A modeling technique tailored specifically for analytical database design purposes •Regularly used in practice for modeling data warehouses and data marts

Row indicator:

column that provides a quick indicator of whether the record is currently valid

Timestamps:

columns that indicates the time interval for which the values in the records are applicable

Three of the most common data warehouse and data mart modeling approaches:

oNormalized data warehouse oDimensionally modeled data warehouse oIndependent data marts

SLOWLY CHANGING DIMENSION: TYPE 2

•Creates a new additional dimension record using a new value for the surrogate key every time a value in a dimension record changes •Used in cases where history should be preserved •Can be combined with the use of timestamps and row indicators

DIMENSIONALLY MODELED DATA WAREHOUSE

•Data warehouse as a collection of dimensionally modeled intertwined data marts (i.e. constellation of dimensional models) that integrates analytically useful information from the operational data sources •Same as the normalized data warehouse approach when it comes to the utilization of operational data sources and the ETL process

INDEPENDENT DATA MARTS

•Independent data marts are considered an inferior strategy Inability for straightforward analysis across the enterprise The existence of multiple unrelated ETL infrastructures •In spite of obvious disadvantages, a significant number of corporate analytical data stores are developed as a collection of independent data marts

SLOWLY CHANGING DIMENSION: TYPE 3

•Involves creating a "previous" and "current" column in the dimension table for each column where changes are anticipated •Applicable in cases in which there is a fixed number of changes possible per column of a dimension, or in cases when only a limited history is recorded. •Can be combined with the use of timestamps

For every dimension under consideration, two questions must be answered:

•Question 1: Can the dimension table be useful for the analysis of the chosen subject? •Question 2: Can the dimension table be created based on the existing data sources?

Star schema

•The result of dimensional modeling is a dimensional schema containing facts and dimensions •The dimensional schema is often referred to as the star schema

MULTIPLE FACTS IN A DIMENSIONAL MODEL •When multiple subjects of analysis can share the same dimensions, a dimensional model contains more than one fact table •A dimensional model with multiple fact tables is referred to as a constellation or galaxy of stars

•This approach enables :oQuicker development of analytical databases for multiple subjects of analysis, because dimensions are re-used instead of duplicated oStraightforward cross-fact analysis

•Typical dimension in a star schema contains: oAttributes whose values do not change (or change extremely rarely) such as store size and customer gender oAttributes whose values change occasionally and sporadically over time, such as customer zip and employee salary.

Dimension that contains attributes whose values can change referred to as a slowly changing dimension •Most common approaches to dealing with slowly changing dimensions oType 1 oType 2 oType 3

SLOWLY CHANGING DIMENSION: TYPE 1

•Changes the value in the dimension's recordoThe new value replaces the old value. •No history is preserved •The simplest approach, used most often when a change in a dimension is the result of an error

ER modeling

A predominant technique for visualizing database requirements, used extensively for conceptual modeling of operational databases

Transaction-level detailed fact table:

Each row represents a particular transaction

GRANULARITY OF THE FACT TABLES •Granularity describes what is depicted by one row in the fact table •Detailed fact tables have fine level of granularity because each record represents a single fact •Aggregated fact tables have a coarser level of granularity than detailed fact tables as records in aggregated fact tables always represent summarizations of multiple facts

GRANULARITY OF THE FACT TABLES •Due to their compactness, coarser granularity aggregated fact tables are quicker to query than detailed fact tables •Coarser granularity tables are limited in terms of what information can be retrieved from them •One way to take advantage of the query performance improvement provided by aggregated fact tables, while retaining the power of analysis of detailed fact tables, is to have both types of tables coexisting within the same dimensional model, i.e. in the same constellation

Line-item detailed fact table:

Each row represents a line item of a particular transaction

Dimensional modeling

•A data design methodology used for designing subject-oriented analytical databases, such as data warehouses or data marts •Commonly, dimensional modeling is employed as a relational data modeling technique •In addition to using the regular relational concepts (primary keys, foreign keys, integrity constraints, etc.) dimensional modeling distinguishes two types of tables: oDimensions oFacts

Dimension tables (dimensions)

•Contain descriptions of the business, organization, or enterprise to which the subject of analysis belongs •Columns in dimension tables contain descriptive information that is often textual (e.g., product brand, product color, customer gender, customer education level), but can also be numeric (e.g., product weight, customer income level) •This information provides a basis for analysis of the subject

Fact tables

•Contain measures related to the subject of analysis and the foreign keys (associating fact tables with dimension tables) •The measures in the fact tables are typically numeric and are intended for mathematical computation and quantitative analysis

NORMALIZED DATA WAREHOUSE

•Envisions a data warehouse as an integrated analytical database modeled by using the traditional database modeling techniques of ER modeling and relational modeling, resulting in a normalized relational database schema •Populated with the analytically useful data from the operational data sources via the ETL process •Serves as a source of data for dimensionally modeled data marts and for any other non-dimensional analytically useful data sets

INDEPENDENT DATA MARTS

•Stand-alone data marts are created by various groups within the organization, independent of other stand-alone data marts in the organization •Consequently, multiple ETL systems are created and maintained

Relational modeling

•Standard method for logical modeling of operational databases

DIMENSIONALLY MODELED DATA WAREHOUSE •Can be used as a source for dependent data marts and other views, subsets, and/or extracts

•A set of commonly used dimensions known as conformed dimensions is designed first •Fact tables corresponding to the subjects of analysis are then subsequently added •A set of dimensional models is created where each fact table is connected to multiple dimensions, and some of the dimensions are shared by more than one fact table •In addition to the originally created set of conformed dimensions, additional dimensions are included as needed •The result is a data warehouse that is a collection of intertwined dimensionally modeled data marts, i.e. a constellation of stars


Conjuntos de estudio relacionados

Chapter 16: Giving Birth: McKinney

View Set

Chapter 6 The Neonatal Period: Changes During the First Month of Life

View Set

Saunders Mental Health and Pharmacology Psychiatric

View Set