infs ch 8
type 1
-Changes the value in the dimension's record --The new value replaces the old value. -No history is preserved -The simplest approach, used most often when a change in a dimension is the result of an error
detailed vs aggregated fact tables
Fact tables in a dimensional model can contain either detailed data or aggregated data In detailed fact tables each record refers to a single fact In aggregated fact tables each record summarizes multiple facts
type 2
-Creates a new additional dimension record using a new value for the surrogate key every time a value in a dimension record changes -Used in cases where history should be preserved -Can be combined with the use of timestamps and row indicators Timestamps - columns that indicates the time interval for which the values in the records are applicable Row indicator - column that provides a quick indicator of whether the record is currently valid
line-item vs transaction-level detailed fact table
Line-item detailed fact table -Each row represents a line item of a particular transaction Transaction-level detailed fact table -Each row represents a particular transaction
dimensions
dimension tables; -Contain descriptions of the business, organization, or enterprise to which the subject of analysis belongs -Columns in dimension tables contain descriptive information that is often textual (e.g., product brand, product color, customer gender, customer education level), but can also be numeric (e.g., product weight, customer income level) -This information provides a basis for analysis of the subject For example, if the subject of the business analysis is sales, it can be analyzed by dimension columns such as product brand, customer gender, customer income level, and so on. there is always a CALENDAR dimension
normalized data warehouse
-Envisions a data warehouse as an integrated analytical database modeled by using the traditional database modeling techniques of ER modeling and relational modeling, resulting in a normalized relational database schema -Populated with the analytically useful data from the operational data sources via the ETL process -Serves as a source of data for dimensionally modeled data marts and for any other non-dimensional analytically useful data sets Data warehouse as a normalized integrated analytical database was first proposed by Bill Inmon, and hence, the normalized data warehouse approach is often referred to as the Inmon approach.
star schema
-The result of dimensional modeling is a dimensional schema containing facts and dimensions -The dimensional schema is often referred to as the star schema -In the star schema, the chosen subject of analysis is represented by a fact table -Designing the star schema involves considering which dimensions to use with the fact table representing the chosen subject -For every dimension under consideration, two questions must be answered: --Question 1: Can the dimension table be useful for the analysis of the chosen subject? --Question 2: Can the dimension table be created based on the existing data sources?
multiple facts in a dimensional model
-When multiple subjects of analysis can share the same dimensions, a dimensional model contains more than one fact table -A dimensional model with multiple fact tables is referred to as a constellation or galaxy of stars This approach enables: -Quicker development of analytical databases for multiple subjects of analysis, because dimensions are re-used instead of duplicated -Straightforward cross-fact analysis
slowly changing dimension
Typical dimension in a star schema contains: -Attributes whose values do not change (or change extremely rarely) such as store size and customer gender -Attributes whose values change occasionally and sporadically over time, such as customer zip and employee salary. Dimension that contains attributes whose values can change referred to as a slowly changing dimension -Most common approaches to dealing with slowly changing dimensions -Type 1 -Type 2 -Type 3
characteristics of dimensions and facts
a typical dimension contains relatively static data, while in a typical fact table, records are added continually, and the table rapidly grows in size. In a typical dimensionally modeled analytical database, dimension tables have orders of magnitude fewer records than fact tables
snowflake model
-A star schema that contains the dimensions that are normalized snowflaking is usually not used in dimensional modeling -Not-normalized (not snowflaked) dimensions provide for simpler analysis -Normalization is usually not necessary for analytical databases Analytical databases are typically read only. Hence, no danger of update anomalies.
type 3
-Involves creating a "previous" and "current" column in the dimension table for each column where changes are anticipated -Applicable in cases in which there is a fixed number of changes possible per column of a dimension, or in cases when only a limited history is recorded. -Can be combined with the use of timestamps
data warehouse/ data mart modeling approaches
Three of the most common data warehouse and data mart modeling approaches: -Normalized data warehouse -Dimensionally modeled data warehouse -Independent data marts
surrogate key
Typically, in a star schema all dimension tables are given a simple, non-composite system-generated key, also called a surrogate key Values for the surrogate keys are typically simple auto-increment integer values Surrogate key values have no meaning or purpose except to give each dimension a new column that serves as a primary key within the dimensional model instead of the operational key For example, instead of using the primary key ProductID as the primary key of the PRODUCT dimension, a new surrogate key column ProductKey is created.
dimensional modeling
-A modeling technique tailored specifically for analytical database design purposes -Regularly used in practice for modeling data warehouses and data marts -A data design methodology used for designing subject-oriented analytical databases, such as data warehouses or data marts -Commonly, dimensional modeling is employed as a relational data modeling technique -In addition to using the regular relational concepts (primary keys, foreign keys, integrity constraints, etc.) dimensional modeling distinguishes two types of tables: 1) Dimensions 2) Facts As a relational modeling technique, dimensional modeling, just like standard relational modeling, designs relational tables that have primary keys and are connected to each other via foreign keys, while conforming to the standard relational integrity constraints
granularity of the fact tables
-Granularity describes what is depicted by one row in the fact table -Detailed fact tables have fine level of granularity because each record represents a single fact -Aggregated fact tables have a coarser level of granularity than detailed fact tables as records in aggregated fact tables always represent summarizations of multiple facts -Due to their compactness, coarser granularity aggregated fact tables are quicker to query than detailed fact tables -Coarser granularity tables are limited in terms of what information can be retrieved from them -One way to take advantage of the query performance improvement provided by aggregated fact tables, while retaining the power of analysis of detailed fact tables, is to have both types of tables coexisting within the same dimensional model, i.e. in the same constellation Aggregation is requirement-specific while a detailed granularity provides unlimited possibility for analysis.
facts
fact tables; -Contain measures related to the subject of analysis and the foreign keys (associating fact tables with dimension tables) -The measures in the fact tables are typically numeric and are intended for mathematical computation and quantitative analysis For example, if the subject of the business analysis is sales, one of the measures in the fact table sales could be the sale's dollar amount. The sale amounts can be calculated and recalculated using different mathematical functions across various dimension columns. For example, the total and average sale can be calculated per product brand, customer gender, customer income level, and so on. A fact table contains: -Foreign keys connecting the fact table to the dimension tables -The measures related to the subject of analysis In addition to the measures related to the subject of analysis, in certain cases fact tables can contain other attributes that are not measures Two of the most typical additional attributes that can appear in the fact table are: -Transaction identifier -Transaction time
dimensionally modeled data warehouse
-Data warehouse as a collection of dimensionally modeled intertwined data marts (i.e. constellation of dimensional models) that integrates analytically useful information from the operational data sources -Same as the normalized data warehouse approach when it comes to the utilization of operational data sources and the ETL process Dimensionally modeled data warehouse approach was championed by Ralph Kimball, and hence, it is often referred to as the Kimball approach. -A set of commonly used dimensions known as conformed dimensions is designed first -Fact tables corresponding to the subjects of analysis are then subsequently added -A set of dimensional models is created where each fact table is connected to multiple dimensions, and some of the dimensions are shared by more than one fact table -In addition to the originally created set of conformed dimensions, additional dimensions are included as needed -The result is a data warehouse that is a collection of intertwined dimensionally modeled data marts, i.e. a constellation of stars For example, in a retail company, conformed dimensions CALENDAR, PRODUCT, STORE can be designed first, as they will be commonly used by subjects of analysis. can be used as a source for dependent data marts and other views, subsets, and/or extracts
independent data marts
-Stand-alone data marts are created by various groups within the organization, independent of other stand-alone data marts in the organization -Consequently, multiple ETL systems are created and maintained Independent data marts are considered an inferior strategy -Inability for straightforward analysis across the enterprise -The existence of multiple unrelated ETL infrastructures In spite of obvious disadvantages, a significant number of corporate analytical data stores are developed as a collection of independent data marts This strategy does not result in a data warehouse, but in a collection of unrelated independent data marts. While the independent data marts within one organization may end up as a whole, containing all the necessary analytical information, such information is scattered and difficult or even impossible to analyze as one unit
comparing dimensional modeling and ER modeling as data warehouse/ data mart design techniques
ER modeling can be used as a conceptual data warehouse/data mart design technique, followed by relational modeling as logical data warehouse/data mart design technique Dimensional modeling can be used both for conceptual data warehouse/data mart design and logical data warehouse/data mart design Both ER modeling and dimensional modeling are viable alternatives for modeling data warehouses/data marts, and can be used within the same project For example, dimensional modeling can be used during the requirements collection process for collecting, refining, and visualizing initial requirements. Based on the resulting requirements visualized as a collection of facts and dimensions, an ER model for a normalized physical data warehouse can be created if there is a preference for a normalized data warehouse. Once a normalized data warehouse is created, a series of dependent data marts can be created using dimensional modeling.