Data and Info. Management Chapter 8 continued (Marist College Eitel Lauria)
Querying the star schema: All queries are executed/formulated in the same way: What 3 steps?
1. Use filtering conditions to select rows from dimension tables 2. Find corresponding rows in the fact table 3. Group by fields in the dimension tables
Snowflake model
A star schema that contains the dimensions that are normalized
Typical dimension in a DW contains what 2 things?
Attributes whose values do not change (or change extremely rarely) Attributes whose values change occasionally and sporadically over time, such as customer zip
Type 3: Can be combined with the use of what?
timestamps
Designing the star schema involves considering what with what?
which dimensions to use with the fact table representing the chosen subject
An identifier
A data element in the operational systems that is neither a fact nor strictly dimension attributes
Role-playing dimension
A single dimension which is expressed differently in a fact table using views is called a role-playing dimension
Customer: The Monster Dimension
Critical element for effective CRM The most challenging dimension for any data warehouse
What feed data into data mining applications?
Data warehouses
Building a Dimensional Model from a Normalized Database
Determine the purpose of the mart Identify an association table as the central fact table Determine facts to be included Replace all keys with surrogate keys Promote foreign keys in related tables to the fact table Add time dimension Refine the dimension tables
slowly changing dimensions
Dimensions that contain attributes whose values can change
POS transaction number was a PK in the operational system. Where should we place it in the dimensional model? Two options.
Discard the dimension Use a "degenerate dimension"
A dimension table typically has fewer or more records or rows than the fact table?
Fewer
Table Deep, Not Wide: Typically a fact table contains fewer or more attributes than a dimension table?
Fewer
Dimensional Modeling: Additional possible fact attributes: A fact table contains what 2 things?
Foreign keys connecting the fact table to the dimension tables The measures related to the subject of analysis
Large dimensions: Rapidly or not rapidly changing?
Rapidly
Data Grain
The level of detail for the measurements or metrics.
Two of the most typical additional attributes that can appear in the fact table are what 2 things?
Transaction identifier Transaction time
Is it possible for a dimension table to have more than one usually disjoint hierarchy?
Yes
junk dimension
a convenient grouping of typically low-cardinality flags and indicators
Dimension tables are "snow-flaked" when the low cardinality fields are removed to separate tables and linked back with what?
artificial key
Lowest grain: facts or metrics are where?
at the lowest possible level at which they could be captured from the operational systems
Junk Dimensions: no real what capability? can significantly increase the size of what?
browse entry the fact table
Aggregated fact tables have a what level of granularity than detailed fact tables as records in aggregated fact tables always represent what?
coarser summarizations of multiple facts
Row indicator
column that provides a quick indicator of whether the record is currently valid
Timestamps
columns that indicates the time interval for which the values in the records are applicable
Concatenated Key: A row in the fact table relates to a what from all the dimension tables?
combination of rows
Querying the star schema: When a query is made against the data warehouse, the results of the query are produced by what?
combining or joining one of more dimension tables with the fact table
A very large number of dimensions typically is a sign that several dimensions are not what and should be what?
completely independent and should be combined into a single dimension.
A dimensional model with multiple fact tables is referred to as a what or what?
constellation or galaxy of stars
If our design has 15+ dimensions, we should look for ways to combine what into a single dimension?
correlated dimensions
In the star schema, the chosen subject of analysis is represented by a what?
fact table
2 Challenges of Rapidly Changing Customer Dimensions
It generally takes too long to constrain or browse among the relationships in such a big table It is difficult to use previously described techniques for tracking changes in these large dimensions
2 Challenges of Rapidly Changing Customer Dimensions. A solution to deal with them?
One solution is to break off frequently analyzed or frequently changing attributes into a separate dimension, referred to as a minidimension
Constellation approach enables what 2 things?
Quicker development of analytical databases for multiple subjects of analysis, because dimensions are re-used instead of duplicated Straightforward cross-fact analysis
Not-normalized (not snowflaked) dimensions provide for simpler or more complex analysis? Normalization is usually necessary or not necessary for analytical databases?
Simpler Not neccessary
Type 1: The simplest or least simplest approach? Used most often when a change in a dimension is the result of what?
Simplest an error
The minidimension terminology refers to when the what key is part of the fact table composite key?
demographics
Type 2: Creates a new additional what record using a new value for the what key every time a value in a dimension record change?
dimension surrogate
Junk Dimensions: Make a flag into it's own what? may greatly increase what, increasing what? can what and what the design?
dimension the number of dimensions, increasing the size of the fact table clutter and confuse
Type 1: The old value in the source system needs to be what?
discarded.
The fact table does or does not contain rows that do not have values for the measures? Understand that the fact table could have what?
does not gaps.
When multiple subjects of analysis can share the same dimensions, a dimensional model contains more than one what?
fact table
Type 3: Applicable in cases in which there is a what number of changes? possible per column of a dimension, or in cases when only a what history is recorded?
fixed limited
One way to take advantage of the query performance improvement provided by aggregated fact tables, while retaining the power of analysis of detailed fact tables, is to do what?
have both types of tables coexisting within the same dimensional model, i.e. in the same constellation
Type 2: Used in cases where what should be preserved?
history
Snowflaking is usually used or not used in dimensional modeling?
not used
Coarser granularity tables are what in terms of what information can be retrieved from them?
limited
In addition to the measures related to the subject of analysis, in certain cases fact tables can contain other attributes that are not what?
measured
The change in the source system needs or need not be preserved in the data warehouse?
needs be preserved in the data warehouse
Three suggestions for slowly changing dimensions: Type 3
new "current" fields, legitimate need to track both old and new states "Original" and "current" values; Intermediate Values are lost
It is a dimensional modeling mistake to represent elements of a hierarchy as what in the fact table?
separate dimensions
Type 1: Sometimes the change in the source system has no what?
significance.
Type 2: Can be combined with the use of what and what?
timestamps and row indicators
4 Advantages of keeping the fact table at the lowest grain
The users can drill down to the lowest level of detail from the data warehouse without the need to go to the operational systems themselves. Base level fact tables must be at the natural lowest levels of all corresponding dimensions. (By doing this, queries for drill down and roll up can be performed efficiently.) Granular fact tables serve as natural destinations for current operational data: Less Transformations requirements Data mining applications need details at the lowest grain.
Detailed fact tables have fine level of granularity because each record represents what
a single fact
Three suggestions for slowly changing dimensions: Type 2 -- do what?
create new record at time of change; partitioning the history (old and new description);
POS transaction number was a PK in the operational system. Where should we place it in the dimensional model? Two options. If you use a "degenerate dimension" then you store the dimension identifier where? Don't create a separate what? This yields a dimension identifier without a what? Becomes part of the what in the fact table?
directly in the fact table dimension table dimension table (hence a degenerate dimension) composite Primary Key
Why is the Customer Dimension the most challenging dimension for any data warehouse (3 things)?
extremely deep (with millions of rows) extremely wide (with dozens or even hundreds of attributes) sometimes subject to rather rapid changing
Degenerate Dimensions (in Kimball's terminology): Occasionally a dimension is merely an what, without any what?
identifier, without any interesting attributes
Junk Dimensions: Leave the flags where? likely what data?
in the fact tables sparse data
Granularity describes what is depicted by what?
one row in the fact table
Three suggestions for slowly changing dimensions: Type 1 -- do what? Is it hard to implement?
overwrite/erase old values; no accurate tracking of history needed; No, it is easy to implement
POS transaction number was a PK in the operational system. Where should we place it in the dimensional model? Two options. If you discard the dimension then fact table will lack what? Discarding the dimension is a good option if the dimension isn't needed for what?
primary key, but that's OK analysis
Due to their compactness, coarser granularity aggregated fact tables are quicker or slower to query than detailed fact tables?
quicker
Combine all relevant flags, etc. into a what? The number of possibilities remain what? What is retained?
single dimension finite information
Junk Dimensions: Remove what from the design? potentially what will be lost? If they provide no what, do what?
the attributes critical information relevance, remove them
Snowflake model: Interferes with user what and what browsing performance?
understandability cross-attribute (cross-dimension)