Data and Info. Management Chapter 8 continued (Marist College Eitel Lauria)

¡Supera tus tareas y exámenes ahora con Quizwiz!

Querying the star schema: All queries are executed/formulated in the same way: What 3 steps?

1. Use filtering conditions to select rows from dimension tables 2. Find corresponding rows in the fact table 3. Group by fields in the dimension tables

Snowflake model

A star schema that contains the dimensions that are normalized

Typical dimension in a DW contains what 2 things?

Attributes whose values do not change (or change extremely rarely) Attributes whose values change occasionally and sporadically over time, such as customer zip

Type 3: Can be combined with the use of what?

timestamps

Designing the star schema involves considering what with what?

which dimensions to use with the fact table representing the chosen subject

An identifier

A data element in the operational systems that is neither a fact nor strictly dimension attributes

Role-playing dimension

A single dimension which is expressed differently in a fact table using views is called a role-playing dimension

Customer: The Monster Dimension

Critical element for effective CRM The most challenging dimension for any data warehouse

What feed data into data mining applications?

Data warehouses

Building a Dimensional Model from a Normalized Database

Determine the purpose of the mart Identify an association table as the central fact table Determine facts to be included Replace all keys with surrogate keys Promote foreign keys in related tables to the fact table Add time dimension Refine the dimension tables

slowly changing dimensions

Dimensions that contain attributes whose values can change

POS transaction number was a PK in the operational system. Where should we place it in the dimensional model? Two options.

Discard the dimension Use a "degenerate dimension"

A dimension table typically has fewer or more records or rows than the fact table?

Fewer

Table Deep, Not Wide: Typically a fact table contains fewer or more attributes than a dimension table?

Fewer

Dimensional Modeling: Additional possible fact attributes: A fact table contains what 2 things?

Foreign keys connecting the fact table to the dimension tables The measures related to the subject of analysis

Large dimensions: Rapidly or not rapidly changing?

Rapidly

Data Grain

The level of detail for the measurements or metrics.

Two of the most typical additional attributes that can appear in the fact table are what 2 things?

Transaction identifier Transaction time

Is it possible for a dimension table to have more than one usually disjoint hierarchy?

Yes

junk dimension

a convenient grouping of typically low-cardinality flags and indicators

Dimension tables are "snow-flaked" when the low cardinality fields are removed to separate tables and linked back with what?

artificial key

Lowest grain: facts or metrics are where?

at the lowest possible level at which they could be captured from the operational systems

Junk Dimensions: no real what capability? can significantly increase the size of what?

browse entry the fact table

Aggregated fact tables have a what level of granularity than detailed fact tables as records in aggregated fact tables always represent what?

coarser summarizations of multiple facts

Row indicator

column that provides a quick indicator of whether the record is currently valid

Timestamps

columns that indicates the time interval for which the values in the records are applicable

Concatenated Key: A row in the fact table relates to a what from all the dimension tables?

combination of rows

Querying the star schema: When a query is made against the data warehouse, the results of the query are produced by what?

combining or joining one of more dimension tables with the fact table

A very large number of dimensions typically is a sign that several dimensions are not what and should be what?

completely independent and should be combined into a single dimension.

A dimensional model with multiple fact tables is referred to as a what or what?

constellation or galaxy of stars

If our design has 15+ dimensions, we should look for ways to combine what into a single dimension?

correlated dimensions

In the star schema, the chosen subject of analysis is represented by a what?

fact table

2 Challenges of Rapidly Changing Customer Dimensions

It generally takes too long to constrain or browse among the relationships in such a big table It is difficult to use previously described techniques for tracking changes in these large dimensions

2 Challenges of Rapidly Changing Customer Dimensions. A solution to deal with them?

One solution is to break off frequently analyzed or frequently changing attributes into a separate dimension, referred to as a minidimension

Constellation approach enables what 2 things?

Quicker development of analytical databases for multiple subjects of analysis, because dimensions are re-used instead of duplicated Straightforward cross-fact analysis

Not-normalized (not snowflaked) dimensions provide for simpler or more complex analysis? Normalization is usually necessary or not necessary for analytical databases?

Simpler Not neccessary

Type 1: The simplest or least simplest approach? Used most often when a change in a dimension is the result of what?

Simplest an error

The minidimension terminology refers to when the what key is part of the fact table composite key?

demographics

Type 2: Creates a new additional what record using a new value for the what key every time a value in a dimension record change?

dimension surrogate

Junk Dimensions: Make a flag into it's own what? may greatly increase what, increasing what? can what and what the design?

dimension the number of dimensions, increasing the size of the fact table clutter and confuse

Type 1: The old value in the source system needs to be what?

discarded.

The fact table does or does not contain rows that do not have values for the measures? Understand that the fact table could have what?

does not gaps.

When multiple subjects of analysis can share the same dimensions, a dimensional model contains more than one what?

fact table

Type 3: Applicable in cases in which there is a what number of changes? possible per column of a dimension, or in cases when only a what history is recorded?

fixed limited

One way to take advantage of the query performance improvement provided by aggregated fact tables, while retaining the power of analysis of detailed fact tables, is to do what?

have both types of tables coexisting within the same dimensional model, i.e. in the same constellation

Type 2: Used in cases where what should be preserved?

history

Snowflaking is usually used or not used in dimensional modeling?

not used

Coarser granularity tables are what in terms of what information can be retrieved from them?

limited

In addition to the measures related to the subject of analysis, in certain cases fact tables can contain other attributes that are not what?

measured

The change in the source system needs or need not be preserved in the data warehouse?

needs be preserved in the data warehouse

Three suggestions for slowly changing dimensions: Type 3

new "current" fields, legitimate need to track both old and new states "Original" and "current" values; Intermediate Values are lost

It is a dimensional modeling mistake to represent elements of a hierarchy as what in the fact table?

separate dimensions

Type 1: Sometimes the change in the source system has no what?

significance.

Type 2: Can be combined with the use of what and what?

timestamps and row indicators

4 Advantages of keeping the fact table at the lowest grain

The users can drill down to the lowest level of detail from the data warehouse without the need to go to the operational systems themselves. Base level fact tables must be at the natural lowest levels of all corresponding dimensions. (By doing this, queries for drill down and roll up can be performed efficiently.) Granular fact tables serve as natural destinations for current operational data: Less Transformations requirements Data mining applications need details at the lowest grain.

Detailed fact tables have fine level of granularity because each record represents what

a single fact

Three suggestions for slowly changing dimensions: Type 2 -- do what?

create new record at time of change; partitioning the history (old and new description);

POS transaction number was a PK in the operational system. Where should we place it in the dimensional model? Two options. If you use a "degenerate dimension" then you store the dimension identifier where? Don't create a separate what? This yields a dimension identifier without a what? Becomes part of the what in the fact table?

directly in the fact table dimension table dimension table (hence a degenerate dimension) composite Primary Key

Why is the Customer Dimension the most challenging dimension for any data warehouse (3 things)?

extremely deep (with millions of rows) extremely wide (with dozens or even hundreds of attributes) sometimes subject to rather rapid changing

Degenerate Dimensions (in Kimball's terminology): Occasionally a dimension is merely an what, without any what?

identifier, without any interesting attributes

Junk Dimensions: Leave the flags where? likely what data?

in the fact tables sparse data

Granularity describes what is depicted by what?

one row in the fact table

Three suggestions for slowly changing dimensions: Type 1 -- do what? Is it hard to implement?

overwrite/erase old values; no accurate tracking of history needed; No, it is easy to implement

POS transaction number was a PK in the operational system. Where should we place it in the dimensional model? Two options. If you discard the dimension then fact table will lack what? Discarding the dimension is a good option if the dimension isn't needed for what?

primary key, but that's OK analysis

Due to their compactness, coarser granularity aggregated fact tables are quicker or slower to query than detailed fact tables?

quicker

Combine all relevant flags, etc. into a what? The number of possibilities remain what? What is retained?

single dimension finite information

Junk Dimensions: Remove what from the design? potentially what will be lost? If they provide no what, do what?

the attributes critical information relevance, remove them

Snowflake model: Interferes with user what and what browsing performance?

understandability cross-attribute (cross-dimension)


Conjuntos de estudio relacionados

Dr Jekyll and Mr Hyde Annotations

View Set

Cells, synapses, and Neurotransmitters

View Set

AMT General Prep ----- Materials and Processes

View Set

Programming Output and Input Quiz

View Set

Artificial Intelligence: A Modern Approach Chapter 2 Intelligent Agents

View Set