Intro: Data warehousing and multidimensional modeling


Yes. For example, we might have two hierarchies for time; one for calendar year and one for fiscal year. The requirement is that these two hierarchies share one or more lowest levels (e.g. Day) and group these into multiple levels higher up (e.g. Fiscal Year, Calendar Year).

Can a dimension have more than one hierarchy represented?
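
As a minimal sketch of the calendar/fiscal example above, the same Day value can be grouped along two hierarchies. The function names and the 1 July fiscal-year start are assumptions made for illustration only.

```python
from datetime import date


def calendar_year(d: date) -> int:
    """Calendar hierarchy: Day -> Calendar Year."""
    return d.year


def fiscal_year(d: date, start_month: int = 7) -> int:
    """Fiscal hierarchy: Day -> Fiscal Year.

    Assumption: the fiscal year starts on the first day of start_month
    and is named after the calendar year in which it ends.
    """
    return d.year + 1 if d.month >= start_month else d.year


# Both hierarchies share the lowest level (Day) but group it differently.
day = date(2009, 8, 15)
paths = {"calendar": calendar_year(day), "fiscal": fiscal_year(day)}
```

Here one day belongs to calendar year 2009 and, under the assumed convention, to fiscal year 2010.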

No, only when a measure makes sense. We can also have measureless facts, which are simply combinations of dimension values.

Does a fact always have a numerical value associated with it?

No. Dimensions usually account for only 1-5% of the storage in a data warehouse, and dimension updates are handled centrally, which makes it possible to ensure data consistency.

Does data redundancy (e.g. in star schemas) cause space-problems or update performance problems?

We only create the table once. We create a view of the table for every role.

How are Role-playing dimensions handled in ROLAP?
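
The view-per-role idea can be sketched with Python's built-in sqlite3 module. The table, view, and column names (dim_date, order_date, ship_date, fact_order) are hypothetical and chosen only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The date dimension is stored once.
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, day TEXT);

-- One view per role, each with role-specific column names.
CREATE VIEW order_date AS
    SELECT date_key AS order_date_key, day AS order_day FROM dim_date;
CREATE VIEW ship_date AS
    SELECT date_key AS ship_date_key, day AS ship_day FROM dim_date;

CREATE TABLE fact_order (
    order_date_key INTEGER,
    ship_date_key  INTEGER,
    amount         REAL
);

INSERT INTO dim_date VALUES (1, '2009-01-05'), (2, '2009-01-07');
INSERT INTO fact_order VALUES (1, 2, 120.0);
""")

# The fact table joins to the same underlying table twice, once per role.
row = conn.execute("""
    SELECT o.order_day, s.ship_day, f.amount
    FROM fact_order f
    JOIN order_date o ON f.order_date_key = o.order_date_key
    JOIN ship_date  s ON f.ship_date_key  = s.ship_date_key
""").fetchone()
```

Because both views read from dim_date, any update to the dimension is visible in every role.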

We do not create a separate dimension table, as that would be pointless. Example: We have a degenerate dimension Order that only contains an order number. We simply include this number directly in the fact table.

How do we implement degenerate dimensions in ROLAP?
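
A small sketch of this, again using sqlite3 with hypothetical table and column names: the order number sits in the fact table as a plain column, with no Order dimension table behind it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);

-- order_number is the degenerate dimension: it lives in the fact table
-- and does not reference any dimension table.
CREATE TABLE fact_order_line (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    order_number INTEGER,
    amount       REAL
);

INSERT INTO dim_customer VALUES (1, 'Alice');
INSERT INTO fact_order_line VALUES (1, 1001, 30.0), (1, 1001, 12.5), (1, 1002, 8.0);
""")

# The order number can still be used for grouping, like any other dimension.
order_totals = conn.execute("""
    SELECT order_number, SUM(amount)
    FROM fact_order_line
    GROUP BY order_number
    ORDER BY order_number
""").fetchall()
```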

A hierarchical structure of levels of detail. For example, T -> Year -> Month -> Day.

How is a dimension organised?

It can be. Space is not much of an issue, and some redundancy can support simpler formulation of queries and faster query performance.

Is it a good idea to have data redundancy?

1. Choose business processes to model (e.g. sales). 2. Choose the granularity of these processes. 3. Design the dimensions. 4. Choose the measures.

What are the four sub-processes in multidimensional modelling?

Dimensions, facts, and measures.

What are the principal elements of a cube?

Additive, semi-additive, and non-additive.

What are the three classes of measures?

1. Navigational query: examine one dimension. 2. Aggregation query: summarise fact data.

What are the two kinds of OLAP queries?

Event facts and snapshot (state) facts.

What are the two kinds of facts?

On-Line Analytical Processing.

What does OLAP abbreviate?

On-Line Transaction Processing.

What does OLTP abbreviate?

Combining two cubes by means of one or more shared dimensions. Corresponds to a full outer join.

What is "Drilling Across"?
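
To illustrate the full-outer-join behaviour, here is a minimal sketch with two tiny cubes that share a month dimension. The cube contents are made up for the example, and plain dictionaries stand in for real cubes.

```python
# Cube 1: sales amount by month. Cube 2: returned amount by month.
sales = {"2009-01": 500, "2009-02": 650}
returns = {"2009-02": 40, "2009-03": 15}

# Full outer join on the shared dimension: keep every month that appears
# in either cube, and use None where a cube has no fact for that month.
months = sorted(sales.keys() | returns.keys())
combined = {m: (sales.get(m), returns.get(m)) for m in months}
```

Months present in only one cube are kept, with the missing measure left empty (None here).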

The opposite of rolling up: moving to a lower, more detailed level of a dimension. For example, when we have data at the genre level of the book dimension but are actually interested in the page count or specific title, we drill down into more detail.

What is "Drilling Down"?

When drilling down, we get higher levels of detail by going down the dimension levels. When drilling out, we acquire more detail by including other dimensions. For example, before we only had the genre level of the book dimension, but we can acquire more detail by including the time dimension.

What is "Drilling Out"?

To reduce the detail of some dimension of a cube by aggregating to a higher level. For example, when we are only interested in the genre level of the book dimension.

What is "Rolling Up"?
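
Rolling up from the title level to the genre level can be sketched as a simple aggregation. The facts below are invented sample data, and SUM is assumed as the aggregation formula.

```python
from collections import defaultdict

# Facts at the title level: (title, genre, copies sold).
facts = [
    ("Emma", "Romance", 3),
    ("Persuasion", "Romance", 2),
    ("Dracula", "Horror", 4),
]

# Roll up: drop the title level and sum the measure per genre.
by_genre = defaultdict(int)
for _title, genre, copies in facts:
    by_genre[genre] += copies
```

Drilling down is the reverse direction: going from by_genre back to the title-level facts, which requires that the detailed data is still available.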

A dimension playing several roles. For example, the Time dimension may be used for both Shipping Date and Order date.

What is a Role-playing dimension?

A multidimensional data warehouse.

What is a collection of related cubes?

A multidimensional data structure where a cell (intersection between dimensions) contains a fact.

What is a cube?

A subset of a data warehouse.

What is a data mart?

A dimension that only contains a single numerical value (order number, for example). Say we want to keep track of specific orders made by customers. We need an Order dimension. We really only need an order number. Order is a degenerate dimension.

What is a degenerate dimension?

Querying can take a long time since a lot of joins are required over all the sub-dimension tables.

What is a drawback of snowflake schemas?

They are prone to data redundancy. For example, we may have a table for the time dimension. Each row would contain the day, month, and year. Each day would, as such, be defined by all three values. This is unnecessary.

What is a drawback of star schemas?

The thing of interest; what we want to keep track of. It is also the intersection of all dimensions in the cube.

What is a fact?

A dimension containing all combinations of "trash" attributes that we don't know where else to put (e.g. Shipping Mode, wrapping, etc.). Only include binary flags and low cardinality values.

What is a junk dimension?

A numerical property of a fact. Formally, a measure is defined by two constituents: a numerical property and a formula (a function for combining measure values, e.g. SUM).

What is a measure?

A measure that cannot be meaningfully aggregated over any dimension, usually caused by the formula. For example, when averages for lower level values cannot be combined into averages for higher-level values.

What is a non-additive measure?
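
A short worked example of why averages do not combine, using made-up sale amounts for two hypothetical shops:

```python
def avg(values):
    return sum(values) / len(values)


shop_a = [10, 20, 30]  # three sales, average 20.0
shop_b = [100]         # one sale, average 100.0

# Combining the lower-level averages gives the wrong higher-level answer.
average_of_averages = (avg(shop_a) + avg(shop_b)) / 2

# The correct higher-level average needs the underlying values.
true_average = avg(shop_a + shop_b)

# One common workaround: store the additive parts (sum and count)
# and apply the division only at query time.
total = sum(shop_a) + sum(shop_b)
count = len(shop_a) + len(shop_b)
average_from_parts = total / count
```

The average of averages comes out as 60.0, while the true average over all four sales is 40.0.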

A measure that cannot be aggregated meaningfully along one or more dimensions. This usually occurs with snapshot facts. It does not make sense to sum snapshots of inventory levels over periods of time. It does, however, make sense to sum inventory levels over locations.

What is a semi-additive measure?
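
The inventory example can be sketched as follows. The snapshot values and the shop names are hypothetical.

```python
# Inventory snapshots: (day, shop) -> units in stock at the end of the day.
inventory = {
    ("Mon", "shop_a"): 10,
    ("Mon", "shop_b"): 5,
    ("Tue", "shop_a"): 8,
    ("Tue", "shop_b"): 7,
}

# Summing over the location dimension is meaningful:
# total stock across all shops on Monday.
stock_on_monday = sum(u for (day, _), u in inventory.items() if day == "Mon")

# Summing over the time dimension is not: shop_a never held 18 units.
shop_a_levels = [u for (_, shop), u in inventory.items() if shop == "shop_a"]
misleading_sum = sum(shop_a_levels)

# A different formula, such as the average, is meaningful over time.
average_level = sum(shop_a_levels) / len(shop_a_levels)
```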

Has a fact table. For each dimension, there is a table for every level of the dimension, "snowflaking" out from the fact table. Tables for lower levels contain a key for the containing level. Lower dimension levels are closer to the fact table while higher dimension levels are at the outer rim of the snowflake. Hierarchies are explicit in snowflake schemas.

What is a snowflake schema?
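
As a rough sketch of this layout, the time dimension below is split into one table per level, using sqlite3 and hypothetical table names. Note how a query at the year level has to join through every level table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per level; each lower level holds a key to its containing level.
CREATE TABLE year_level  (year_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE month_level (month_key INTEGER PRIMARY KEY, month INTEGER,
                          year_key INTEGER REFERENCES year_level(year_key));
CREATE TABLE day_level   (day_key INTEGER PRIMARY KEY, day INTEGER,
                          month_key INTEGER REFERENCES month_level(month_key));

-- The fact table references only the lowest level.
CREATE TABLE fact_sales  (day_key INTEGER REFERENCES day_level(day_key),
                          amount REAL);

INSERT INTO year_level  VALUES (1, 2009);
INSERT INTO month_level VALUES (1, 1, 1), (2, 2, 1);
INSERT INTO day_level   VALUES (1, 5, 1), (2, 9, 2);
INSERT INTO fact_sales  VALUES (1, 100.0), (2, 50.0), (1, 25.0);
""")

# Aggregating to the year level needs three joins for one dimension.
sales_by_year = conn.execute("""
    SELECT y.year, SUM(f.amount)
    FROM fact_sales f
    JOIN day_level   d ON f.day_key   = d.day_key
    JOIN month_level m ON d.month_key = m.month_key
    JOIN year_level  y ON m.year_key  = y.year_key
    GROUP BY y.year
""").fetchall()
```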

Relational representation of a cube. One table for every dimension. Every table has a key column and a column for every dimension level. Also, if level properties exist, these have one column each as well. Also has a fact table that contains one row for each fact. Fact table has one column for each dimension. Fact table also has a column for the measure, if it exists.

What is a star schema?
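
A minimal sketch of such a schema in sqlite3, with two dimensions and one measure. All table names, column names, and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per dimension: a key column plus one column per level.
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY,
                       day INTEGER, month INTEGER, year INTEGER);
CREATE TABLE dim_book (book_key INTEGER PRIMARY KEY,
                       title TEXT, genre TEXT);

-- Fact table: one column per dimension plus the measure.
CREATE TABLE fact_sales (
    time_key INTEGER REFERENCES dim_time(time_key),
    book_key INTEGER REFERENCES dim_book(book_key),
    amount   REAL
);

INSERT INTO dim_time VALUES (1, 5, 1, 2009), (2, 9, 2, 2010);
INSERT INTO dim_book VALUES (1, 'Emma', 'Romance'), (2, 'Dracula', 'Horror');
INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 20.0), (2, 1, 5.0);
""")

# Any level is reachable with a single join per dimension.
sales_by_year_and_genre = conn.execute("""
    SELECT t.year, b.genre, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_time t ON f.time_key = t.time_key
    JOIN dim_book b ON f.book_key = b.book_key
    GROUP BY t.year, b.genre
    ORDER BY t.year, b.genre
""").fetchall()
```

Each dimension row repeats its higher-level values (for example the year on every day row), which is the redundancy that star schemas accept in exchange for fewer joins.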

A key with no semantic meaning of its own. In a fact table, we have foreign keys to all dimension tables; these are "dumb" generated keys, or surrogate keys, rather than keys taken from the source data.

What is a surrogate key?

A measure that can be meaningfully aggregated over any dimension. For example, total sales can be aggregated meaningfully over time, location, and ware.

What is an additive measure?

Hierarchies are explicit, and there is little to no redundancy.

What is an advantage of snowflake schemas?

Fast queries and fast updates.

What is an advantage of star schemas?

A dimension referenced by another dimension. Example: A Customer dimension references a Profile dimension. The profile dimension is an outrigger.

What is an outrigger?

We slice when we choose a specific dimension value, for example "Consider sales in 2009". We dice when we further slice a slice, for example by choosing a specific Book dimension value: "Consider sales in 2009 and only books written by Jane Austen."

What is slicing and dicing?
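
The two steps can be sketched as successive filters over a list of facts. The sales rows below are sample data made up for the example.

```python
sales = [
    {"year": 2009, "author": "Jane Austen", "amount": 10},
    {"year": 2009, "author": "Bram Stoker", "amount": 20},
    {"year": 2010, "author": "Jane Austen", "amount": 5},
]

# Slice: fix one value of the time dimension.
slice_2009 = [f for f in sales if f["year"] == 2009]

# Dice: slice the slice further on the book dimension.
dice_2009_austen = [f for f in slice_2009 if f["author"] == "Jane Austen"]

austen_2009_total = sum(f["amount"] for f in dice_2009_austen)
```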

OLAP focuses on gathering data with as much context as possible. In OLTP, when a ware is sold, we might simply decrement an integer counting the wares in stock; in OLAP, we track the exact ware sold, when it was sold, who sold it, where it was sold, etc.

What is the difference between OLAP and OLTP?

A subject oriented, integrated, time variant, non-volatile collection of data in support of management's decision making process.

What is the formal definition of a data warehouse?

The granularity depends entirely on the level of detail of all its dimensions. It is typically expressed as, for example, "Sales By Shop By Hour"; here, we see that the lowest levels of our two dimensions are "shop" and "hour", respectively.

What is the granularity of a fact?

