Chapter 7 - Data Warehouse Databases

Lakukan tugas rumah & ujian kamu dengan baik sekarang menggunakan Quizwiz!

data mart

A _____ is a smaller database that is tailored /specialized for a specific part of a company and not the entire company itself. There is no separate ETL process for this DB and the data typically comes from the corporate data warehouse.

dimension

A _____ is a way to give context to data by allowing you to further describe the data. They're groupings or categories that describe the fact.

fact

A ____ will almost always be a numeric value that is aggregated (summed, counted etc) from the operational DB and then loaded into the Data Warehouse. This is used to aid us in decision making.

start schema

A _____ is a design where the FACT to be reported is defined (or placed into context) via one or more dimensions.

fact, measure.

A _____ is some piece of data which we want to analyze with the goal of being able to derive some business knowledge that will aid us in our decision making process. This can also be called a _____ .

It uses data in the data warehouse to search for patterns or previously hidden knowledge.

Describe Data Mining.

This is what happens to ALL data that gets entered into a Data Warehouse. It is the process of retrieving data (extraction), formatting the data into a format that the Data Warehouse uses (transform) and then loading that data into the Data Warehouse (load). For example if you purchase data from a company they may format dates like mm/dd/yyyy and the Data Warehouse wants this data in the format yyyy/dd/mm. This date data will be reformatted in the ETL process.

Describe the extraction, load, and transform (ETL) process.

Data WH's are typically summarized while operational DB's are detailed. Data WH's are historical data while operational DB's store current data. Data WH's have multi-dimensional data while operational DB's store single dimension data. Data WH's are tuned for queries while operational DB's must also process INSERT's, UPDATE's, and DELETE's.

Here are some characteristics of Data Warehouses compared to Operational Databases.

• The FACT in the schema will always be normalized • There is almost always a dimension for time in the data warehouse application. • The primary key in the FACT table will always be a combination of the primary keys of all the dimension tables. • Each part of the FACT's primary key is a foreign key into a dimension table. • Sometimes a data warehouse will have more than one FACT table. This will be done for performance reasons or when you want to track another FACT with the same dimensions.

Here are some facts about star schema design.

This is called a DICE operation.

The technical definition of this operation is when you fix or set the value of all dimensions in your query report. What operation is this? HINT: This operation selects a sub cube of data from the dimensional data warehouse.

This is true. Roll-ups show less detail and Drill-downs show more detail.

T/F Roll-ups show less and less details. Drill-downs show more and more detail?

true.

T/F. The intersection of a column and a row is called a cell that contains fact data.

This is called a SLICE operation. You use this to select a subset of data from the dimensional data warehouse.

The technical definition of this operation is when you fix or set the value of one dimension in your query report. What operation is this? HINT: You can think of this operation as taking a sliver off of the dimensional cube.

This is called a PIVOT.

This operation is used to rotate the dimensional cube to display data from a different perspective. What operation is this called?

This is called a ROLL-UP operation. This is often done to investigate "what if" scenarios. For example we could see what the yearly effect would be if we increase weekly sales by 5%. We can update weekly quantities by %5 then ROLL-UP to see new monthly totals. Or ROLL-UP to see new yearly totals. You get it.

This operation is used when you aggregate (sum or count) the quantities for subcategories into subtotals for super categories. For example, aggregating daily totals to weekly totals to monthly totals to yearly totals etc. What is this operation called?

This is called a DRILL-DOWN operation. This is often used to research problems. Management can use this operation to hone in on the year that sales were bad, then DRILL-DOWN to the quarter, month, week, day etc. that the sales were bad. This is accomplished by using subcategories of a dimension to display additional levels of detail.

This operation is used when you want to investigate something in more detail. For example when you're looking at monthly totals and you want to view weekly, or daily, or hourly totals etc. What is this operation called?

Business Intelligence (BI) Systems.

What are O.L.A.P. and Data Mining referred to as?

Online Analytical Processing

What does OLAP stand for?

A central repository of mainly historical data that is used to make management decisions.

What is a Data Warehouse?

Data Mart

What is a small specialized data warehouse that covers a single topic?

Data Mining allows the user to discover new trends or patterns that are hidden in the data, its detailed data. Data mining use an optimzed relational DB. OLAP is not detailed data. OLAP is summarized data and usually uses dimensional cubes to represent data.

What is the main difference between OLAP and Data Mining?

To support management's analysis of the business and their business decisions.

What is the main purpose of BI Systems?

This is used to normalize a dimension table because, remember, only the FACT table is normalized, the dimension tables are not normalized.

What is the purpose of a snowflake table?

The fact table

What table in a star schema always has the largest number of rows?

O.L.A.P. and Data Mining.

What two types of reporting use Data Warehouses?

It can come from the Operational Database, the purchase of Data from other companies, or it can come from calculations performed on the Operational Database.

Where do Data Warehouses mainly get there data from?

dimensions

_____ are almost always nouns from from the operational database that were used to aggregate the fact. These can often be further aggregated into higher levels. For example day to week to month to quarter to year.


Set pelajaran terkait

Cognitive Psychology Exam 2- Chapter 6

View Set

2020 NISSAN SENTRA CERTIFICATION

View Set

Acute Respiratory Failure and Acute Respiratory Distress Syndrome

View Set

Morphology and Homologous, Analogous, and Vestigial Structures

View Set

Foundations of Cybersecurity - Module 2

View Set