potential CIS 356 - Exam 3

Ace your homework & exams now with Quizwiz!

What are the steps in the development of data warehouses?

1. Data Warehouse Requirements-what capabilities/functionalities do I need? 2. Data Warehouse Modeling- creation of ERDs 3. Data Warehouse Creation- the "how"; using database management software and implementing the data warehouse model 4. Data Warehouse Deployment-loading, testing, and training of the data warehouse to end users 5. Data Warehouse Usage-retrieval of data; direct/indirect usage 6. Data Warehouse Maintenance-solving technical issues, updating security, implementing backup and recovery procedures.

What is a constellation of stars?

A constellation of stars is a star schema that contains multiple fact tables.

What is a Data Mart?

A data mart is a data store based on the same principles as a data warehouse, but with a more limited scope. As a result, it is easier to handle and less expensive since it has fewer data sources and often has a narrow focus.

What is the difference between a dependent and independent data mart?

A dependent data mart does not have its own source system, its data comes from the data warehouse. An independent data mart has its own ETL systems and source systems created in the same way as the data warehouse.

Describe the data update difference between operational and analytical data.

Data in the operational systems regularly updated by users. This entails inserting, modifying, and deleting. Analytical databases are only allowed to retrieve data and updates by end users are not allowed.

What is the difference between a detailed fact table and an aggregated fact table?

In a detailed fact table, each record refers to a single fact, while in a aggregated fact table, each record summarizes multiple facts.

Why is a data warehouse created as a separate data store?

It is created as a separate data store mainly to support analytical queries. The performance of the operational day-to-day tasks can have major issues if those tasks have to compete for computing resources with analytical queries and it's usually impossible to structure an operational database for analysis.

Describe the data time-representation difference between operational and analytical data.

Operational data accounts for the current state of affairs, while analytical data can represent both current and snapshots of the past. For example, in a bank's operational database, a client will have one value that is their checking balance and this value will change to the new value depending on a withdrawal or deposit. In their analytical database, the current balance can be stored along with the values over the span of the last month.

Describe the data time-horizon difference between operational and analytical data.

Operational data has a time-horizon of 60-90 days, after which it is removed from the operational system and usually archived. Analytical data has a time-horizon of years to provide trend and pattern analysis.

Describe the data orientation difference between operational and analytical data.

Operational data is used to support applications that help facilitate operations. These databases are referred to as application-oriented. Analytical data is used to analyze one or more business area, such as sales, costs, or profit. These databases are referred to as subject-oriented.

Describe the data level-of-detail difference between operational and analytical data.

Operational data reflects detailed data, as in the details of every individual transaction. This data could be the calling number, called number, time of call, and call duration for a phone call recorded in a telecom's database. Summarized data is derived in this case since it is frequently changing. Analytical data can contain both detailed and summarized data that is physically stored. The summarizations are physically stored and pre-computed in the analytical database because they are not subject to change and it would take more time to recalculate the totals for every analytical query.

Describe the data redundancy difference between operational and analytical data.

Operational databases' main goal is to reduce the redundancy of data to eliminate the chance of update anomalies. Since analytical database are not allowed to update the data, there are no efforts to eliminate redundancy.

Describe the difference in queried amounts of data and frequency of queries between operational and analytical data.

Operational queries handle much smaller amounts of data but are performed frequently by many users. Analytical queries process greater amounts of data but are executed less frequently.

Why is Snowflaking usually not used in dimensional modeling?

Snowflaking is usually not used in dimensional modeling because not-normalized dimensions generate simpler analysis. If the schema is normalized, the number of tables to consider increases and makes it more complex. Another reason is, because dimensional modeling is used to design analytical databases and analytical databases are "read-only", they can't be affected by update anomalies so normalization is not needed.

What does granularity of the fact table refer to?

The granularity of the fact table refers to the information depicted by one row in the fact table. Therefore, detailed fact tables have a fine level of granularity while aggregated fact tables have a coarser level of granularity.

What are the major components of a data warehousing system?

The major components are source system, ETL process, data warehouse, and front-end applications.

What is the primary purpose of a data warehouse?

The primary purpose of a data warehouse is to perform data analysis.

What is the role of dimensional tables in the dimensional model?

The role of dimensional tables in a dimensional model is containing descriptive information that provides a basis for analysis of the subject.

What is the role of fact tables in the dimensional table?

The role of fact tables in the dimensional model is to contain measures related to the subject that are used for mathematical computation and quantitative analysis.

How does the use of a dimensional model simplify analytical questions?

The use of a dimensional model simplifies analytical questions by allowing the researcher to create dimensions depending on what they want to analyze.

Explain the following parts of the data warehouse definition:

• Structured repository: organized, structured repository • Integrated: ties together useful data from a variety of sources • Subject oriented: analyze business functions rather than a specific operation • Enterprise wide: organization view of data • Historical: refers to the larger time horizon in the data warehouse than in the operational databases • Time-variant: contains slices or snapshots of data from different periods of time across its time horizon • Developed for the retrieval of analytical information: the only available functionality to users is retrieval; data in data warehouse is not subject to modifications, insertions, or deletions by end users • May include the data at the fine level of detail or summary data or both.

What are the most common approaches for handling slowly changing dimensions?

• Type 1- Changes the value in the dimension's record so the new one replaces the old one. The simplest approach and is used most often when a change in a dimension is an error. • Type 2- Creates a new additional dimension record using a new value for the surrogate key every time a value in a dimension record changes. This approach is used in cases where history should be preserved. Most commonly used. • Type 3- Creating a "previous" and "current" column in the dimension table for each column where changes are anticipated. Applicable in cases where there is a fixed number of changes possible per column of a dimension, or if only a limited history is recorded.


Related study sets

Sociology Quiz Questions (Ed. 8) Final

View Set