cis 356
What are the steps in the development of data warehouse
-Start with business requirements to gather what it needs to do and what questions to answer and how to use -Doing local development aka ERD (will look different for analytical and operational) -Building the actual relational tables, convert them to SQL ---In parallel can develop indirect access (application to hit database) ---In parallel ETO process to understand how to convert from operational to format organizational structure that matched relational tables developed -Deployment (test it) -Maintain it (backup and perform index)
What is a constellation of stars
A dimensional model with multiple fact tables
What is the role of dimension tables in the dimensional model
Contain descriptions of the business, organization, or enterprise to which the subject of the analysis belongs
What is the role of fact tables in the dimensional model
Contain measures related to the subject of analysis. They contain foreign keys
What is the difference between a dependents and independent data mart
Dependent: does not have its own source system, data comes from data warehouse Independent: Stand alone data mart created in the same fashion as data warehouse
What does granularity of the fact table refer to
Describes what is depicted by one row in the fact table
What is the difference between a detailed fact table and an aggregated fact table
Detailed: each record refers to a single fact Aggregated: each record summarized multiple facts
Describe the redundancy difference between operational and analytical data
Eliminating redundancy is not as critical in analytical databases as it is in operational Reducing the redundancy of information is a major goal of operational databases. One of the main reasons for minimizing the redundancy in operational databases is to eliminate the possibility of update anomalies. However, the updating of data by the end users is not allowed in analytical databases, therefore there is no particular danger of update anomalies.
Why is a data warehouse create as a separate data store
If run data analytics against operation it will be slowed down because can't handle the huge data The data in the operational systems are useful. The issue is that the data is not organized in an efficient manner to perform analytics against the operational systems
How does the use of a dimensional model simplify analytical queries
It reduces the number of tables that have to be considered and provides a singular place for analysis, which eliminates the need for data preparation on behalf of the analyst.
Why is snowflaking usually not used in dimensional modeling
Not-normalized (non snowflaked) dimensions provide simpler analysis Normalization is usually not necessary for analytical databases Because it is not necessary (no danger of update anomalies) and it would only needlessly complicate analytical queries.
Describe the data time-level-of-detail between operational and analytical data
Operational reflects detailed data (every individual transaction is recorded) On the other hand, analytical data is both detailed and summarized.
Describe the data time horizon difference between operational and analytical data
Operational systems (day to day) have short time horizon of data than analytical systems. Operational the horizon could weeks or months where analytical could be a span over years of data.
Describe the difference in queried amounts of data and frequency of queries between operational and analytical data
Operational typically process smaller amounts of data and are typically issued much more often and by more users than analytical
What are the major components of a data warehousing system
Source Systems, Extraction-transformational Load (ETL) infrastructure, and the data warehouse itself
What are the most common approaches for handling slowly changing dimensions
Type 1: New dimension value and replaces the old, no history Type 2: Keep track of all changes, additional tables required Type 3: Keep current and last only