OIM297A Test Two
What is the difference between Type 1, 2 and 3 options, when handling slowly changing dimensions?
A Type 1 approach changes the value in the dimension's record, where the new value replaces the old value. A Type 2 approach creates a new additional dimension record using a new value for the surrogate key every time a value in the dimension changes. A Type 3 approach is applicable in cases in which there is a fixed number of changes possible per column of a dimension, or in cases when only a limited history is recorded. Previous and Current columns are required.
Describe the Inmom and Kimball approaches to modeling a data warehouses, which one would you prefer and why?
The Inmom approach means that the data warehouse is filled with analytical data and is simply just used as a source of data to be used by dimensionally modeled data marts and other analytical data sets. The normalized data warehouse stores the analytically useful data so that it can be accessed by other data sets. The Kimball approach views a data warehouse as a collection of dimensionally modeled intertwined data marts. The Kimball approach uses conformed dimensions that are designed first. These are commonly used for data analysis.
Describe MOLAP and ROLAP and explain their differences.
MOLAP (multidimensional online analytical processing) is an architecture that stores the data from a warehouse in a multidimensional cube. The complexity of the data is hidden from the MOLAP user so they can use common functions. A MOLAP server usually contains a limited amount of data and allows for very fast analysis by precalculating outcomes and storing them in cubes. ROLAP (relational online analytical processing) is a tool that translates the queries into SQL statements. It is different than MOLAP because it has no limitations on the size of the database or kind of analysis that can be performed. It is not as fast as MOLAP but can handle larger quantities of data.
What is the difference between OLTP and OLAP?
OLTP (online transaction processing) refers to the updating (inserting, modifying, deleting), querying, and presenting data from databases for operational purposes. It encompasses all the everyday update transactions done on the operational database system. OLAP (online analytical processing) refers to querying and presenting data from data warehouses and/or data marts for analytical purposes. The difference is that OLTP is used with traditional databases for day to day use and OLAP relates to the use of the data from data warehouses and data marts.
Describe the three common features of OLAP/BI tools and explain their differences.
The three common features are slice and dice, pivot (rotate), drill down and drill up. Slice and dice operations adds, replaces, or eliminates specified dimension attributes (or certain values of the dimension attributes) from the already displayed result. Pivot operation does not change the values displayed in the original query like the slice and dice operation. Pivot simply recognizes them. The drill down operation is to make the granularity of the data in the query result finer, and the drill up is to make it coarser.
Why does an organization need a data warehouse?
To have effective performance of day to day tasks. If operational tasks had to involve computing resources which takes away from analytical queries than it slows performance for both functions. Another reason is it is nearly impossible to create a database that is suited for both operational and analytical purposes. Most data warehouses are set up and designed to serve analytical queries.
Why do we need data cleansing during the transformation part of the ELT process?
We need data cleansing during the transformation part of the ETL process because every underlying operational source doesn't necessarily contain the highest quality data. The transformation part of the ETL process is responsible for data quality control and data quality improvement.
What is the difference between a star schema and a snowflake model?
A star schema creation involves facts that may reference any number of dimension tables. The snowflake design is centralized facts which are connected to multiple dimensions.
What is the difference between an independent and dependent data mart?
An independent data mart is a stand-alone data mart, made in the same way as the data warehouse. It contains its own source systems and ETL infrastructure. A dependent data mart does not have its own source systems. Its data comes from the data warehouse. It is a way to provide users with a subset of the data from the warehouse in cases where users or applications do not want, need, or are not allowed to have access to all the data in the entire data warehouse.
What is the difference between detailed and summarized data in a data warehouse?
The difference between detailed data and a summarized data in a warehouse is that detailed data is usually reflected by operational data and it is very detailed. Detailed data records every individual transaction in reference to the data. This may also be referred to as a fine detail level. Summarized data is calculated from the detailed data. It is the total recorded data from one source such as the total duration of all customer's phone calls during a particular week.
What is the difference between alpha, beta and production release?
An alpha release involves a data warehouse and the associated front-end applications being deployed internally to the members of the development team for initial testing of its functionalities. A beta release occurs after the alpha release and involves the data warehousing system being deployed to a selected group of users to test the usability of the system. The production release is the actual deployment of a functioning data warehousing system to the end users.
What is the difference between a dimension table and a fact table?
Dimension tables contain descriptions of the business, organization or enterprise to which the subject of analysis belongs. Information can be textual and numeric. Fact tables contain measures related to the subject of analysis. The measures are numeric and are intended for mathematical computation and quantitative analysis. There are also foreign keys associating them with dimension tables in fact tables.
Describe the granularity of the fact table and explain the main difference between a line-item detailed fact table and a transaction-level detailed fact table.
The granularity of the fact table describes what is depicted by one row in the fact table. Detailed fact tables have a fine level of granularity because each record represents a single fact. Aggregated fact tables have a coarser level of granularity than detailed fact tables, as records in aggregated fact tables always represent summarizations of multiple facts. In a line-item fact table, each row represents a line item of a particular transaction. In a transaction-level detailed fact table, each row represents a particular transaction.