CHAPTER 9 - Data Warehouse Implementation and Use
Alpha release:
Internal deployment of a system to the members of the development team for initial testing of its functionalities
Drill Up:
Makes the granularity of the data in the query result coarser
Drill Down:
Makes the granularity of the data in the query result finer
Data warehouse deployment:
Releasing the created and populated data warehouse and its front-end (BI) applications for use by the end-users
Pivot (Rotate):
Reorganizes the values displayed in the original query result by moving values of a dimension column from one axis to another
BASIC OLAP/BI TOOL FEATURES
Slice and Dice Pivot (Rotate) Drill Down / Drill Up
Beta release:
Deployment of a system to a selected group of users to test the usability of the system
Production release:
The actual deployment of a functioning system
Online analytical processing (OLAP):
querying and presenting data from data warehouses and/or data marts for analytical purposes
Data cleansing (scrubbing)
the detection and correction of low-quality data
ETL: Load
Loading the extracted, transformed, and quality assured data into the target data warehouse •A batch process that inserts the data into the data warehouse tables in an automatic fashion without user involvement •The initial load (first load), populates initially empty data warehouse tablesoIt can involve large amounts of data, depending on what is the desired time horizon of the data in the newly initiated data warehouse
ETL: Extraction
The retrieval of analytically useful data from the operational data sources that will eventually be loaded into the data warehouse •The data to be extracted is data that is analytically useful in the data warehouse •What to extract is determined within the requirements and modeling stages oRequirements and modeling stages of the data warehouse include the examination of the available sources oDuring the process of creation of the ETL infrastructure the data model provides a blueprint for the extraction procedures
ETL: Transformation
Transforming the structure of extracted data in order to fit the structure of the target data warehouse model •E.g. adding surrogate keys The data quality control and improvement are included in the transformation process •Commonly, some of the data in the data sources exhibit data quality problems •Data sources often contain overlapping information
OLAP/BI TOOLS
-Allow users to query fact and dimension tables by using simple point-and-click query-building applications -Based on the point-and-click actions by the user of the OLAP/BI tool, the tool writes and executes the code in the language of the data management system (e.g. SQL) that hosts the data warehouse or data mart that is being queried Require dimensional organization of underlying data for performing basic OLAP operations (slice, pivot, drill) Many OLAP/BI tools are web-based
OLAP: DRILL HIERARCHY
Set of attributes within a dimension where an attribute is related to one or more attributes at a lower level but only related to one item at a higher level •For example: StoreRegionName → StoreZip → StoreID Used for drill down/drill up operations
ETL INFRASTRUCTURE
Typically, the process of creating the ETL infrastructure includes using specialized ETL software tools and/or writing code Due to the amount of detail that has to be considered, creating ETL infrastructure is often the most time and resource consuming part in the data warehouse development process Although labor intensive, the process of creating the ETL infrastructure is essentially predetermined by the results of the requirements collection and data warehouse modeling processes which specify the sources and the target
Online transaction processing (OLTP):
updating (i.e. inserting, modifying and deleting), querying and presenting data from databases for operational purposes
Data Warehouse/Data Mart Front-End (BI) Applications:
•Provide access for indirect use
Slice and Dice:
Adds, replaces, or eliminates specified dimension attributes (or particular values of the dimension attributes) from the already displayed result
Creating a data warehouse
•Involves using the functionalities of database management software to implement the data warehouse model as a collection of physically created and mutually connected database tables •Most often, data warehouses are modeled as relational databases Consequently, they are implemented using a relational DBMS
Every subsequent load is referred to as a refresh load
•Refresh cycle is the period in which the data warehouse is reloaded with the new data (e.g. hourly, daily) -Determined in advance, based on the analytical needs of the business users of the data warehouse and the technical feasibility of the system •In active data warehouses the loads occur in micro batches that occur continuously
OLAP/BI Tools - two purposes:
•Ad-hoc direct analysis of dimensionally modeled data •Creation of front-end (BI) applications
Additional OLAP/BI Tool functionalities:
•Graphically visualizing the answers •Creating and examining calculated data •Determining comparative or relative differences •Performing exception analysis, trend analysis, forecasting, and regression analysis •Number of other analytical functionsterm-18
Executive dashboard
•Intended for use by higher level decision makers within an organization •Contains an organized easy-to-read display of a number of critically important queries describing the performance of the organization •In general, the usage of executive dashboards should require little or no effort or training •Executive dashboards can be web-based
OLAP/BI TOOLS
-Designed for analysis of dimensionally modeled data -Regardless of which data warehousing approach is chosen, the data that is accessible by the end user is typically structured as a dimensional model •OLAP/BI tools can be used on analytical data stores created with different modeling approaches