INFS CH 9

¡Supera tus tareas y exámenes ahora con Quizwiz!

EXTRACTION

Extraction refers to the retrieval of analytically useful data from the operational data sources that will eventually be loaded into the data warehouse. The issue of what to extract is dealt within the requirements and modeling stages. Requirements and modeling stages of the data warehouse include the examination of the available sources.

Active transformations

produce a different number of rows in the output data as compared to the incoming data extracted from sources. One of the typical reasons for active transformations that change the number of rows are quality issues, such as dealing with the uniqueness problem discussed above.

Online transaction processing (OLTP)

refers to the updating (i.e., inserting, modifying, and deleting), querying, and presenting data from databases for operational purposes. In addition to the retrievals of data from operational databases, OLTP encompasses all the everyday update transactions done on the operational database systems, such as a transaction reflecting a withdrawal from a checking account or a transaction creating an airline reservation. An often-used technical term for an operational database is an "OLTP system."

Pivot (Rotate)

Unlike the slice and dice operation, the pivot (rotate) operation does not change the values displayed in the original query, it simply reorganizes them.

OLAP/BI Tools

The data in a data warehouse can be accessed directly by the users via the language of the database management software (e.g., SQL). However, a more common method of direct use of this data is access and analysis via online analytical processing OLAP tools. OLAP tools are also known as business intelligence (BI) tools. In this book, we will refer to these tools as OLAP/BI tools to capture both naming alternatives. OLAP/BI tools are designed for the analysis of dimensionally modeled data. As we discussed in Chapter 8, regardless of which data warehousing approach is chosen, the data that is accessible by the end user is typically structured as a dimensional model. Therefore, OLAP/BI tools can be used on analytical data stores created with different modeling approaches, as illustrated in Figure 9.8. OLAP/BI tools allow users to query fact and dimension tables by using simple point-and-click query-building applications. Based on the point-and-click actions by the user of the OLAP/BI tool, the tool writes and executes the code in the language (e.g., SQL) of the database management system that hosts the data warehouse or data mart that is being queried.

Executive Dashboard

The executive dashboard, as the term suggests, is intended for use by higher-level decision makers within an organization. It contains an organized, easy-to-read display of a number of critically important queries describing the performance of the organization. In general, the use of executive dashboards requires little or no effort or training. As is the case with other front-end data warehouse applications, executive dashboards can also be Web based.

Drill Down and Drill Up

The purpose of the drill down operation is to make the granularity of the data in the query result finer; in the case of drill up, it is to make it coarser. The drill down operation allows users to drill through hierarchies within dimensions. A hierarchy is a set of attributes within a dimension where an attribute is related to one or more attributes at a lower level but only related to one item at a higher level.

TRANSFORMATION

Transformation refers to the process of transforming the structure of extracted data in order to fit the structure of the target data warehouse model. In addition to changing the structure of the extracted data, the transformation part of the ETL process also includes data quality control and, if needed, data quality improvement. Even though it would be ideal if every underlying operational source contained the data of the highest quality, in reality it is quite common that some of the data in the data sources exhibit data quality problems, such as the ones described in Chapter 6. In those cases, the process of transformation has to enable the detection and correction of the low-quality data. This part of the transformation process is often referred to as data cleansing (scrubbing). In general, there are two major kind of transformations: active transformations and passive transformations.

Passive transformations

do not affect the number of rows, and incoming and outgoing rows have same counts. An example of a passive transformation that does not change the number of rows is generating surrogate keys for dimensions. Another example of a passive transformation is creating stored derived attributes, such as DollarsSold in SALES Fact table in Figure 9.7.

Slice and Dice

The slice and dice operation adds, replaces, or eliminates specified dimension attributes (or particular values of the dimension attributes) from the already displayed result. For example, the user using the OLAP/BI tool can modify OLAP Query 1 so that only the results for stores 1 and 2 are displayed. In other words, the results showing sales in store 3 will be "sliced out" from the original query shown in Figure 9.11. The modified query is specified as follows:

Additional OLAP/BI Tools Functionality Note

Closer analysis of the above examples reveals that the dimensional model is essential for OLAP. The underlying data must be organized dimensionally, with a fact table in the center connected to a number of dimension tables. If the data were not organized in a dimensional way, the three basic OLAP operations could not be performed effectively or properly. Along with the ability to perform slice and dice, pivot, and drill down/drill up, many other capabilities are found in various contemporary OLAP tools. In modern OLAP/BI tools, the ability to visualize the answers in a number of graphical ways is a standard feature. In addition to the standard operations (pivot, slice and dice, drill up/drill down) and various graphical visualization capabilities, OLAP/BI tools are able to create and examine calculated data; determine comparative or relative differences; conduct exception analysis, trend analysis, forecasting, and regression analysis; and perform a number of other useful analytical functions. However, those functionalities are found in other non-OLAP applications, such as statistical tools or spreadsheet software. What truly distinguishes OLAP/BI tools from other applications is their capability to easily interact with the dimensionally modeled data marts and data warehouses and, consequently, their capability to perform OLAP functions on large amounts of data. Contemporary spreadsheet software packages include functionalities for performing basic OLAP functions on the limited amount of data they can store. Spreadsheet packages can store significantly less data than the typical amount of data stored in a data warehouse or data mart. However, some spreadsheet tools can be configured to access the data directly from dimensionally modeled data marts and data warehouses hosted by DBMS tools. The spreadsheet tools that can be configured in such a fashion can be effectively considered OLAP/BI tools. Many existing OLAP/BI tools are Web based, which eliminates the need for installing software at the end user's workstation. Instead, the query construction space, such as the one shown in Figure 9.9, is provided to the end user via a Web browser. The OLAP/BI tools can be based on several different architectural approaches. At the end of this chapter, we give a brief note outlining the most common OLAP/BI tools architectures.

OLAP/BI Tools Purpose

OLAP/BI tools can be used for the following two purposes: for ad hoc direct analysis of dimensionally modeled data and for the creation of front-end applications for indirect access of dimensionally modeled data. Ad hoc direct analysis occurs when a user accesses the data in the data warehouse via an OLAP/BI tool and performs such actions as pivoting, slicing, and drilling. This process is also sometimes referred to as "data interrogation," because often the answers to one query lead to new questions. For example, an analyst may create a query using an OLAP/BI tool. The answer to that query may prompt an additional question, which is then answered by pivoting, slicing, or drilling the result of the original query. In addition to ad hoc direct analysis, OLAP/BI tools are often used in the process of creating data warehouse front-end applications. The data warehouse front-end application can simply be a collection of the OLAP/BI tool queries created by OLAP/BI tool expert users. Such a collection of queries can be accompanied by a menu structure for navigation and then provided to the users of the data warehouse who do not have the access privilege, expertise, and/or the time to use OLAP/BI tools directly.

ETL: Extraction, Transformation, Load

Once a data warehouse is modeled, the collection of empty data warehouse tables is created using DBMS software, as illustrated above. The next step is to insert data into these tables. A set of processes is used to populate the data warehouse tables with the appropriate relevant data retrieved from the operational databases. This set of processes is known as ETL (extraction, transformation, and load).

Data Warehouse/Data Mart Front-end (BI) Applications

In most cases, a portion of the intended users of a data warehouse (often a majority of the users) lack the time and/or expertise to engage in open-ended direct analysis of the data. It is not reasonable to expect everyone who needs to use the data from the data warehouse as a part of their workplace responsibilities to develop their own queries by writing the SQL code or engaging in the ad hoc use of OLAP/BI tools. Instead, many of the data warehouse and data mart end users access the data through front-end applications. Data warehouse/data mart front-end applications are also commonly referred to as "BI applications." Figure 9.19 shows a data warehousing system with front-end applications. The data warehouse/data mart front-end applications are collections of predeveloped queries organized for simple access, navigation, and use by the end user.

OLAP/BI Tools Functionalities

There are numerous OLAP/BI tools (by vendors such as Tableau, Qlik, or Microsoft with its Power BI product) available in the marketplace today. Here, we give an overview of functionalities that are common for all OLAP/BI tools. The three basic OLAP/BI tool features regularly used by analysts are commonly referred to as: • slice and dice • pivot (rotate) • drill down and drill up

Data Warehouse Deployment

Data warehouse deployment refers to allowing the end user access to the newly created and populated data warehouse and to its front-end applications. Once the data warehousing system is deployed, the end user can start using it for the tasks that require the data from the data warehouse. Typically, the deployment does not occur all at once for the entire population of intended end users. Instead, the deployment process is more gradual. Like most other information systems, the data warehousing system often goes through a testing phase before it is fully deployed to all of its users. During the so-called alpha release, a data warehouse and the associated front-end applications are deployed internally to the members of the development team for initial testing of its functionalities. In the subsequent release (the so-called beta release), the data warehousing system is deployed to a selected group of users to test the usability of the system. The feedback from the testing phases may result in making modifications to the system prior to the actual deployment of a functioning data warehousing system. The actual deployment of a functioning data warehousing system is also known as the production release.

LOAD

Load refers to the process of loading the extracted, transformed, and quality assured data into the target data warehouse. The ETL infrastructure facilitates the load as a batch process that inserts the data into the data warehouse tables in an automatic fashion without user involvement. The initial load, which populates initially empty data warehouse tables, is known as the first load. The first load can involve large amounts of data, depending on the desired time horizon of the data in the newly initiated data warehouse. Every subsequent load is referred to as a refresh load. The refresh cycle refers to the frequency with which the data warehouse is reloaded with new data. The refresh cycle is determined in advance, based on the analytical needs of the business users of the data warehouse and the technical feasibility of the system. For example, a data warehouse can be refreshed once every day or once every few hours. In the case of so-called active data warehouses, the loads occur in micro batches that happen continuously, ensuring that the data in the data warehouse is updated close to real time (enabling analysis of the latest data). The data warehouse is loaded for the first time with the data from the sources and sources' archives. The data warehouse is then refreshed according to the chosen refresh cycle, with the new data from the sources. The ETL processes are facilitated by the ETL infrastructure. Typically, the process of creating the ETL infrastructure includes using specialized ETL software tools (such as Informatica or Data Stage) and/or writing code. Due to the amount of detail that has to be considered, creating ETL infrastructure is often the most time- and resource-consuming part of the data warehouse development process. Although labor intensive, the properly undertaken process of creating the ETL infrastructure is essentially predetermined by the results of the requirements collection and data warehouse modeling processes that specify the sources and the target.

Online Analytical Processing (OLAP)

The term OLAP was devised as a contrast to another acronym, OLTP, which stands for online transaction processing. Before we explain the term OLAP, we will first briefly clarify the term OLTP. Online analytical processing (OLAP) refers to querying and presenting data from data warehouses and/or data marts for analytical purposes. While OLTP is the term used in conjunction with traditional databases for operational (day-to-day) purposes, the term OLAP relates to the use of the data from data warehouses and data marts. Another difference between OLAP and OLTP is that OLTP systems are capable of updating, querying, and presenting data, whereas OLAP tools can only engage in querying and presenting data. While OLTP systems routinely perform transactions that insert, modify, and delete data from databases, OLAP tools are read only. OLAP tools are used exclusively for the retrieval of data from analytical repositories to be used in the decision-making process. Users of OLAP tools can quickly read and interpret data that is gathered and structured specifically for analysis and subsequently make fact-based decisions. The terms OLTP and OLAP predate the internet era. The word "online," used in both of these terms, is not associated with the internet. Instead, "online" refers to a type of computer processing in which the computer responds immediately (or at least very quickly) to user requests. In today's world, we are accustomed to the fact that computers perform the processing, updating, and retrieving of data instantaneously. However, at the time the term OLTP was created, many computers still used pre-hard-drive devices, such as magnetic tapes and punch-card readers. The word "online" was meant to underscore the immediacy of the results, in which database systems used a direct access type of storage, such as a hard drive, instead of a sequential (slow) access storage device, such as a magnetic tape.


Conjuntos de estudio relacionados

MONTGOMERY BUS BOYCOTT 1955-1956

View Set

Chapter 20 Immunologic Emergencies

View Set

Chapter 28: Care of the High-Risk Mother, Newborn, and Family with Special Needs

View Set

Chapter 7 Quiz - Long-term Memory: Encoding, Retrieval, & Consolidation

View Set

chapter 1 principles of marketing

View Set

Chapter 3 - Canvas Quiz Answers (plus written questions)

View Set