before dedupe

Réussis tes devoirs et examens dès maintenant avec Quizwiz!

OLAP/BI TOOLS

-Allow users to query fact and dimension tables by using simple point-and-click query-building applications -Based on the point-and-click actions by the user of the OLAP/BI tool, the tool writes and executes the code in the language of the data management system (e.g. SQL) that hosts the data warehouse or data mart that is being queried Require dimensional organization of underlying data for performing basic OLAP operations (slice, pivot, drill) Many OLAP/BI tools are web-based

OLAP/BI TOOLS

-Designed for analysis of dimensionally modeled data -Regardless of which data warehousing approach is chosen, the data that is accessible by the end user is typically structured as a dimensional model •OLAP/BI tools can be used on analytical data stores created with different modeling approaches

How many entities are involved in a unary relationship? Question options: 1 2 More than 2 Between 1 and 2

1

How many entities are depicted by the following requirements? (School XYZ keeps track of its 100 student, 10 teachers and 5 classrooms) Question options: 3 4 115 116

3

Which of the following is true according to the ABC Retailer ER diagram? Question options: A brand can have many products but it does not have to have any products A brand can have many products and it has to have at least one product Each brand is associated with exactly one product A brand is not associated with any products

A brand can have many products and it has to have at least one product

Which one of the following is true according to the ABC retailer ER diagram? Question options: A product can be associated with many promotions and it has to be associated with at least one promotion A product can be associated with many promotions but it does not have to be associated with any promotions Each product is associated with exactly one promotions A product is not associted with any promotions

A product can be associated with many promotions but it does not have to be associated with any promotions

Which of the following is true according to the ABC Retailer ER diagram? Question options: A promotion can be associated with many products and it has to be associated with at least one product A promotion can be associated with many products but it does not have to be associated with any products Each promotion is associated with exactly one product A promotion is not associated with any products

A promotion can be associated with many products and it has to be associated with at least one product

Deleting an extracted row from the set of extracted rows (because an identical row was retrieved from another data source) before the data is loaded into the data warehouse, is an example of ____.

Active transformation

Slice and Dice:

Adds, replaces, or eliminates specified dimension attributes (or particular values of the dimension attributes) from the already displayed result

DBMS is software used for the following purpose ___________________. Question options: Creation of databases Insertion, storage, retrieval, update, and deletion of the data in the database Maintenance of databases All of the above

All of the above

Which of the following is an example of a daily operational procedure or task supported by an operational database? Question options: Deducting the correct amount of money from a customer's checking account upon an ATM withdrawal Issuing a correct bill to a customer who purchased an airline ticket Both A and B None of the above

Both A and B

Additional OLAP/BI Tools Functionality Note

Closer analysis of the above examples reveals that the dimensional model is essential for OLAP. The underlying data must be organized dimensionally, with a fact table in the center connected to a number of dimension tables. If the data were not organized in a dimensional way, the three basic OLAP operations could not be performed effectively or properly. Along with the ability to perform slice and dice, pivot, and drill down/drill up, many other capabilities are found in various contemporary OLAP tools. In modern OLAP/BI tools, the ability to visualize the answers in a number of graphical ways is a standard feature. In addition to the standard operations (pivot, slice and dice, drill up/drill down) and various graphical visualization capabilities, OLAP/BI tools are able to create and examine calculated data; determine comparative or relative differences; conduct exception analysis, trend analysis, forecasting, and regression analysis; and perform a number of other useful analytical functions. However, those functionalities are found in other non-OLAP applications, such as statistical tools or spreadsheet software. What truly distinguishes OLAP/BI tools from other applications is their capability to easily interact with the dimensionally modeled data marts and data warehouses and, consequently, their capability to perform OLAP functions on large amounts of data. Contemporary spreadsheet software packages include functionalities for performing basic OLAP functions on the limited amount of data they can store. Spreadsheet packages can store significantly less data than the typical amount of data stored in a data warehouse or data mart. However, some spreadsheet tools can be configured to access the data directly from dimensionally modeled data marts and data warehouses hosted by DBMS tools. The spreadsheet tools that c

Production release takes place during _______.

Deployment

Beta release:

Deployment of a system to a selected group of users to test the usability of the system

Which of the following is true according to the ABC Retailer ER diagram? Question options: A product can belong to multiple brands but it must belong to at least one brand Each product belongs to multiple brands Each product belongs to exactly one brand Each product belongs to either one brand or no brand at all

Each product belongs to exactly one brand

Which of the following is true according to the ABC Retailer ER diagram? Question options: Each promotion has a unique promotion name Each brand has a unique brand name Each product has a unique product ID All of the above

Each product has a unique product ID

EXTRACTION

Extraction refers to the retrieval of analytically useful data from the operational data sources that will eventually be loaded into the data warehouse. The issue of what to extract is dealt within the requirements and modeling stages. Requirements and modeling stages of the data warehouse include the examination of the available sources.

Archived data from source systems is never present in the data warehouse.

FALSE

The process of deciding what to extract during the extraction process begins after the requirements and modeling stages are completed.

FALSE

The process of transformation cannot involve standardizing different versions of the same data present in different data sources.

FALSE

The word online in the term OLAP refers to the use of the Internet or other networks.

FALSE

Every refresh load will always take more time than the first load.

False

The data in fact tables is always located faster in relational implementations than in the multidimensional implementations.

False

The multidimensional database cube always has three dimensions.

False

Data Warehouse/Data Mart Front-end (BI) Applications

In most cases, a portion of the intended users of a data warehouse (often a majority of the users) lack the time and/or expertise to engage in open-ended direct analysis of the data. It is not reasonable to expect everyone who needs to use the data from the data warehouse as a part of their workplace responsibilities to develop their own queries by writing the SQL code or engaging in the ad hoc use of OLAP/BI tools. Instead, many of the data warehouse and data mart end users access the data through front-end applications. Data warehouse/data mart front-end applications are also commonly referred to as "BI applications." Figure 9.19 shows a data warehousing system with front-end applications. The data warehouse/data mart front-end applications are collections of predeveloped queries organized for simple access, navigation, and use by the end user.

Which of the following is an example of analytical information? Question options: Information about the amount of money in a checking account Information showing which airline routes in the United States have the most sales Information about the cost of a product Information showing the destination and origin of a particular airline route in the United States

Information showing which airline routs in the United States have the most sales

Alpha release:

Internal deployment of a system to the members of the development team for initial testing of its functionalities

LOAD

Load refers to the process of loading the extracted, transformed, and quality assured data into the target data warehouse. The ETL infrastructure facilitates the load as a batch process that inserts the data into the data warehouse tables in an automatic fashion without user involvement. The initial load, which populates initially empty data warehouse tables, is known as the first load. The first load can involve large amounts of data, depending on the desired time horizon of the data in the newly initiated data warehouse. Every subsequent load is referred to as a refresh load. The refresh cycle refers to the frequency with which the data warehouse is reloaded with new data. The refresh cycle is determined in advance, based on the analytical needs of the business users of the data warehouse and the technical feasibility of the system. For example, a data warehouse can be refreshed once every day or once every few hours. In the case of so-called active data warehouses, the loads occur in micro batches that happen continuously, ensuring that the data in the data warehouse is updated close to real time (enabling analysis of the latest data). The data warehouse is loaded for the first time with the data from the sources and sources' archives. The data warehouse is then refreshed according to the chosen refresh cycle, with the new data from the sources. The ETL processes are facilitated by the ETL infrastructure. Typically, the process of creating the ETL infrastructure includes using specialized ETL software tools (such as Informatica or Data Stage) and/or writing code. Due to the amount of detail that has to be considered, creating ETL infrastructure is often the most time- and resource-consuming part of the data warehouse development process. Although labor intensive, the properly undertaken process of creating the

ETL: Load

Loading the extracted, transformed, and quality assured data into the target data warehouse •A batch process that inserts the data into the data warehouse tables in an automatic fashion without user involvement •The initial load (first load), populates initially empty data warehouse tablesoIt can involve large amounts of data, depending on what is the desired time horizon of the data in the newly initiated data warehouse

An associative entity is used as an alternative way of depicting: Question options: 1:M relationships 1:1 relationships M:N relationships Multivated attributes

M:N relationships

Drill Up:

Makes the granularity of the data in the query result coarser

Drill Down:

Makes the granularity of the data in the query result finer

The description of the structure and the properties of the data fits into which category? Question options: Information Data Metadata DBMS

Metadata

The following attribute can contain more than one value for each entity instance: Question options: Unique attribute Composite attribute Multivalued attribute Derived attribute

Multivalued attribute

OLAP/BI Tools Purpose

OLAP/BI tools can be used for the following two purposes: for ad hoc direct analysis of dimensionally modeled data and for the creation of front-end applications for indirect access of dimensionally modeled data. Ad hoc direct analysis occurs when a user accesses the data in the data warehouse via an OLAP/BI tool and performs such actions as pivoting, slicing, and drilling. This process is also sometimes referred to as "data interrogation," because often the answers to one query lead to new questions. For example, an analyst may create a query using an OLAP/BI tool. The answer to that query may prompt an additional question, which is then answered by pivoting, slicing, or drilling the result of the original query. In addition to ad hoc direct analysis, OLAP/BI tools are often used in the process of creating data warehouse front-end applications. The data warehouse front-end application can simply be a collection of the OLAP/BI tool queries created by OLAP/BI tool expert users. Such a collection of queries can be accompanied by a menu structure for navigation and then provided to the users of the data warehouse who do not have the access privilege, expertise, and/or the time to use OLAP/BI tools directly.

ETL: Extraction, Transformation, Load

Once a data warehouse is modeled, the collection of empty data warehouse tables is created using DBMS software, as illustrated above. The next step is to insert data into these tables. A set of processes is used to populate the data warehouse tables with the appropriate relevant data retrieved from the operational databases. This set of processes is known as ETL (extraction, transformation, and load).

Relationship attributes may be necessary in a relationship of what type? Question options: One-to-one One-to-many Zero-to-many Many-to-many

One-to-many

Which of the following is a possible type of relationship (maximum cardinality-wise)? Question options: One-to-one Zero-to-zero Zero-to-one Zero-to-many

One-to-one

Information can be used for ____________. Question options: Operational purposes only Analytical purposes only Operational and analytical purposes Neither for operational nor for analytical purposes

Operational and analytical purposes

OLAP/BI tools are:

Read-only tools

Data warehouse deployment:

Releasing the created and populated data warehouse and its front-end (BI) applications for use by the end-users

Pivot (Rotate):

Reorganizes the values displayed in the original query result by moving values of a dimension column from one axis to another

Tuple is a synonym for Question options: Column Field Row Table

Row

OLAP: DRILL HIERARCHY

Set of attributes within a dimension where an attribute is related to one or more attributes at a lower level but only related to one item at a higher level •For example: StoreRegionName → StoreZip → StoreID Used for drill down/drill up operations

BASIC OLAP/BI TOOL FEATURES

Slice and Dice Pivot (Rotate) Drill Down / Drill Up

Creating a data warehouse involves using the functionalities of database management software to implement the data warehouse model as a collection of physically created and mutually connected database tables.

TRUE

Developing front-end applications includes creating a set of predeveloped queries and interfaces to those queries.

TRUE

OLAP/BI tools are designed for the analysis of dimensionally modeled data.

TRUE

The purpose of the drill down operation is to make the granularity of the data in the query result finer.

TRUE

The purpose of the drill up operation is to make the granularity of the data in the query result coarser.

TRUE

Production release:

The actual deployment of a functioning system

OLAP/BI Tools

The data in a data warehouse can be accessed directly by the users via the language of the database management software (e.g., SQL). However, a more common method of direct use of this data is access and analysis via online analytical processing OLAP tools. OLAP tools are also known as business intelligence (BI) tools. In this book, we will refer to these tools as OLAP/BI tools to capture both naming alternatives. OLAP/BI tools are designed for the analysis of dimensionally modeled data. As we discussed in Chapter 8, regardless of which data warehousing approach is chosen, the data that is accessible by the end user is typically structured as a dimensional model. Therefore, OLAP/BI tools can be used on analytical data stores created with different modeling approaches, as illustrated in Figure 9.8. OLAP/BI tools allow users to query fact and dimension tables by using simple point-and-click query-building applications. Based on the point-and-click actions by the user of the OLAP/BI tool, the tool writes and executes the code in the language (e.g., SQL) of the database management system that hosts the data warehouse or data mart that is being queried.

Executive Dashboard

The executive dashboard, as the term suggests, is intended for use by higher-level decision makers within an organization. It contains an organized, easy-to-read display of a number of critically important queries describing the performance of the organization. In general, the use of executive dashboards requires little or no effort or training. As is the case with other front-end data warehouse applications, executive dashboards can also be Web based.

Drill Down and Drill Up

The purpose of the drill down operation is to make the granularity of the data in the query result finer; in the case of drill up, it is to make it coarser. The drill down operation allows users to drill through hierarchies within dimensions. A hierarchy is a set of attributes within a dimension where an attribute is related to one or more attributes at a lower level but only related to one item at a higher level.

ETL: Extraction

The retrieval of analytically useful data from the operational data sources that will eventually be loaded into the data warehouse •The data to be extracted is data that is analytically useful in the data warehouse •What to extract is determined within the requirements and modeling stages oRequirements and modeling stages of the data warehouse include the examination of the available sources oDuring the process of creation of the ETL infrastructure the data model provides a blueprint for the extraction procedures

Slice and Dice

The slice and dice operation adds, replaces, or eliminates specified dimension attributes (or particular values of the dimension attributes) from the already displayed result. For example, the user using the OLAP/BI tool can modify OLAP Query 1 so that only the results for stores 1 and 2 are displayed. In other words, the results showing sales in store 3 will be "sliced out" from the original query shown in Figure 9.11. The modified query is specified as follows:

Which of the following is true according to the ABC Retailer ER diagram? Question options: We cannot keep track of products that are not associated with multiple promotions We must keep track of products that are not associated with any promotions We can keep track of products that are not associated with any promotions We cannot keep track of products that are not associated with any promotions

We can keep track of products that are not associated with any promotion

Which of the following is true according to the ABC Retailer ER diagram? Question options: We cannot keep track of brands that do now have multiple products We must keep track of brands that do not have any products We can keep track of brands that do not have any products We cannot keep track of brands that do not have any products

We cannot keep track of brands that do not have any products

Which of the following is true according to the ABC Retailer ER diagram? Question options: We cannot keep track of products that belong to any brand We must keep track of products that do not belong to any brand We can keep track of products that do now belong to any brand We cannot keep track of products that do not belong to any brand

We cannot keep track of products that do not belong to any brand

Which of the following is true according to the ABC Retailer ER diagram? Question options: We cannot keep track of promotions that are not associated with multiple products We must keep track of promotions that are not associated with any products We can keep track of promotinos that are not associated with any products We cannot keep track of promotions that are not associated with any products

We cannot keep track of promotions that are not associated with any products

Passive transformations

do not affect the number of rows, and incoming and outgoing rows have same counts. An example of a passive transformation that does not change the number of rows is generating surrogate keys for dimensions. Another example of a passive transformation is creating stored derived attributes, such as DollarsSold in SALES Fact table in Figure 9.7.

If applicable, the archived data from operational sources is loaded in the data warehouse during the:

first load

Active transformations

produce a different number of rows in the output data as compared to the incoming data extracted from sources. One of the typical reasons for active transformations that change the number of rows are quality issues, such as dealing with the uniqueness problem discussed above.

Online analytical processing (OLAP):

querying and presenting data from data warehouses and/or data marts for analytical purposes

Online transaction processing (OLTP)

refers to the updating (i.e., inserting, modifying, and deleting), querying, and presenting data from databases for operational purposes. In addition to the retrievals of data from operational databases, OLTP encompasses all the everyday update transactions done on the operational database systems, such as a transaction reflecting a withdrawal from a checking account or a transaction creating an airline reservation. An often-used technical term for an operational database is an "OLTP system."

Data cleansing (scrubbing)

the detection and correction of low-quality data

Online transaction processing (OLTP):

updating (i.e. inserting, modifying and deleting), querying and presenting data from databases for operational purposes

OLAP/BI Tools - two purposes:

•Ad-hoc direct analysis of dimensionally modeled data •Creation of front-end (BI) applications

Additional OLAP/BI Tool functionalities:

•Graphically visualizing the answers •Creating and examining calculated data •Determining comparative or relative differences •Performing exception analysis, trend analysis, forecasting, and regression analysis •Number of other analytical functionsterm-18

Executive dashboard

•Intended for use by higher level decision makers within an organization •Contains an organized easy-to-read display of a number of critically important queries describing the performance of the organization •In general, the usage of executive dashboards should require little or no effort or training •Executive dashboards can be web-based

Creating a data warehouse

•Involves using the functionalities of database management software to implement the data warehouse model as a collection of physically created and mutually connected database tables •Most often, data warehouses are modeled as relational databases Consequently, they are implemented using a relational DBMS

Data Warehouse/Data Mart Front-End (BI) Applications:

•Provide access for indirect use

Every subsequent load is referred to as a refresh load

•Refresh cycle is the period in which the data warehouse is reloaded with the new data (e.g. hourly, daily) -Determined in advance, based on the analytical needs of the business users of the data warehouse and the technical feasibility of the system •In active data warehouses the loads occur in micro batches that occur continuously

A regular entity must have at least one of the following attributes: Question options: Unique attribute Composite attribute Multivalued attribute Derived attribute

Unique attribute

Pivot (Rotate)

Unlike the slice and dice operation, the pivot (rotate) operation does not change the values displayed in the original query, it simply reorganizes them.

The following attribute represents a collection of attributes: Question options: Unique attribute Composite attribute Multivalued attribute Derived attribute

Composite attribute

The following attribute represents a collection of attributes: Question options: Composite attribute Multivalued attribute Derived attribute Unique composite attribute

Composite attribute

Which of the following is a part of the database requirements step? Question options: Database deployment Logical modeling Conceptual modeling Database implementation

Conceptual Modeling

Database implementation involves using ________ to implement the database model as an actual database. Question options: Front-end applications DBMS Database system Indirect interaction

DBMS

Joe is an example of ____________. Question options: Data Metadata Data Type DBMS

Data

Which of the following is typically found in the database metadata? Question options: Employee names Data Types Student ages Product prices

Data Types

Data Warehouse Deployment

Data warehouse deployment refers to allowing the end user access to the newly created and populated data warehouse and to its front-end applications. Once the data warehousing system is deployed, the end user can start using it for the tasks that require the data from the data warehouse. Typically, the deployment does not occur all at once for the entire population of intended end users. Instead, the deployment process is more gradual. Like most other information systems, the data warehousing system often goes through a testing phase before it is fully deployed to all of its users. During the so-called alpha release, a data warehouse and the associated front-end applications are deployed internally to the members of the development team for initial testing of its functionalities. In the subsequent release (the so-called beta release), the data warehousing system is deployed to a selected group of users to test the usability of the system. The feedback from the testing phases may result in making modifications to the system prior to the actual deployment of a functioning data warehousing system. The actual deployment of a functioning data warehousing system is also known as the production release.

________ are in charge of implementing the database model as a functioning database using the DBMS software. Question options: Database analysts Database architects Database developers DBAs

Database developers

Online Analytical Processing (OLAP)

The term OLAP was devised as a contrast to another acronym, OLTP, which stands for online transaction processing. Before we explain the term OLAP, we will first briefly clarify the term OLTP. Online analytical processing (OLAP) refers to querying and presenting data from data warehouses and/or data marts for analytical purposes. While OLTP is the term used in conjunction with traditional databases for operational (day-to-day) purposes, the term OLAP relates to the use of the data from data warehouses and data marts. Another difference between OLAP and OLTP is that OLTP systems are capable of updating, querying, and presenting data, whereas OLAP tools can only engage in querying and presenting data. While OLTP systems routinely perform transactions that insert, modify, and delete data from databases, OLAP tools are read only. OLAP tools are used exclusively for the retrieval of data from analytical repositories to be used in the decision-making process. Users of OLAP tools can quickly read and interpret data that is gathered and structured specifically for analysis and subsequently make fact-based decisions. The terms OLTP and OLAP predate the internet era. The word "online," used in both of these terms, is not associated with the internet. Instead, "online" refers to a type of computer processing in which the computer responds immediately (or at least very quickly) to user requests. In today's world, we are accustomed to the fact that computers perform the processing, updating, and retrieving of data instantaneously. However, at the time the term OLTP was created, many computers still used pre-hard-drive devices, such as magnetic tapes and punch-card readers. The word "online" was meant to underscore the immediacy of the results, in which database systems used a direct access type of storage, such as a hard drive, in

OLAP/BI Tools Functionalities

There are numerous OLAP/BI tools (by vendors such as Tableau, Qlik, or Microsoft with its Power BI product) available in the marketplace today. Here, we give an overview of functionalities that are common for all OLAP/BI tools. The three basic OLAP/BI tool features regularly used by analysts are commonly referred to as: • slice and dice • pivot (rotate) • drill down and drill up

Data cleansing is a part of

Transformation

TRANSFORMATION

Transformation refers to the process of transforming the structure of extracted data in order to fit the structure of the target data warehouse model. In addition to changing the structure of the extracted data, the transformation part of the ETL process also includes data quality control and, if needed, data quality improvement. Even though it would be ideal if every underlying operational source contained the data of the highest quality, in reality it is quite common that some of the data in the data sources exhibit data quality problems, such as the ones described in Chapter 6. In those cases, the process of transformation has to enable the detection and correction of the low-quality data. This part of the transformation process is often referred to as data cleansing (scrubbing). In general, there are two major kind of transformations: active transformations and passive transformations.

ETL: Transformation

Transforming the structure of extracted data in order to fit the structure of the target data warehouse model •E.g. adding surrogate keys The data quality control and improvement are included in the transformation process •Commonly, some of the data in the data sources exhibit data quality problems •Data sources often contain overlapping information

ETL INFRASTRUCTURE

Typically, the process of creating the ETL infrastructure includes using specialized ETL software tools and/or writing code Due to the amount of detail that has to be considered, creating ETL infrastructure is often the most time and resource consuming part in the data warehouse development process Although labor intensive, the process of creating the ETL infrastructure is essentially predetermined by the results of the requirements collection and data warehouse modeling processes which specify the sources and the target


Ensembles d'études connexes

Ch 16 - Discharge Planning and Teaching

View Set

Chapter 2). Cells: The Living Units

View Set