MIS 430 Quiz 2

Lakukan tugas rumah & ujian kamu dengan baik sekarang menggunakan Quizwiz!

Due to the amount of details that have to be considered, ___________ is often the most time- and resource-consuming part of the data warehouse development process

creating ETL infrastructure

A data warehouse is created

within an organization as a separate data store whose primary purpose is data analysis.

Structured repository

The data warehouse is a database containing analytically useful information Any database is a structured repository with its structure represented in its metadata

ETL includes the following tasks:

Extracting analytically useful data from the operational data sources Transforming such data so that it conforms to the structure of the subject-oriented target data warehouse model (while ensuring the quality of the transformed data) Loading the transformed and quality assured data into the target data warehouse

Data warehouse components

Source systems Extraction-transformation-load (ETL) infrastructure Data warehouse Front-end applications

Historical

The term historical refers to the larger time horizon in the data warehouse than in the operational databases

Operational Data

Data Makeup Differences --Typical Time-Horizon: Days/Months --Detailed --Current Technical Differences --Small Amounts used in a Process --High frequency of Access --Can be Updated --Non-Redundant Functional Differences --Used by all types of employees for tactical purposes --Application Oriented

Analytical Data

Data Makeup Differences --Typical Time-Horizon: Years --Summarized (and/or Detailed) --Values over time (Snapshots) Technical Differences --Large Amounts used in a Process --Low/Modest frequency of Access --Read (and Append) Only --Redundancy not an Issue Functional Differences --Used by a narrower set of users for decision making --Subject Oriented

Two main reasons for the creation of a data warehouse as a separate analytical database

The performance of operational day-to-day tasks involving data use can be severely diminished if such tasks have to compete for computing resources with analytical queries It is often impossible to structure a database which can be used in an efficient manner for both operational and analytical purposes

Dimensional modeling

A data design methodology used for designing subject-oriented analytical databases, such as data warehouses or data marts Commonly, dimensional modeling is employed as a relational data modeling technique

Data mart

A data store based on the same principles as a data warehouse, but with a more limited scope

Retrieval of analytical information

A data warehouse is developed for the retrieval of analytical information, and it is not meant for direct data entry by the users. The only functionality available to the users of the data warehouse is retrieval The data in the data warehouse is not subject to changes. The data in the data warehouse is referred to as non-volatile, static, or read-only

Detailed and/or summarized data

A data warehouse, depending on its purpose, may include the detailed data or summary data or both A data warehouse that contains the data at the finest level of detail is the most powerful

Dimension tables (dimensions)

Contain descriptions of the business, organization, or enterprise to which the subject of analysis belongs Columns in dimension tables contain descriptive information that is often textual (e.g., product brand, product color, customer gender, customer education level), but can also be numeric (e.g., product weight, customer income level) This information provides a basis for analysis of the subject

Fact tables

Contain measures related to the subject of analysis and the foreign keys (associating fact tables with dimension tables) The measures in the fact tables are typically numeric and are intended for mathematical computation and quantitative analysis

Creating ETL infrastructure

Creating necessary procedures and code for: --Automatic extraction of relevant data from the operational data sources --Transformation of the extracted data, so that its quality is assured and its structure conforms to the structure of the modeled and implemented data warehouse --The seamless load of the transformed data into the data warehouse

In addition to using the regular relational concepts (primary keys, foreign keys, integrity constraints, etc.) dimensional modeling distinguishes two types of tables:

Dimensions Facts

Dependent data mart

Does not have its own source systems The data comes from the data warehouse

Time variant

The term time variant refers to the fact that a data warehouse contains slices or snapshots of data from different periods of time across its time horizon With the data slices, the user can create reports for various periods of time within the time horizon

The data warehouse is

a structured repository of integrated, subject-oriented, enterprise-wide, historical, and time-variant data.

Source systems

In the context of data warehousing, source systems are operational databases and other operational data repositories (in other words, any sets of data used for operational purposes) that provide analytically useful information for the data warehouse's subjects of analysis Every operational data store that is used as a source system for the data warehouse has two purposes: --The original operational purpose --As a source system for the data warehouse Source systems can include external data sources

Independent data mart

Stand-alone data mart, created in the same fashion as the data warehouse Independent data mart has its own source systems and ETL infrastructure

Integrated

The data warehouse integrates the analytically useful data from the various operational databases (and possibly other sources) Integration refers to this process of bringing the data from multiple data sources into a singular data warehouse.

Data warehouse

The data warehouse is sometimes referred to as the target system, to indicate the fact that it is a destination for the data from the source systems A typical data warehouse periodically retrieves selected analytically useful data from the operational data sources

ETL infrastructure

The infrastructure that facilitates the retrieval of data from operational databases into the data warehouses

Enterprise-wide

The term enterprise-wide refers to the fact that the data warehouse provides an organization-wide view of the analytically useful information it contains

Subject-oriented

The term subject-oriented refers to the fundamental difference in the purpose of an operational database system and a data warehouse. An operational database system is developed in order to support a specific business operation A data warehouse is developed to analyze specific business subject areas

A typical organization

maintains and utilizes a number of operational data sources.

The operational data sources include

the databases and other data repositories which are used to support the organization's day-to-day operations

Analytical information -

the information collected and used in support of analytical tasks Analytical information is based on operational (transactional) information

Operational information (transactional information) -

the information collected and used in support of day to day operational needs in businesses and other organizations

The purpose of the data warehouse is

the retrieval of analytical information. A data warehouse can store detailed and/or summarized data.


Set pelajaran terkait

Identify the research method used and outline two characteristics of the method and Suggest an alternative or additional research method giving one reason for your choice.

View Set

Property and Casualty: Terms and Concepts

View Set

Chapter 24 - The Digestive System TB

View Set

Week 2: Check Your Understanding

View Set