Chapter 8

¡Supera tus tareas y exámenes ahora con Quizwiz!

Two major factors that drive the need for data warehousing

1. A business requires an integrated, company-wide view of high-quality information: Data in operational systems are typically fragmented and inconsistent (in silos) 2. The information systems department must separate informational from operational systems to improve performance dramatically in managing company data

The need to separate operational and informational systems are based on 3 factors:

1. A data warehouse centralizes data that are scattered throughout disparate operational systems and makes them readily available for decision support applications 2. A properly designed data warehouse adds value to data by improving their quality and consistency 3. A separate data warehouse eliminates much of the contention for resources that results when informational applications are confounded with operational processing

Independent Data Mart Advantage

1. It allows for the concept of a data warehouse to be demonstrated by working on a series of small projects 2. The length of time until there is some benefit from data warehousing is reduced b/c the organization is not delayed until all data are centralized

Benefits of Real-time Data Warehousing

1. Just in time transportation 2. E-commerce --abandoned shopping cart 2. Fraud detection in credit card transactions

6 kinds of changes to a warehouse data model

1. New descriptive attributes 2. New business activity attributes 3. New classes of descriptive attributes 4. Descriptive attributes become more refined 5. Descriptive data are related to one another 6. New sources of data

Trends in organizations that encourage the need for data warehousing:

1. No single system of record 2. Multiple systems are not synchronized: Different update cycles and separate places where the same data are captured for each system 3. Organs want to analyze the activities in a balanced way: 4. Customer Relationship Management 5. Supplier Relationship Management

Independent Data Mart Disadvantage

1. Separate ETL process is developed for each data mart --> high cost, redundant data, and more processing effort 2. Data marts may not be consistent with one another b/c they are often developed with different technologies 3. There's no drill down capability or shared data repository, so limited analysis 4. Scaling costs are excessive b/c every new application that creates a separate data mart repeats all the extract and load steps

Properties of Data Warehouses

1. Subject oriented: A data warehouse is organized around key subjects (or high level entities) of the enterprise. Major subjects may include customers, patients, students, products, and time 2. Integrated: The data housed in the data warehouse are defined using consistent naming conventions, formats, encoding structures, and related characteristics gathered from several internal systems of record and also often from sources external to the organization. Holds one version of "the truth" 3. Time-variant: Data in the data warehouse contain a time dimension so that they may be used to study trends and changes. 4. Nonupdateable: Data in the data warehouse are loaded and refreshed from operational system but can't be updated by end users

Why organizations are "drowning in data but starving for information"

1. The fragmented way in which organizations have developed IS and their supporting databases 2. Most systems are developed to support operational processing, with little or no thought given to the information or analytical tools needed for decision making

Enterprise Data Warehouse

A centralized, integrated data warehouse that is the control point and single source of all data made available to end users for decision support applications

Logical Data mart

A data mart created by a relational view of a data warehouse

Dependent Data Mart

A data mart filled exclusively from an enterprise data warehouse and its reconciled data

Data Mart

A data warehouse that is limited in scope, customized for the decision-making applications of a particular end-user group. It's contents are either obtained from independent ETL processes for an independent data mart, or are derived from the data warehouse

Star Schema

A simple database design (particularly suited to ad hoc queries) in which dimensional data (describing how data are commonly aggregated for reporting) are separated from fact or event data (describing business activities). A star schema is one version of a dimensional model. It is not suited to online transaction processing, and therefore, it is not generally used in operations systems, operational data stores, or an EDW.

Exploration Warehouse

A special version of the EDW optimized for data mining and business intelligence using advanced statistical, mathematical modeling, and visualization tools. It's a benefit of having the dependent data marts and the EDW

Operational System

A system that is used to run a business in real time, based on current data. Must process large volumes of relatively simple read/write transactions and provide fast response. Also called system of records EX: sales order processing, reservation systems..

Real-time Data Warehouse

An enterprise data warehouse that accepts near-real-time feeds of transactional data from the system of record, analyzes warehouse data, and in near-real-time relays business rules to the data warehouse and systems of record so that immediate action can be taken in response to business events

Operational Data Store

An integrated, subject-oriented, continuously updateable, current-valued (with recent history), enterprise-wide, detailed database designed to serve operational users as they do decision support processing. It doesn't contain "deep history" like the EDW. It may be fed from the database of an ERP application, and it serves as the staging area for loading data into the EDW. It may receive data immediately or with some delay from the system of records.

Informational Processing

Analysis of data or other forms of information to support decision making. It needs large 'swatches' of data from which to derive information (ex: sales of all products, over several years, from every sales region)

Corporate Information Factory (CIF)

Another name for the dependent data mart and operational data store architecture

Operational Processing (Transaction Processing)

Captures, stores, and manipulates data to support daily operations of the organization. It focuses database design on optimizing access to a small set of data related to a transaction (ex: customer, order, associated product data)

Data warehouses

Consolidate and integrate information from many internal and external sources and arrange it in a meaningful format for making accurate and timely business decisions. Data warehousing requires extracting data from existing operational systems, cleansing and transforming data for decision making, and loading them into a data warehouse (ETL).

Fact table

Contain factual or quantitative data

Derived Data

Data that have been selected, formatted, and aggregated for end-user decision support applications

Information Systems

Designed to support decision making based on historical point-in-time and prediction data. Designed for complex queries or data-mining applications. EX: sales trend analysis, customer segmentation

Reconciled Data

Detailed, current data intended to be the single, authoritative source for all decision support systems

Dimensional table

Hold descriptive data (context) about the subjects of the business. The dimensional tables are usually the source of attributes used to qualify, categorize, or summarize facts in queries, reports, or graphs. Each dimensional table has a 1:M relationship to the central fact table. Each dimension table generally has a simple PK as well as several nonkey attributes. The PK in turn is the FK in a fact table

Derived Data Layer

It is the layer with which users normally interact for their decision support applications. The source of derived data is the reconciled data, created from what can be a rather complex data process to integrate and make consistent data from many systems of record inside and outside the organization.

Status vs Event Data

Status: Most of the data stored in a database (including data warehouses). It's a summary (ex: hourly total) of transactions or events Event: Represents transactions. May be stored for a defined period, but then are deleted or archived to save storage space

Duration of the Database

The amount of history to be kept. Natural duration is 13 months or 5 calendar quarters

Grain

The level of detail in a fact table, determined by the intersection of all the components of the PL, including all foreign keys and any other PK elements. You want to use the smallest grain b/c you can't drill down past that.

Data warehousing

The process whereby organizations create and maintain data warehouses and extract meaning from and help inform decision making through the use of data in the data warehouse. It requires following proven data warehousing practices, sound project management, strong organizational commitment, and making the right technology decisions.

What triggered the development of data warehousing?

The recognition (and subsequent definition) of the fundamental differences between operational systems (aka systems of record b/c of their role in keeping the official, legal record of the organization) and informational (or decision support) systems

Transient vs Periodic Data

Transient: Transient data are data in which changes to existing records are written over previous records, thus destroying the previous data content. Records are deleted without preserving the previous contents of those records. Periodic: Periodic data are data that never physically altered or deleted once added to the store. The before and after images represent periodic data. Notice that each record contains a time stamp of when the most recent update event occurred. *Normally stored in data warehouses


Conjuntos de estudio relacionados

Government: Civil Liberties and Civil Rights

View Set

Unit 5: Agriculture, Food Production, and Rural Land Use

View Set

Life Insurance= Review Missed Questions

View Set

Microbiology Tortora/ third lecture exam

View Set

#1-31 elements of periodic table

View Set