CIS 3050: Chapter 9

Réussis tes devoirs et examens dès maintenant avec Quizwiz!

Static extract

A method of capturing a snapshot of the required source data at a point in time.

Nonupdateable

Data in the DW are loaded and refreshed from operational systems but cannot be updated by end users

Time Variant

Data in the Data Warehouse are carefully associated with a specific period of time so that they may be used to study trends and changes

Transient Data

Data in which changes to existing records are written over previous records, thus destroying the previous data content

Surrogate Key

Every key used to join the fact table with a dimension table should be a surrogate key. Like Product Code or Store Code. WHY? -because business keys always change -allows us to keep track of different nonkey attributes for the same production key over time -often simpler and shorter 0can be of same length and format for all keys despite business dimensions Surrogate keys are non intelligent, system assigned, not natural.

Logical Data Mart and Real Time Warehouse Architecture

I) has logical Data marts II) Data are moved from the DW rather than to a separate staging area in order to utilize the high performance computing power of the warehouse technology to perform the cleansing and transformation steps III) New data marts can be created quickly IV) Data marts are always up to date because data in a view are created when the view is referenced

Conformed dimension

One or more dimension tables associated with two or more fact tables for which the dimension tables have the same business meaning and primary key with each fact table.

application integration

achieved by coordinating the flow of event information between business applications

user interaction integration

achieved by creating fewer user interfaces that feed different data system

business process integration

achieved by tighter coordination of activities across business process so that applications can be shared and more application integration can occur

Derived Data

data that has been selected, formatted, and aggregated for end-user decision support applications

characteristics of data after the ETL

detailed, historical, normalized, comprehensive , timely, quality controlled the goal is to provide a single, authoritative source for data that support decision making

Corporate Information Factory

has both dependent data mart and operational data store. Considered a comprehensive view of organizational data in support of all user data requirements

3 layer data architecture

1) Derived data [Data mart metadata] 2) Reconciled data [EDW metadata] 3)Operational data [Operational metadata[ all come from the enterprise data model

Independent Data Mart Structure

1: data are extracted from the various internal and external source system files and databases. In a large organization, there may be dozens or even hundreds of such files and databases 2: The data from the various source systems are transformed and integrated before being loaded into the data marts. Transactions may be sent to the source systems to correct errors discovered in data staging. The dW is considered as the collection of data marts 3: The data warehouse is a set of physically distinct databases organized for decision support 4: Users access the DW by means of variety of query languages and analytical tools. Results may be fed back to the data warehouse and operational databases

10 rules of dimensional modeling

1: use atomic facts 2: create single process fact tables 3: include a date dimension for every fact table 4: enforce consistent grain 5: disallow null keys in fact table 6: honor hierarchies 7: decode dimension tables 8: use surrogate keys 9: conform dimensions 10: balance requirements with actual date

Enterprise Data Warehouse

A centralized, integrated data warehouse that is the control point and single source of all data made available to end users for decision support applications

Dependent Data Mart

A data mart filled exclusively from an enterprise data warehouse and its reconciled data

Independent Data Mart

A data mart filled with data extracted from the operational environment without the benefit of a data warehouse

Logical Data Mart

A data mart that is created by a relational view of a data warehouse

Incremental extract

A method of capturing only the changes that have occurred in the source data since the last capture

Data scrubbing

A process of using pattern recognition and other artificial intelligence techniques to upgrade the quality of raw data before transforming and moving the data to the data warehouse. Also called data cleansing.

Star Schema

A simple Database Design in which dimensional data are separated from fact or event data. A dimensional model is another name for a star schema . Contains fact table and one or more dimensional tables Product, Period, Store---> SALES

Informational Systems

A system designed to support decision making based on historical point in time and prediction data for complex queries or data mining applications

Operational Systems

A system that is used to run a business in real time, based on current data. Also called a system of record

Data Federation

A technique for data integration that provides a virtual view of integrated data without actually creating one centralized database when an application wants data, a federation engine retrieves relevant data from the actual sources in real time and sends the result to the requesting application . advantage is to access current data, no delay, hides the intricacies of other applications and the way data is stored in the from a given query or application

Changed Data Capture

A technique that indicates which data have been changed since the last data integration activity

Update Mode

An approach to filling a data warehouse in which only changes in the source data are written to the data warehouse

Refresh Mode

An approach to filling a data warehouse that involves bulk rewriting of the target data at periodic intervals.

Real-time data warehouse

An enterprise data warehouse that accepts near real time feeds of transactional data from the systems of record, analyzes, warehouse data, and in near real time relays business rules to the DW and systems of record so that immediate action can be taken in response to business events need for EXCHANGE of data at a NEAR real time pace because there is a need for action to a current part of the org. to know what is happening to make desirable things happen through the operational systems

Snowflake Schema

An expanded version of a star schema in which dimension tables are normalized into several related tables.

Operational Data Store

An integrated, subject oriented and continuously updatable current valued (with recent history) enterprise wide, detailed database designed to serve operational users as they do decision support processing. Integrated source for all the operational data

Periodic Data

Data that are never physically altered or deleted once they have been added to the store

Reconciled Data

Detailed, current data intended to be the single, authoritative source for all decision support applications.

data propagation

Duplicate data across databases, with near real-time delay. data are pushed to duplicate sites as updates occur. these updates can be synchronous or asynchronous which decouples the updates to the remote copies. Advantage is the near real time cascading of data changes throughout the organization.

Dimension Table

Holds descriptive data about the subjects of the business. usually the source of attributes used to qualify or categorize or summarize facts in graphs or queries. each dimension table has a one to many relationship wo the central fact table

Need for Data Warehousing

I) A business requires an integrated, company-wide view of high quality information II) The information systems department must separate informational from operational systems to improve performance dramatically in managing company data

Benefit of real time data warehouse

I) Capture customer data at the time of a business event II) Analyze customer behavior and predict customer responses to possible actions III) Develop rules for optimizing customer interactions IV) Take immediate actions with customers at touch points on best responses to customers as determined by decision rules in order to make desirable results happen

Need to Separate informational and operational systems

I) Informational Systems centralize data that are scattered throughout disparate operational system and makes them readily available for analytical applications II) A properly designed set of informational systems adds value to data by improving their quality and consistency III) A separate set of informational systems eliminates much of the contention for resources when informational applications are confounded with operational processing

Three Layer Data Architecture

I) Operational Data are stored in the various operational systems of record throughout the organization II) Reconciled data are the type of data stored in the enterprise data warehouse and an operational data store III) Derived data are the type of data stored in data marts

Grain of the Fact

Level of detail in a fact table, determined by the intersection of all the components of the primary key, including all foreign keys and any other primary key elements. A common grain would be each business transaction

data transformation

Process of changing the data from their original form to a format suitable for performing a data analysis addressing research objectives. (format)

Data Warehousing

Process whereby organizations create and maintain data warehouses and extract meaning from and help inform decision making through the use of data in the data warehouses.

Integrated

The data housed in data warehouses are defined using consistent naming conventions, formats, encoding structures, and related characteristics gathered from several internal systems of record and also often from sources external to the organization. This means that the data warehouse holds the one version of the truth

Joining

The process of combining data from various sources into a single table or view

Selection

The process of partitioning data according to predefined criteria

Aggregation

The process of transforming data from a detailed level to a summarized level

Why do Businesses need Data warehousing

a business requires an integrated, company wide view of high quality info the IS department must separate informational from operational systems to improve performance dramatically in managing company data

Subject Oriented

a data warehouse is organized around key subjects or high level entities of the enterprise such as customers, patients, students, products, organizational units, and time

Data warehouse

a subject oriented, integrated, time variant, nonupdatable collection of data used in support of management decision making processes and business intelligence

Fact Table

contain factual or quantitative data. measurements that are numerical , continually valued, and additive such as units of books sold

Data Mart

is a data warehouse that is limited in scope whose data are obtained by selecting and summarizing data from a DW or from separate extract, transform, and load processes from source data systems

Independent Data mart downsides

often created for small short term business lose flexibility if want to upgrade larger scale and react to business changes due to the complexity of users having to access data in separate data marts A separate ETL process is developed for each data mart, yields redundancy. Inconsistency across different marts no capability to drill down into greater detail or have a shared repository

Dependent Data mart and operational data store

operational data store --> enterprise data warehouse [dependent data marts both logical and physical]

Logical Data Mart and Real Time Warehouse Architecture

real time data warehouse ---> data marts

Data Consolidation

summarizing data from a set of similar cell ranges

ETL Process

the goal is to provide a single, authoritative source for data that support decision making.

data integration

the integration of data from multiple sources, which provides a unified view of all data


Ensembles d'études connexes

1.10 Compare and contrast types of display devices and their features

View Set

PLessy v Ferguson, Crash Course episode 23, US HIS Industrialization

View Set

Math and Graphing Assessment with Tutorials

View Set

Quiz: Transferring a Patient From the Bed to a Chair

View Set

Lesson 3 - Budgeting (Senior Social Studies)

View Set