CS 350 Chapter 9

Pataasin ang iyong marka sa homework at exams ngayon gamit ang Quizwiz!

Conformed dimension

One or more dimension tables associated with two or more fact tables for which the dimension tables have the same business meaning and primary key with each fact table

Describe the characteristics of a surrogate key as used in a data warehouse or data mart

- business keys change, often slowly, over time, and we need to remember old and new business key values for the same business object. Surrogate keys allow us to handle changing and unknown keys with ease - Using a surrogate key also allows us to keep track of different nonkey attribute values for the same production key over time - Surrogate keys are often simpler and shorter, especially when the production key is a composite key - Surrogate keys can be of the same length and format for all keys, no matter what business dimensions are involved in the database

ETL

- extracting, transforming, and loading - an important process in which extracts, transforms, and loads data from source systems into the data warehouse - extracting happens periodically. The data warehouse doesn't need to have current data

Trends that encourage the need for data warehousing

- no single system of record - multiple systems are not synchronized - organizations want to analyze the activities in a balanced way - customer relationship management - supplier relationship management

Two types of tables in star schema

- one fact table: contains factualy or quatitative data about a business - one or more dimensional tables: hold descriptive data about the subjects of the business

List four objectives of derived data

- provide ease of use for decision support applications - provide fast response for predefined user queries or requests for information - customize data for particular target user groups - support ad hoc queries and data mining and other analytical applications

Objectives of derived data

- provide ease of use for decision support applications - provide fast response for predefined user queries or requests for information - customize data for particular target user groups - support ad hoc queries and data mining and other analytical applications

List five claimed limitations of independent data marts

- you lose flexibility for the long term and the ability to react to changing business conditions - technical limitations for the size of the data warehouse

Two major factors that drive the need for data warehousing

1. A business requires an integrated, company wide view of the high quality information 2. The information systems department must separate informational from operation systems to improve performance dramatically in managing company data

Changes to be made to a warehouse data model

1. New descriptive attributes 2. New business activity attributes 3. New classes of descriptive attributes 4. Descriptive attributes become more refined 5. Descriptive data are related to one another 6. New source of data

Two debates as to the actual value of independent data marts

1. One debate deals with the nature of the phased approach to implementing a data warehousing environment. The essence of this debate is whether each data mart should or should not evolve in a bottom-up fashion from a subset of enterprise wide decision support data 2. The other debate deals with the suitable database architecture for analytical processing. This debate centers on the extent to which a data mart database should be normalized

Three layered architecture

1. Operational data are stored in the various operational systems of record throughout the organization 2. Reconciled data are the type of data stored in the enterprise data warehouse and an operational data store. Reconciled data are detailed, current data intended to be the single, authoritative source for all decision support applications 3. Derived data are the type of data stored in each of the data marts, Derived data are data that have been selected, formatted, and aggregated for end user decision support applications

Essential rules of dimensional modeling

1. Use atomic facts 2. Create single-process fact tables 3. Include a date dimension for every fact table 4. Enforce consistent grain 5. Disallow null keys in fact table 6. Honor hierarchies 7. Decode dimension tables 8. Use surrogate keys 9. Conform dimensions 10. Balance requirements with actual data

Factors the lead to the need to separate operational and informational systems

1. a data warehouse centralizes data that are scattered throughout disparate operational systems and makes them readily available for decision support applications 2. A properly designed data warehouse adds value to data by improving their quality and consistency 3. A separate data warehouse eliminates much of the contention for resources that results when informational applications are cofounded with operational processing

Three types of metadata

1. operational metadata: describe the data in the various operational systems that feed the enterprise data warehouse. Operational metadata typically exist in a number of different formats and unfortunately are often of poor quality 2. Enterprise data warehouse metadata: are derived from the enterprise data model. EDW metadata describe the reconciled data layer as well as the rules for extracting, transforming, and loading operational data into reconciled data 3. Data mart metadata: describe the derived data layer and the rules for transforming reconciled data to derived data

Enterprise data warehouse (EDW)

A centralized, integrated data warehouse that is the control point and single source of all data made available to end users for decision support applications

Logical data mart

A data mart created by a relational view of a data warehouse. They are not physically separate databases but rather different relational views of one physical, slightly denormalized relational data warehouse

Dependent data mart

A data mart filled exclusively from an enterprise data warehouse and its reconciled data

Independent data mart

A data mart filled with data extracted from the operational environment, without the benefit of a data warehouse

Subject-oriented

A data warehouse is organized around the key of subjects of the enterprise. Major subjects may include customers, patients, students, products, and time

Data mart

A data warehouse that is limited in scope, whose data are obtained by selecting and summarizing data from a data warehouse or from separate extract, transform, and load processes from source data systemd

Star schema

A simple database design in which dimensional data are separated form fact or even data. A dimensional model is another name for a star schema

Data warehouse

A subject-oriented, integrated, time-variant, nonupdateable collection of data used in support of management decision-making processes

Informational system

A system designed to support decision making based on historical point-in-time and prediction data for complex queries or data-mining applications

Operational system

A system that is used to run a business in real time, based on current data

Real-time data warehouse

An enterprise data warehouse that accepts near-real-time feeds or transactional data from the systems of record, analyzes warehouse data, and in near-real-time relays business rules to the data warehouse and systems of record so that immediate action can be taken in response to business events

Snowflake schema

An expanded version of a star schema in which dimension tables are normalized into several related tables

Operational data store (ODS)

An integrated, subject-oriented, continuously updateable, current-valued enterprise-wide, detailed database designed to serve operational users as they do decision support processing

Data warehoues vs data mart

Data Warehouse: Scope: - application independent - centralized, possibly enterprise-wide - planned Data: - historical, detailed, and summarized - lightly denormalized Subjects: - multiple subjects Other characteristics: - flexible - data oriented - long file - large - single complex structure Data Mart: Scope: - specific DSS application - Decentralized by user area - Organic, possibly not planned Data: - some history, detailed, and summarized - highly denormalized Subjects: - one central subject of concern to users Sources: - few internal and external sources Other characteristics: - restrictive - project oriented - short file - starts small, becomes large - multi, semi complex structures, together complex

Nonupdateable

Data in the data warehouse are loaded and refreshed from operational systems but cannot be updated by end users

Time-variant

Data in the data warehouse contain a time dimension so that they may be used to study trends and changes

Transient data

Data in which changes to existing records are written over previous records, thus destroying the previous data content

Periodic data

Data that are never physically altered or deleted once they have been added to the store

Derived data

Data that have been selected, formatted, and aggregated for end-user decision support application

Reconciled data

Detailed, current data intended to be the single, authoritative source for all decision support applications

Operational vs. Informational Systems

Operational: - Primary purpose: run the business on a current basis - Type of data: current representation of state of the business - Primary users: clerks, salespersons, administrators - Scope of usage: narrow, planned, and simple updates and queries - Design goal: performance - Volume: many constant updates and queries on one or a few table rows Informational: - Primary purpose: support managerial decision making - Type of data: historical point-in-time and predictions - Primary users: managers, business analysts, customers - Scope of usage: broad, ad hoc, complex queries and analysis - Design goal: ease of flexible access and use - Volume: periodic batch updates and queries requiring many or all rows

Explain the pros and cons of logical data marts

Pros: a. New data marts can be created quickly because no physical database or database technology needs to be acquired or created. Also, loading routines do not need to be written. b. Data marts are always up-to-date because data in a view are created when the view is referenced. Views can be materialized. Con: Logical data marts are only practical for moderate-sized data warehouses or when high performance data warehousing technology is used.

List four characteristics of a data warehouse

Subject-oriented, integrated, time-variant, and nonupdateable

Integrated

The data housed in the data warehouse are defined using consistent naming conventions, formats, encoding structures, and related characteristics gathered from several internal systems of record and also often from sources of external to the organization

In what ways are dimension tables often not normalized?

The fact table plays the role of a normalized n-ary associative entity that links the instances of the various dimensions, which are in second, but possibly not thid, normal form. Most experts view this as acceptable because dimensions are not updated and avoid costly joins

Grain

The level of detail in a fact table, determined by the intersection of all the components of the primary key, including all foreign keys and any other primary key elements

Briefly describe the major components of data warehouse architecture

The major components include: source data systems, data staging area, data and metadata storage area, and end-user presentation tools

Data warehousing

The process whereby organizations create and maintain data warehouses and extract meaning from and help inform decision making through the use of data in the data warehouses

Differences between transaction and reporting/analytic data warehousing

Transaction = A discrete unit of work that must be completely processed or not processed at all - May involve multiple updates - If any update fails, then all other updates must be cancelled - Transaction properties - Atomicity (works in its entirety or not at all) - Consistency (leaving the database in a consistent state) - Isolation (clearly separate from others) Durability (results kept permanently)

Is star schema a relational model? Explain why or why not

Yes because each dimension table has a one to many relationship with the central fact table


Kaugnay na mga set ng pag-aaral

2000 SAT Words with tumblrs and pictures

View Set

David Goggins on Impact Theory 1-30

View Set

QTR #1 EXAM #2 ( CHAPTERS 5,6,7,10) 50 QUESTIONS

View Set

6_International Human Resource Management

View Set