ALL OF CHAPTER 9

*Data Characteristics* The characteristics of data for a (A) are different from those of data for (B)

(A) data warehouse (B) operational databases.

Dimension table keys should be (A), which means (B), for three reasons: (C)

(A) surrogate (B) non-intelligent and non-business related (C) - Business keys may change over time - Surrogate keys are simpler and shorter - Surrogate keys can be the same length and format for all keys
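
A minimal sketch of the first reason, using Python's built-in sqlite3 module. The table and column names (product_dim, product_key, product_code) and the sample values are assumptions made up for illustration. The business key is changed, yet the surrogate key that fact rows would reference stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product_dim (
        product_key  INTEGER PRIMARY KEY,  -- surrogate key: no business meaning
        product_code TEXT,                 -- business key: may change over time
        description  TEXT
    )
""")

# SQLite assigns the surrogate key automatically when it is omitted.
conn.execute(
    "INSERT INTO product_dim (product_code, description) VALUES (?, ?)",
    ("SKU-100", "Desk lamp"),
)

# The business renames the product code; the surrogate key is untouched.
conn.execute(
    "UPDATE product_dim SET product_code = ? WHERE product_code = ?",
    ("LAMP-100", "SKU-100"),
)

surrogate_key, business_code = conn.execute(
    "SELECT product_key, product_code FROM product_dim"
).fetchone()
print(surrogate_key, business_code)
```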

*Data Characteristics - Status vs Event Data* What is an event? Give some examples (A) [Slide 16] What is status data? (B)

(A) A database action that results from a transaction and changes status data. Examples: create, update, delete (B) Status data is typically stored in an operational database as a result of a transaction. Transactions cause this data to change over time

What is an Informational system and what is it based on?

a system designed to support decision making based on historical and prediction data for complex queries or data-mining applications

What is an operational system and what is it based on? Also called a ... (B)

a system that is used to run a business in real time, based on current data (B) a system of record

*Multivalued Dimension* A helper table is an

associative entity that implements a M:N relationship between dimension and fact tables

*Dependent data mart* The EDW (Enterprise Data Warehouse) is a

centralized, integrated data warehouse, serving as the control point and single source of all data made available to end users for decision support applications

The primary key of the fact table is typically a

composite of all its foreign keys.

Derived data is (A) In other words, derived data is (B) Its source is (C)

(A) data that have been selected, formatted, and aggregated for end-user decision support applications (B) information instead of raw data (C) reconciled data, or data that have been integrated and transformed from the original data sources (via the ODS).

*Dependent data mart* The operational data store (ODS) is an

integrated, subject-oriented, continuously updatable, current-valued (with recent history), enterprise-wide, detailed database designed to serve operational users in decision support processing.

*Logical data mart and real time warehouse architecture* The *logical data mart and real-time data warehouse architecture* is practical for only (A) The difficulty with this approach stems from (B)

(A) moderate-sized data warehouses or when using high-performance data warehousing technology (B) the attempt to keep the data warehouse current, which requires more-or-less continuous processing.

*Three-Layer Data Architecture* The three-layer data architecture for data warehousing involves

operational, reconciled, and derived data

Dimension hierarchies help to

provide levels of aggregation for users wanting summary information in a data warehouse.
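
A small sketch of what "levels of aggregation" can look like, assuming a hypothetical store hierarchy of city, state, and region. The sample rows and the roll_up helper are invented for illustration. The same facts are summarized at two different levels of the hierarchy.

```python
from collections import defaultdict

# Each row carries its position in an assumed city -> state -> region hierarchy.
sales = [
    {"city": "Austin", "state": "Texas", "region": "South", "units": 10},
    {"city": "Dallas", "state": "Texas", "region": "South", "units": 5},
    {"city": "Boston", "state": "Massachusetts", "region": "Northeast", "units": 7},
]

def roll_up(rows, level):
    """Sum units at the requested hierarchy level ('city', 'state' or 'region')."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[level]] += row["units"]
    return dict(totals)

by_state = roll_up(sales, "state")
by_region = roll_up(sales, "region")
print(by_state)
print(by_region)
```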

In Slide 22, the Fact table provides

statistics for sales broken down by product, period and store dimensions

The *size of the fact table* depends on (A) Number of rows in fact table = (B)

(A) the number of dimensions and the grain of the fact table (B) product of number of possible values for each dimension associated with the fact table
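
The row-count formula can be worked through as simple arithmetic. The dimension sizes below (1,000 stores, 10,000 products, 24 periods) are made-up numbers, and the result should be read as an upper bound, since it assumes every combination of dimension values actually occurs.

```python
from math import prod

def max_fact_rows(dimension_sizes):
    """Upper bound on fact rows: product of the value counts of each dimension."""
    return prod(dimension_sizes.values())

# Hypothetical dimension sizes, for illustration only.
sizes = {"store": 1_000, "product": 10_000, "period": 24}

upper_bound = max_fact_rows(sizes)
print(upper_bound)
```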

What are two reasons we need data warehousing? Explain

(1) *Integrated, company-wide view of high-quality information (from disparate databases)* - For decision-making purposes, it is often necessary to provide a single, corporate view of the information (2) *Separation of operational and informational systems and data (for improved performance)* - Operational systems should not need to compete for resources with data warehouses, so these should be separated

What are the 2 *advantages of a logical data mart*? Views can be materialized if (A)

(1) *New data marts can be created quickly* because no physical database or database technology needs to be created or acquired and no loading routines need to be written. (2) *The data marts are always up to date* because data in a view are created when the view is referenced (A) a user has a series of queries and analyses that need to work off the same instance of the data mart
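
A rough sketch of the second advantage, using sqlite3. The warehouse table warehouse_sales and the view east_sales_mart are hypothetical names. Because the "data mart" is only a view, a row added to the warehouse shows up the next time the view is queried, with no load step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [("East", "Lamp", 20.0), ("West", "Lamp", 35.0)],
)

# The logical data mart is a view: no separate database, no loading routine.
conn.execute("""
    CREATE VIEW east_sales_mart AS
    SELECT product, amount FROM warehouse_sales WHERE region = 'East'
""")

rows_before = conn.execute("SELECT COUNT(*) FROM east_sales_mart").fetchone()[0]

# New warehouse data is visible through the view as soon as it is referenced.
conn.execute("INSERT INTO warehouse_sales VALUES ('East', 'Desk', 120.0)")
rows_after = conn.execute("SELECT COUNT(*) FROM east_sales_mart").fetchone()[0]
print(rows_before, rows_after)
```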

Dimension tables contain (A) The dimension tables are usually the source of (B) thus, dimension data are usually (C)

(A) Descriptive information, descriptions about the subjects of the business (B) attributes used to qualify, categorize, or summarize facts in queries, reports, or graphs (C) textual and discrete.

*Variations of Star Schema* What are two characteristics of multiple fact tables?

- Can improve performance - Often used to store facts for different combinations of dimensions

What is a data mart?

A mini data warehouse that is limited in scope

What is a data warehouse?

A subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making

What is one of the reasons we consider data warehouses to be time-variant?

Because data warehouses and data marts record facts about dimensions over time, date and time is always a dimension table, and a date surrogate key is always one of the components of the primary key of any fact table

*Normalizing Dimension Tables* What are Multivalued dimensions? Normalization involves (B)

Facts qualified by a set of values for the same business subject (B) creating a table for an associative entity between dimensions

Why would you want a "factless fact table"?

Generally for the same reason you would want associative entities - to maintain relationships between dimension tables.

Slowly Changing Dimensions (SCD) tell us ...

How to maintain knowledge of the past

*Independent Data Mart* What are data marts? (A) *There is a separate (B)* Why is there data access complexity in this architecture? (C)

(A) Mini-warehouses, limited in scope (B) ETL for each independent data mart (C) Because there are multiple data marts

What are two variations of the star schema?

Multiple fact tables and factless fact tables

What is one characteristic of factless fact tables? (A) Factless fact tables are used for: (B)

(A) No data, only keys for associated dimensions (B) - Tracking events - Inventory coverage
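
A minimal sketch of a factless fact table used for event tracking, built with sqlite3. The class-attendance scenario and all table and column names are assumptions chosen for illustration. The fact table holds only dimension keys, and counting its rows answers the question "how many students attended on each date?".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student_dim (student_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course_dim  (course_key  INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE date_dim    (date_key    INTEGER PRIMARY KEY, calendar_date TEXT);

    -- Factless fact table: keys only, no numeric measures.
    CREATE TABLE attendance_fact (
        student_key INTEGER REFERENCES student_dim(student_key),
        course_key  INTEGER REFERENCES course_dim(course_key),
        date_key    INTEGER REFERENCES date_dim(date_key),
        PRIMARY KEY (student_key, course_key, date_key)
    );

    INSERT INTO student_dim VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO course_dim  VALUES (1, 'Databases');
    INSERT INTO date_dim    VALUES (1, '2024-01-15'), (2, '2024-01-17');
    INSERT INTO attendance_fact VALUES (1, 1, 1), (2, 1, 1), (1, 1, 2);
""")

# Each row is one attendance event, so COUNT(*) is the only "measure" needed.
attendance_by_date = conn.execute("""
    SELECT d.calendar_date, COUNT(*)
    FROM attendance_fact AS f
    JOIN date_dim AS d ON d.date_key = f.date_key
    GROUP BY d.calendar_date
    ORDER BY d.calendar_date
""").fetchall()
print(attendance_by_date)
```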

*Independent Data Mart Architecture* What is the "data warehouse" considered to be in this architecture? (1) Each data mart is (A), and there is no (B) Thus, they do not (C) However, they are typically easier to build than (D)

(1) The collection of data marts (A) limited in scope (B) centralized data warehouse. (C) give the overall picture of the entire organization. (D) a full-fledged enterprise-wide data warehouse

A data warehouse is what kind of system? (A) Operational databases are often called (B) whereas data warehouses are associated with (C)

(A) an informational system (B) "online transaction processing (OLTP)" databases (C) "online analytical processing (OLAP)" systems.

The purpose of a data warehouse is to (A) This is different from the *purpose of a transaction-oriented database*, which is to (B) The data produced in transaction-oriented databases forms (C)

(A) assist with management decision-making. (B) support and record the day-to-day operations and transactions of a business. (C) much of the input for a data warehouse.

*Dependent data mart* In Dependent data marts, there is a (A), and the data marts are loaded from (B) The dependent data mart approach is often called a (C) Another term often used is (D)

(A) centralized enterprise-wide data warehouse (B) this enterprise DW. (C) "hub-and-spoke" architecture. (D) "corporate information factory"

*Data Characteristics - Transient vs. Periodic Data* Explain transient data (A) Give an example of transient data (B) However, for a data warehouse, which is (C), (D) is important.

(A) changes to existing records are written over previous records, thus destroying the previous data content. (B) operational data (C) "time-variant" (D) maintaining historical data (*periodic data*)

*Data Characteristics - Transient vs. Periodic Data* Explain periodic data (A) How is periodic data different from transient data? (B)

(A) data are never physically altered or deleted once they have been added to the store. (B) in periodic data, old data are not removed. Rather, they are kept in the data warehouse to maintain a historical record.
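
The contrast between transient and periodic data can be sketched in a few lines of Python. The two stores and the record_status helper are hypothetical, as are the customer ID and status values. The transient store overwrites the previous value, while the periodic store only appends, so the history survives.

```python
from datetime import date

transient_store = {}  # operational style: one current value per customer
periodic_store = []   # warehouse style: every version is kept

def record_status(customer_id, status, as_of):
    """Apply the same change to both stores to show how they differ."""
    transient_store[customer_id] = status                # overwrites the old value
    periodic_store.append((customer_id, status, as_of))  # adds a new dated row

record_status(7, "active", date(2024, 1, 1))
record_status(7, "suspended", date(2024, 3, 1))

current_status = transient_store[7]
history = [status for cid, status, _ in periodic_store if cid == 7]
print(current_status, history)
```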

*Logical data mart and real time warehouse architecture* The notion of a *real-time data warehouse* means (A) This helps to give (B)

(A) that the source data systems, decision support services, and the data warehouse exchange data and business rules at a near-real-time pace. (B) a more current and comprehensive picture of the organization.

*Three-Layer Data Architecture Data* *Operational data* are stored in (A) *Reconciled data* are (B) *Reconciled data* are stored in (C) *Derived data* are data that (D) *Derived data* are stored in (E)

(A) the various operational systems of record throughout the organization (and sometimes in external systems) - Transient Data (B) detailed, current data intended to be the single, authoritative source for all decision support applications. (C) the EDW and an ODS (D) have been selected, formatted, and aggregated for end-user decision support applications (E) each of the data marts.

*Derived Data* What are 3 characteristics of Derived Data?

- Detailed (mostly periodic) data - Aggregate (for summary) - Distributed

*Derived Data* What are 4 objectives of Derived Data?

- Ease of use for decision support applications - Fast response to predefined user queries - Customized data for particular target audiences - Ad-hoc query support

What are 4 "Other Data Warehouse Changes?"

- New descriptive attributes - Descriptive attributes become more refined - New source of data - Descriptive data are related to one another

*Differences b/w Data Marts & Warehouses* Data marts and data warehouses play (A) As (B) are added, a (C) can be built in phases *The easiest way to do this is to ... (D)*

(A) different roles in a data warehousing environment (B) data marts (C) data warehouse (D) follow the logical data mart and real-time data warehouse architecture.

*Derived Data* For Derived Data the most common data model is (A), which is usually implemented as a (B)

(A) dimensional model (B) star schema

*Data warehouse* What do these terms mean? (Definition of Data warehouse) Subject-oriented: (A) Integrated: (B) Time-variant: (C) Non-updatable: (D)

(A) e.g. customers, patients, students, products (B) consistent naming conventions, formats, encoding structures; from multiple data sources (C) can study trends and changes (D) read-only, periodically refreshed

*Logical data mart and real time warehouse architecture* In this data warehouse architecture, ODS and data warehouse are (A) Data marts are NOT (B), therefore it is (C)

(A) one and the same (B) separate databases, but logical views of the data warehouse (C) easier to create new data marts

*Logical data mart and real time warehouse architecture* A *logical data mart* is not a (A) Rather, it is created by a (B)

(A) physically separated database. (B) relational view of a data warehouse

A star schema is a (A) Another name for star schema is (B) The star schema is composed of (C)

(A) simple database design in which dimensional data are separated from fact or event data. (B) "dimensional model". (C) fact and dimension tables
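
A compact sketch of a star schema in sqlite3, loosely modeled on the product, period, and store dimensions mentioned earlier in this set. All table names, column names, and sample rows are assumptions for illustration. The fact table's primary key is the composite of its foreign keys, and each dimension is one join away from the facts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE period_dim  (period_key  INTEGER PRIMARY KEY, month TEXT);

    -- Fact table: one foreign key per dimension plus a numeric measure.
    CREATE TABLE sales_fact (
        product_key INTEGER REFERENCES product_dim(product_key),
        store_key   INTEGER REFERENCES store_dim(store_key),
        period_key  INTEGER REFERENCES period_dim(period_key),
        units_sold  INTEGER,
        PRIMARY KEY (product_key, store_key, period_key)
    );

    INSERT INTO product_dim VALUES (1, 'Lamp'), (2, 'Desk');
    INSERT INTO store_dim   VALUES (1, 'Austin'), (2, 'Boston');
    INSERT INTO period_dim  VALUES (1, '2024-01');
    INSERT INTO sales_fact  VALUES (1, 1, 1, 10), (2, 1, 1, 4), (1, 2, 1, 7);
""")

# A dimension attribute (city) qualifies and groups the facts.
units_by_city = conn.execute("""
    SELECT s.city, SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_key = f.store_key
    GROUP BY s.city
    ORDER BY s.city
""").fetchall()
print(units_by_city)
```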

What are the four data warehouse architectures? They all involve (B)

- Independent Data Mart - Dependent Data Mart with Operational Data Store - Logical Data Mart and Real-Time Data Warehouse - Three-Layer architecture (B) some form of extract, transform and load (ETL)

*Dependent data mart with ODS* Dependent data mart with operational data store is a (A) The operational data store (ODS) provides (B) There is a single ETL for (C) *Dependent data marts are loaded from (D)*

(A) three-level architecture (B) an option for obtaining current data (C) the enterprise data warehouse (EDW) (D) the EDW

What tables are normalized and what tables are often not? Why? (A) Non-normalized data involves (B)

(A) Fact tables are normalized; dimension tables often are not. Leaving dimension tables denormalized keeps each one only a single join away from the fact table, which improves query performance (B) data duplication

Fact tables contain (A) Give examples (B)

(A) factual or quantitative data about a business (B) such as units sold, orders booked, etc.

Successful data warehousing requires

following proven data warehousing practices, sound project management, strong organizational commitment, and making the right technology decisions

Data warehousing is the process whereby

organizations create and maintain data warehouses and extract meaning from data and help inform decision making through the use of data in the data warehouses.

What are 5 issues with company-wide view? Give an example (B) [Slide 5]

- Inconsistent key structures - Synonyms - Free-form vs. structured fields - Inconsistent data values - Missing data (B) A university may have several departments and units. Each may have its own databases, with different versions of student information. Organizations may distribute data (for example about students) among different databases, and this could even involve duplication - On Figure 9-1 Telephone is in two databases

What are 5 Organizational Trends That Motivate Data Warehouses?

- No single system of record - Multiple systems not synchronized - Organizational need to analyze activities in a balanced way - Customer relationship management - Supplier relationship management

*Independent Data Mart* What are 3 limitations of *independent data marts*?

- Separate ETL process for each data mart, thus redundant data and processing - Inconsistency between data marts - High cost for obtaining consistency between marts

What are 5 Essential Rules for Dimensional Modeling?

- Use atomic facts - Include a date dimension for each fact table - Enforce consistent grain - Honor hierarchies - Use surrogate keys

What are the four basic steps in building the *Independent Data Mart* architecture?

1. Data are extracted from the various internal and external source system files and databases. In a large organization, there may be dozens or even hundreds of such files and databases. 2. The data from the various source systems are transformed and integrated before being loaded into the data marts. Transactions may be sent to the source systems to correct errors discovered in data staging. The data warehouse is considered to be the collection of data marts. 3. The data warehouse is a set of physically distinct databases organized for decision support. It contains both detailed and summary data. 4. Users access the data warehouse by means of a variety of query languages and analytical tools. Results (e.g., predictions, forecasts) may be fed back to the data warehouse and operational databases.

The term "grain" or "granularity" refers to (A) How is it determined? Fine grain means (B)

(A) the level of detail in a fact table, determined by the intersection of all the components of the primary key, including all foreign keys and any other primary key elements. (B) Higher volume of data: more dimension tables & more rows in fact table
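
A short sketch of how grain affects row count. The transactions and the build_fact helper are invented for illustration. The same sales are stored once at a daily grain and once at a monthly grain, and the finer grain produces more fact rows.

```python
from collections import Counter
from datetime import date

# Hypothetical source transactions: (sale date, product, units).
transactions = [
    (date(2024, 1, 3), "Lamp", 2),
    (date(2024, 1, 3), "Lamp", 1),
    (date(2024, 1, 9), "Lamp", 4),
    (date(2024, 2, 1), "Lamp", 3),
]

def build_fact(rows, grain):
    """Aggregate units per (period, product), where the period is a day or a month."""
    fact = Counter()
    for day, product, units in rows:
        period = day.isoformat() if grain == "day" else day.strftime("%Y-%m")
        fact[(period, product)] += units
    return dict(fact)

daily = build_fact(transactions, "day")      # fine grain
monthly = build_fact(transactions, "month")  # coarse grain
print(len(daily), len(monthly))
```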

