DATA WAREHOUSING


Pivot (rotate)

Rotate the data axes to provide an alternative visual representation of the same data
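As a minimal sketch of a pivot, assume a two-dimensional view held as a list of rows. The region names, quarter labels, figures, and the `pivot` helper are hypothetical and chosen only for illustration.

```python
# Hypothetical sales view: rows are regions, columns are quarters.
regions = ["North", "South"]
quarters = ["Q1", "Q2", "Q3"]
sales = [
    [10, 20, 30],  # North
    [40, 50, 60],  # South
]

def pivot(row_labels, col_labels, table):
    """Swap the row and column axes of a 2-D view; the cell values are unchanged."""
    rotated = [list(column) for column in zip(*table)]
    return col_labels, row_labels, rotated

new_rows, new_cols, rotated = pivot(regions, quarters, sales)
for label, row in zip(new_rows, rotated):
    print(label, row)
```

After the rotation the quarters label the rows and the regions label the columns; no value is aggregated or dropped.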

Slice

Select on one dimension of the data cube, producing a subcube

Dice

Select on two or more dimensions of the data cube, producing a subcube
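Slice and dice differ only in how many dimensions carry a selection condition. The sketch below assumes a tiny cube stored as a dictionary keyed by (product, region, quarter); the data, the `DIMENSIONS` tuple, and the `select` helper are hypothetical.

```python
# Hypothetical cube: (product, region, quarter) -> units sold.
cube = {
    ("shirt", "North", "Q1"): 5,
    ("shirt", "South", "Q1"): 7,
    ("shirt", "North", "Q2"): 6,
    ("pants", "North", "Q1"): 3,
    ("pants", "South", "Q2"): 4,
}
DIMENSIONS = ("product", "region", "quarter")

def select(cube, **allowed):
    """Keep the cells whose coordinate on each named dimension is in the allowed set."""
    subcube = {}
    for coords, value in cube.items():
        cell = dict(zip(DIMENSIONS, coords))
        if all(cell[dim] in values for dim, values in allowed.items()):
            subcube[coords] = value
    return subcube

# Slice: a condition on exactly one dimension.
q1_slice = select(cube, quarter={"Q1"})

# Dice: conditions on two or more dimensions.
diced = select(cube, product={"shirt"}, region={"North"})

print(q1_slice)
print(diced)
```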

What is a Data Warehouse?

A system used for reporting and data analysis that integrates data from one or more disparate sources into a central repository, the data warehouse (DW). • Stores current and historical data and is used to create trend reports for senior management, such as annual and quarterly comparisons. • The data stored in the warehouse is uploaded from the operational systems (such as marketing and sales).

Roll-up

aggregate data further • Eliminate a dimension (e.g., eliminate color) OR • Climb up a concept hierarchy (e.g., go from months to quarters)
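Both forms of roll-up can be sketched with one grouping helper. The fact rows, the `MONTH_TO_QUARTER` mapping, and the `roll_up` function below are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical facts: (month, color, units sold).
facts = [
    ("Jan", "red", 4), ("Jan", "blue", 1),
    ("Feb", "red", 2), ("Apr", "blue", 5),
    ("May", "red", 3),
]

# Concept hierarchy: month -> quarter (first half of the year only).
MONTH_TO_QUARTER = {
    "Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
    "Apr": "Q2", "May": "Q2", "Jun": "Q2",
}

def roll_up(rows, key):
    """Sum the units of all rows that share the same grouping key."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[2]
    return dict(totals)

# Eliminate a dimension: drop color, keep month.
by_month = roll_up(facts, key=lambda r: r[0])

# Climb the hierarchy: months become quarters, color is kept.
by_quarter_color = roll_up(facts, key=lambda r: (MONTH_TO_QUARTER[r[0]], r[1]))

print(by_month)
print(by_quarter_color)
```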

Data transformation

convert data from legacy or host format to warehouse format

Data cleaning

detect errors in the data and rectify them when possible

Data extraction

get data from multiple, heterogeneous, and external sources

Algebraic

An aggregate function is algebraic if it can be computed by an algebraic function with M arguments (where M is a bounded integer), each of which is obtained by applying a distributive aggregate function • E.g., avg(), min_N(), standard_deviation()
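For example, avg() can be computed from two distributive sub-aggregates, sum() and count(), so M = 2. The sketch below uses made-up numbers and an arbitrary split into three partitions.

```python
# Made-up data split into three arbitrary partitions.
data = [3, 1, 4, 1, 5, 9, 2, 6]
partitions = [data[:3], data[3:5], data[5:]]

# Each partition keeps only two numbers: its sum and its count.
partials = [(sum(part), len(part)) for part in partitions]

# Combine the distributive sub-aggregates, then apply the algebraic function.
total = sum(s for s, _ in partials)
count = sum(c for _, c in partials)
avg = total / count

print(avg)  # 3.875
```

The storage per partition stays constant (two numbers) regardless of how many rows the partition holds.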

Distributive

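An aggregate function is distributive if, when the data is partitioned into n sets and the function is applied to each set, the result of applying the function to the n aggregate values is the same as that of applying the function to all the data without partitioning • E.g., count(), sum(), min(), max()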
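The property can be checked on a small made-up data set. In this sketch the `combine` helper is hypothetical; note that count() is combined by summing the partial counts.

```python
# Made-up data split into three arbitrary partitions.
data = [3, 1, 4, 1, 5, 9, 2, 6]
partitions = [data[:3], data[3:5], data[5:]]

def combine(fn, partitions):
    """Apply fn to each partition, then apply fn again to the partial results."""
    return fn(fn(part) for part in partitions)

combined = {fn.__name__: combine(fn, partitions) for fn in (sum, min, max)}
direct = {fn.__name__: fn(data) for fn in (sum, min, max)}

# count(): the partial counts are combined by summing them.
combined["count"] = sum(len(part) for part in partitions)
direct["count"] = len(data)

print(combined == direct)  # True
```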

Holistic

An aggregate function is holistic if there is no constant bound on the storage size needed to describe a sub-aggregate • E.g., median(), mode(), rank()
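A quick way to see why median() is holistic: keeping only one number per partition (its median) is not enough to recover the overall median. The data and the partitioning below are made up for illustration.

```python
from statistics import median

# Made-up data split into partitions of unequal size.
partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10, 11]]
data = [x for part in partitions for x in part]

true_median = median(data)
median_of_medians = median(median(part) for part in partitions)

print(true_median, median_of_medians)  # 6 5
```

In general, computing the exact median requires access to the underlying values, not just a fixed-size summary of each partition.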

Refresh

propagate the updates from the data sources to the warehouse

Drill-down

reverse of roll-up - provide more details • Introduce additional dimensions OR • Climb down a concept hierarchy (e.g., from county to city)
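Drill-down is only possible when the finer-grained data is still available. As a rough sketch, assume detail rows with a county and a city column; the place names, figures, and the `aggregate` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical detail rows: (county, city, sales).
rows = [
    ("County A", "City A1", 10),
    ("County A", "City A2", 15),
    ("County B", "City B1", 8),
]

def aggregate(rows, level):
    """Group by the first `level` hierarchy columns and sum the sales."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[:level]] += row[-1]
    return dict(totals)

by_county = aggregate(rows, level=1)  # summarized view
by_city = aggregate(rows, level=2)    # drill-down: one level lower in the hierarchy

print(by_county)
print(by_city)
```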

Load

sort, summarize, consolidate, compute views, check integrity, and build indices and partitions

Crosstab Definition

A cross-tab is a table in which: • values of one dimension attribute form the row headers • values of another dimension attribute form the column headers • other dimension attributes are listed on top • the value in each cell is the (aggregate of the) values for the dimension-attribute values that specify the cell • totals for every row and column are also precomputed
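The definition above can be sketched as a dictionary of cells that includes the precomputed row, column, and grand totals. The item and color values, the quantities, and the `cross_tab` helper are made up for illustration.

```python
from collections import defaultdict

# Hypothetical facts: (item, color, quantity).
facts = [
    ("shirt", "dark", 8), ("shirt", "pastel", 35),
    ("pants", "dark", 20), ("pants", "pastel", 2),
    ("shirt", "dark", 2),
]

def cross_tab(facts):
    """Aggregate quantities per (item, color) cell, plus row, column, and grand totals."""
    cells = defaultdict(int)
    for item, color, qty in facts:
        cells[(item, color)] += qty
        cells[(item, "total")] += qty      # row total
        cells[("total", color)] += qty     # column total
        cells[("total", "total")] += qty   # grand total
    return dict(cells)

tab = cross_tab(facts)

# Print the table: items as row headers, colors as column headers.
row_headers = ["pants", "shirt", "total"]
col_headers = ["dark", "pastel", "total"]
print(f"{'':8}" + "".join(f"{c:>8}" for c in col_headers))
for r in row_headers:
    print(f"{r:8}" + "".join(f"{tab.get((r, c), 0):>8}" for c in col_headers))
```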

Non-Volatile

• A physically separate store of data transformed from the operational environment • Operational updates of data do not occur in the data warehouse environment • Does not require transaction processing, recovery, and concurrency control mechanisms • Requires only two data-access operations: initial loading of data and access of data

Integrated

• Constructed by integrating multiple, heterogeneous data sources, such as relational databases, flat files, and records from online transaction processing (OLTP) systems • Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, etc. among the different data sources (e.g., hotel price: currency, tax, whether breakfast is covered) • When data is moved to the warehouse, it is converted
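To make the hotel-price example concrete, the sketch below converts records from two hypothetical sources to one convention (US dollars, tax included). The field names, the `normalize` helper, and every rate are made-up illustration values, not real exchange or tax rates.

```python
# Made-up conversion rates for illustration only.
RATE_TO_USD = {"USD": 1.0, "EUR": 2.0}

# Two hypothetical sources that quote prices under different conventions.
source_records = [
    {"hotel": "H1", "price": 100.0, "currency": "USD", "tax_included": False, "tax_rate": 0.10},
    {"hotel": "H2", "price": 60.0, "currency": "EUR", "tax_included": True, "tax_rate": 0.20},
]

def normalize(record):
    """Convert a source record to the warehouse convention: USD, tax included."""
    price = record["price"] * RATE_TO_USD[record["currency"]]
    if not record["tax_included"]:
        price *= 1 + record["tax_rate"]
    return {"hotel": record["hotel"], "price_usd_with_tax": round(price, 2)}

warehouse_rows = [normalize(r) for r in source_records]
print(warehouse_rows)
```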

Subject-Oriented

• Organized around major subjects, such as customer, product, and sales • Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing • Provides a simple and concise view of particular subject issues by excluding data that are not useful in the decision support process

Time-Variant

• The time horizon for the data warehouse is significantly longer than that of operational systems • Operational database: current-value data • Data warehouse: provides information from a historical perspective (e.g., the past 5-10 years) • Every key structure in the data warehouse contains an element of time, explicitly or implicitly • The key of operational data may or may not contain a time element

