Lecture 8 Data Warehouse
Performance Reasons
The performance of operational day-to-day tasks involving data use can be severely diminished if such tasks have to compete for computing resources with analytical tasks.
Reasons for the creation of a data warehouse as a separate analytical database:
1. Time Horizon Difference 2. Performance Reasons 3. Difference in structuring of data
Data mart
A data source based on the same principles as a data warehouse, but with a more limited scope
Analytical Database (ex. Data warehouse)
Collects and presents analytical information in support of analytical tasks
Operational Database
Collects and presents operational information in support of daily operational procedures and processes
Dependent data mart
Does not have its own source systems. The data comes from the data warehouse. Dependent data marts provide users with a subset of the data from the data warehouse, in cases when users or applications do not want, need, or are not allowed to have access to all the dat in the entire data warehouse.
Analytical Data
In a data warehouse where, combined with a vast number of similar facts accumulated over a period of time, it serves to reveal important trends, such as sale patterns or customer behavior.
Operational Data
In an operational database for day-to-day operational purposes, such as inventory monitoring of financial transaction record keeping.
Data Warehouse
Is created within an organization as a separate data store whose primary purpose is data analysis. A typical organization maintains and utilizes a number of operational data sources (ex. Operational database) is sometimes referred to as the target system, to indicate the fact that it is a destination for the data from the source systems. Retrieves selected analytically useful data from the operational data source.
Data Warehouse Components
Source systems (operational database), extraction-transformation-load (ETL) infrastructure, data warehouse, front-end (BI) applications
Independent data mart
Stand-alone data mart, created in the same fashion as the data warehouse. Independent data mart its own systems and ETL infrastructure
Time Horizon Difference
The difference between how loud the data is useful for transactional vs, analytical needs. Regular transaction-oriented database store data for a limited period of time (often 60-90 days) before the data loses its immediate usefulness and it is archived. Keeping data in transaction-oriented databases for longer periods of time would simply cause the vast majority of database operations to deal with unnecessarily large amounts of old, no longer used data. On the other hand, data warehouse tend to tend to keep years' worth of data in order to enable long-term analysis.
Analytical Information
The information collected and used in support of analytical task ex. Information reflecting trends, sales, products statistics, and future growth projections. Operations, (transactional) informayion
Operational Information (transactional information)
The information collected and used in support of day to day operational needs in business and other organizations.
Source Systems
In the context of data warehousing, source systems are operational database that provide analytically useful information for the data warehouse's subjects of analysis. Every operational database that is used as a source system for the data warehouse has two purposes: the original operational purpose & as a source system for the data warehouse. Source systems can include external data sources ex. market research data, census data, stock market data, weather data.
Difference in Structuring of Data
It is often impossible to structure a database which can be used in an efficient manner for both operational and analytical purposes
Data Warehouse Front-end Applications
Often referred to as BI (Business Intelligence) applications. Used to provide access to the data warehouse for users who are engaging in indirect use
ETL Infrastructure
The infrastructure that facilitates the retrieval of data from operational database into the data warehouses. Includes the following tasks: extracting analytically useful data from the operational data sources, transforming such data so that it conforms to the structure of the target data warehouse model, while ensuring the quality of the transformed data, loading the transformed and quality assured data into the target data warehouse.