INFS chap 8 slides
performance reasons
the performance of operational day to day tasks involving data use can be severely diminished if such tasks have to compete for computing resources with analytical tasks
data warehouse
-a typical organization maintains and utilizes a number of operational data sources (i.e. operational databases) -a data warehouse is created within an organization as a separate data store whose primary purpose is data analysis
data warehouses are modeled and structured differently than operational databases
-data warehouses use diff types of technologies for storage and retrieval than operational databases -data warehouses are used for diff (much smaller) set of users than operational databases: --> operational databases can be used by 1000s of people (customers, employees, etc) --> data warehouses are used by data analysts and decision makers
dependent data mart
-doesn't have its own source systems -the data comes from the data warehouse -dependent data marts provide users with a subset of the data from the data warehouse, in cases when users or apps of not want, need, or aren't allowed to have access to all the data in entire data warehouse
source systems
-in context of data warehousing, source systems are operational databases that provide analytically useful info for the data warehouse's subjects of analysis -every operational database that is used as a source system for the data warehouse has 2 purposes: 1. the original operational purpose 2. as a source system for the data warehouse -source systems can include external data sources -ex: market research data, census data, weather data
operational database
collects and presents operational info in support of daily operational procedures and processes
difference in structuring of data
its often impossible to structure a database which can be used in an efficient manner for both operational and analytical purposes
data warehouse front end apps
-often referred to as BI (business intelligence) apps -used t provide access t data warehouse for users who are engaging in indirect use
data warehouse components
1. source systems (operational databases) 2. extraction-transformation-load (ETL) infrastructure 3. data warehouse 4. front end (BI) apps
reasons for creation of a data warehouse as a separate analytical database:
1. time horizon difference 2. performance reasons 3. difference in structuring of data
analytical database (i.e data warehouse)
collects and presents analytical info in support of analytical tasks
data warehouse
-sometimes referred to as the target system, to indicate the fact that it is a destination for the data from the source systems -a typical data warehouse periodically retrieves selected analytically useful data from the operational data sources
ETL infrastructure
-the infrastructure that facilitates the retrieval of data from operational databases into the data warehouses ETL includes the following tasks: 1. extracting analytically useful data from the operational data sources 2. transforming such data so that it conforms to the structure of the target data warehouse model, while ensuring the quality of the transformed data 3. loading the transformed and quality assured data into the target data warehouse
operational vs analytical data
-the same fact can have both an operational and analytical purpose -for ex: data describing that customer X bought product Y in store Z can be stored as operational data in an operational database for day to day operational purposes, such as inventory monitoring -the same fact can also be stored as analytical data in data warehouse where, combined with vast number of similar facts accumulated over a period of time, it serves to reveal important trends such as sale patterns
data mart
a data store based on the same principles as a data warehouse, but with more limited scope
independent data mart
stand alone data mart, created in same fashion as the data warehouse -independent data mart has its own source systems and ETL infrastructure
time horizon difference
the difference between how long the data is useful for transactional vs analytical needs -regular transaction-oriented databases store data for a limited period of time (often 60-90 days) before data loses its immediate usefulness and it is archived -keeping data in transaction-oriented databases for longer periods of time would simply cause the vast majority of databases operations to deal with unnecessarily large amounts of old, no longer used data -on the other hand, data warehouses tend to keep years' worth of data in order to enable long term analysis
analytical information
the info collected and used in support of analytical tasks -ex: info reflecting trends, sales, product stats, future growth projections -analytical info is based on operational (transactional) info
operational information (transactional information)
the info collected and used in support of day to day operational needs in businesses and other organizations -ex: info reflecting withdrawing cash from ATM, making airline reservation, purchasing stocks