INFS 346 Ch 7
source systems
-operational databases and other operational data repositories (in other words, any sets of data used for operational purposes) that provide analytically useful information for the data warehouse's subjects of analysis -every operational data store that is used as a source system for the data warehouse has two purposes -source systems can include external data sources (market research data, census data, stock market data, weather data)
technical issues that could arise and need to be addressed by data warehouse administration
-providing security for the information contained in the warehouse -ensuring sufficient hard-drive space for the data warehouse content -implementing the backup and recovery procedures
data warehouse deployment
-releasing the data warehouse and its front-end (BI) applications for use by end users -typically, prior to this step, the initial load is executed: populating the created data warehouse with an initial set of data from the operational data sources via the ETL infrastructure
developing front-end (BI) applications
-designing and creating applications for indirect use by the end users -included in most data warehousing systems and often referred to as BI applications -contain interfaces such as forms and reports accessible via a navigation mechanism such as a menu -can take place in parallel with data warehouse creation
Retrieval of analytical information
-a data warehouse is developed for the retrieval of analytical information, and it is not meant for direct data entry by the users -the only functionality available to the users of the data warehouse is retrieval -the data in the data warehouse is not subject to changes -the data in the data warehouse is referred to as non-volatile, static, or read-only
ETL includes the following tasks
-EXTRACTING analytically useful data from the operational data sources -TRANSFORMING such data so that it conforms to the structure of the subject-oriented target data warehouse model (while ensuring the quality of the transformed data) -LOADING the transformed and quality-assured data into the target data warehouse
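The three ETL tasks can be sketched as three small functions. This is an illustration under stated assumptions, not a real ETL tool: the source rows, the `warehouse_sales` table, and the quality rule (drop rows with a missing amount) are all hypothetical.

```python
import sqlite3

# Hypothetical operational source rows, as an order-entry application might store them.
operational_rows = [
    {"order_id": 1, "customer": " alice ", "amount": "19.99", "order_date": "2024-01-05", "ui_theme": "dark"},
    {"order_id": 2, "customer": "BOB", "amount": "5.00", "order_date": "2024-01-06", "ui_theme": "light"},
    {"order_id": 3, "customer": "carol", "amount": None, "order_date": "2024-01-06", "ui_theme": "dark"},
]

def extract(rows):
    """EXTRACT: keep only the analytically useful fields (ui_theme is dropped)."""
    return [(r["order_id"], r["customer"], r["amount"], r["order_date"]) for r in rows]

def transform(records):
    """TRANSFORM: conform to the target structure and enforce a quality rule."""
    clean = []
    for order_id, customer, amount, order_date in records:
        if amount is None:  # assumed quality rule: reject rows with no amount
            continue
        clean.append((order_id, customer.strip().title(), float(amount), order_date))
    return clean

def load(conn, records):
    """LOAD: write the transformed, quality-assured rows into the target table."""
    conn.executemany("INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)", records)
    conn.commit()

# An in-memory SQLite database stands in for the target data warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE warehouse_sales (order_id INTEGER, customer TEXT, amount REAL, order_date TEXT)"
)
load(warehouse, transform(extract(operational_rows)))
loaded = warehouse.execute(
    "SELECT customer, amount FROM warehouse_sales ORDER BY order_id"
).fetchall()
```

Only two of the three source rows reach the warehouse, because the row without an amount fails the assumed quality check during transformation.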
Data mart
-a data store based on the same principles as a data warehouse, but with a more limited scope
detailed and/or summarized data
-a data warehouse, depending on its purpose, may include the detailed data or summary data or both -a data warehouse that contains the data at the finest level of detail is most powerful
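One way to see why the finest level of detail is most powerful: any summary can be computed from detailed data, but detailed data cannot be recovered from a summary. A small sketch with hypothetical sales rows:

```python
from collections import defaultdict

# Hypothetical detailed data: one row per sale (date, region, amount).
detailed = [
    ("2024-01-05", "North", 100.0),
    ("2024-01-05", "South", 40.0),
    ("2024-01-06", "North", 60.0),
]

def summarize(rows, key_index):
    """Total the amount (column 2) grouped by the column at key_index."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_index]] += row[2]
    return dict(totals)

# The same detailed rows support several different summaries.
by_region = summarize(detailed, 1)
by_day = summarize(detailed, 0)
```

A warehouse that stored only `by_region` could not answer the per-day question, while the detailed rows answer both.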
A typical organization maintains and utilizes ...
-a number of operational data sources -the operational data sources include the databases and other data repositories which are used to support the organization's day-to-day operations
data warehouse development: iterative nature
-development is iterative: whenever a change is needed, go back to the beginning (requirements collection) and work through the steps again
requirements collection, definition, and visualization - standard
-collected requirements should be clearly defined and stated in a written document, and then visualized as a conceptual data model
dependent data mart
-does not have its own source systems -the data comes from the data warehouse
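A minimal sketch of a dependent data mart, assuming SQLite and hypothetical table names (`warehouse_sales`, `north_mart`). The mart has a more limited scope (one region) and is populated only from the warehouse, never from an operational source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The data warehouse (hypothetical contents).
conn.execute("CREATE TABLE warehouse_sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [("North", "widget", 100.0), ("South", "widget", 40.0), ("North", "gadget", 60.0)],
)

# The dependent data mart: its only source is the warehouse table above.
conn.execute(
    "CREATE TABLE north_mart AS "
    "SELECT product, amount FROM warehouse_sales WHERE region = 'North'"
)
mart_rows = conn.execute(
    "SELECT product, amount FROM north_mart ORDER BY product"
).fetchall()
```

An independent data mart would instead need its own source systems and its own ETL step to fill `north_mart`.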
Purpose of data warehouse
-is the retrieval of analytical information -stores detailed and summarized data
historical
-refers to the larger time horizon in the data warehouse than in the operational databases
Operational data
DATA MAKEUP DIFFERENCES -typical time horizon: Days/Months -Detailed -Current TECHNICAL DIFFERENCES -small amounts used in a process -high frequency of access -can be updated -non-redundant FUNCTIONAL DIFFERENCES -used by all employees for tactical purposes -application oriented
Analytical Data
DATA MAKEUP DIFFERENCES -typical time horizon: Years -Summarized (and/or Detailed) -Values over time (snapshots) TECHNICAL DIFFERENCES -large amounts used in a process -low/modest frequency of access -read (and append) only -redundancy not an issue FUNCTIONAL DIFFERENCES -used by a narrower set of users for decision making -subject oriented
data warehouse administration and maintenance
performing activities that support the data warehouse end user, including dealing with technical issues
the next version of data warehouse
chapter 7 - slide 41
creating ETL infrastructure
creating necessary procedures and code for: -automatic extraction of relevant data from the operational data sources -transformation of the extracted data, so that its quality is assured and its structure conforms to the structure of the modeled and implemented data warehouse -the seamless load of the transformed data into the data warehouse -most time consuming and resource consuming part of the data warehouse development process
data warehouse modeling (logical data warehouse modeling)
creation of the data warehouse data model that is implementable by the DBMS software
time variant
refers to the fact that a data warehouse contains slices or snapshots of data from different periods of time across its time horizon -with data slices, the user can create reports for various periods of time within the time horizon
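A small sketch of reporting over data slices, assuming hypothetical month-end inventory snapshots. Each row carries the date of its snapshot, so a report can be limited to any period within the time horizon.

```python
# Hypothetical snapshots: (snapshot_date, product, units_in_stock).
snapshots = [
    ("2023-12-31", "widget", 120),
    ("2024-01-31", "widget", 95),
    ("2024-02-29", "widget", 130),
    ("2024-03-31", "widget", 110),
]

def report(rows, start, end):
    """Return the snapshots taken between start and end (inclusive).

    ISO-formatted dates (YYYY-MM-DD) sort correctly as plain strings.
    """
    return [row for row in rows if start <= row[0] <= end]

# A report covering only the first quarter of 2024.
q1_2024 = report(snapshots, "2024-01-01", "2024-03-31")
```

An operational database would typically hold only the current stock level, so the earlier values would not be available for this kind of report.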
subject-oriented
refers to the fundamental difference in the purpose of an operational database system and a data warehouse -operational database system- developed in order to support a specific business operation -a data warehouse- is developed to analyze specific business subject areas
Steps in the development of the data warehouse
see chapter 7 slide 29
analytical information
the information collected and used in support of analytical tasks -analytical information is based on operational (transactional) information
Operational information (transactional information)
the information collected and used in support of day to day operational needs in businesses and other organizations
data warehouse
-sometimes referred to as the target system, to indicate the fact that it is a destination for the data from the source systems -typically, a data warehouse periodically retrieves selected analytically useful data from the operational data sources -a data warehouse is created within an organization as a separate store whose primary purpose is data analysis -is a structured repository of integrated, subject-oriented, enterprise-wide, historical, and time-variant data
data warehouse components
-source systems -extraction-transformation-load (ETL) infrastructure -data warehouse -front-end applications
independent data mart
-stand-alone data mart, created in the same fashion as the data warehouse -independent data mart has its own source systems and ETL infrastructure
Integrated
-the data warehouse integrates the analytically useful data from various operational databases (and possibly other sources) -Integration refers to this process of bringing the data from multiple data sources into a singular data warehouse
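Integration can be sketched as mapping records from several sources onto one common structure. The two sources below (`crm_customers`, `billing_customers`) and their field names are hypothetical; the sketch only shows differently shaped data ending up in a single, uniform collection.

```python
# Two hypothetical operational sources that name the same facts differently.
crm_customers = [{"cust_id": 1, "name": "Alice"}]
billing_customers = [{"CustomerID": 2, "FullName": "Bob"}]

def integrate(crm, billing):
    """Bring both sources into one structure with common field names."""
    unified = [
        {"customer_id": c["cust_id"], "name": c["name"], "source": "crm"}
        for c in crm
    ]
    unified += [
        {"customer_id": b["CustomerID"], "name": b["FullName"], "source": "billing"}
        for b in billing
    ]
    return unified

warehouse_customers = integrate(crm_customers, billing_customers)
```

Keeping a `source` field is a design choice made for this sketch so that each integrated row can be traced back to where it came from.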
structured repository
-the data warehouse is a database containing analytically useful information -any database is a structured repository with its structure represented in its metadata
the requirements collection, definition and visualization
-the first and most critical step in the data warehouse development process -collection process aspires to analytically take advantage of all data available for consideration -but it cannot be based on data that is not available or does not exist -results in the requirements specifying the desired capabilities and functionalities of the future data warehouse -requirements based on analytical needs that can be met by the data in the internal data source systems and available external data sources -collected through interviewing various stakeholders of the data warehouse -additional methods include: focus groups, questionnaires, surveys, and observations of existing analytical practices
ETL infrastructure
-the infrastructure that facilitates the retrieval of data from operational databases into the data warehouses
every operational data store that is used as a source system for the data warehouse has two purposes
-the original operational purpose -as a source system for the data warehouse
Two main reasons for the creation of a data warehouse as a separate analytical database
-the performance of operational day-to-day tasks involving data use can be severely diminished if such tasks have to compete for computing resources with analytical queries -it is often impossible to structure a database which can be used in an efficient manner for both operational and analytical purposes
Data warehouse use
-the retrieval of the data in the data warehouse -indirect use -direct use -via the DBMS -via OLAP (BI) tools
enterprise-wide
-the term enterprise-wide refers to the fact that the data warehouse provides an organization-wide view of the analytically useful information it contains
data warehouse front-end (BI) applications
-used to provide access to the data warehouse for users who are engaging in indirect use
creating the data warehouse
-using a DBMS to implement the data warehouse data model as an actual data warehouse -typically, data warehouses are implemented using relational DBMS (RDBMS) software