OIM 350 #2
Kimball pros
- first phase of data warehousing is delivered quick - logical model can be easily understood by business users - small team of developers can manage DWH
kimball cons
- may not be able to handle all enterprise reporting needs -integration of legacy data into DWH can be a complex process
inmon cons
- more etl work is needed for data marts - initial set up and delivery takes more time/resources - requires large team of specialists to manage
DB Security Measures
-authentication: users account identifier and password for logon -access privileges: ensuring that only some people have access to certain parts of the database -role-based access control: privileges are associated with certain roles
Kimball methodologies
-bottom up -deliver easy solution that supports data query - business process oriented - dimensional modeling - a few tables with redundant data -2 tier architecture integrated via conformed dimensions -fairly simple - good direct end user - active participation business driven
Inmon Pros
-dwh serves as the single source of truth - data update anomalies are avoided -dwh can easily be updated
How does the development process of a DWH differ from that of an operational DB system?
-the operational db system does not have a step to create an etl infrastruture - they have a different modeling approach -they have different types of indirect use
Inmon methodologies
-top down -deliver sound technical solution based on proven methods -subject oriented -ERD relational modeling -many tables with no redundant data -3 tier architecture integrated by enterprise data model - quite complex - good for indirect end users - passive participation it-driven
What are the main components of a data warehouse system?
1. source systems- operational databases that provide analytically useful information 2. extraction-transformation-load (ETL)- extracts and analyses data from operational data sources 3. Data warehouse itself- repository of the analytical data integrated from the source systems
Data Warehouse VS Data Mart
DATA WAREHOUSE subjects: multiple Data sources: many typical size:very big implementation time: relatively long focus: organization wide DATA MART subjects: single Data sources: fewer typical size: not as big implementation time: not as long focus: narrower than organization wide
ETL (Extraction, Transformation, Load) Process
Data are extracted from external data sources using custom ETL software, maintained in a staging area where transformed, cleansed, and integrated, then loaded into the Data Warehouse and/or data marts.
dependent data mart
Does not have its own source systems, data comes from data warehouse
MOLAP
Multidimensional Online Analytical Processing, data is stored in multidimensional cubes and the complexity is hidden from users Pros - fast analysis cons- not appropriate for holding transaction level detail data or large amounts of data
Binary Search
Non linear method, faster than linear search, divides sorted lists into two parts of the same size and then eliminated either the bottom or top half, this is repeated until it is found
functional differeneces OLTP OLAP
Operational OLTP -all users -tactical -application oriented Analytical OLAP - narrow user group -decision making -subject oriented
Backup and Recovery
The combination of procedures that can restore lost data in the event of hardware or software failure. examples are: recovery log, checkpoints and complete mirrored backups
alpha release
a data warehouse and the associated front end applications are deployed internally to the members of the development team for initial testing of its functionalities
slowly changing dimensions
a dimension that contains attributes whose values can change
materialized view
a table that actually has physically saved data
Data Quality Characteristics
accuracy, uniqueness, completeness, consistency, timeliness, conformity
DW Deployment
allowing the end user to access the newly created and populated data warehouse and its front end applications.
type 3
applicable when theres a fixed number of changes and limited history is needed. Creates a previous and current column in the dimensional table for each column where changes are anticipated
HOLAP
combines both molap and rolap for a summary type information. its leveraged cube technology for faster performance but when details are needed holap can retried the underlying relational data
dimension tables
contain descriptions of the business, organization or enterprise to which the subject of analysis belongs, QUALITATIVE
fact tables
contain measures related to the subject of analysis, typically numeric. QUANTITATIVE
granularity
describes what is depicted by one row in the fact table. Fine granularity: when each record represents a single fact. Coarse granularity when each record represents a summarizations of multiple facts
data makeup differences
differences in the characteristics of the data that comprises of the operational and analytical information. data time-horizon, data level of detail and data time representation
functional differences
differences in the rational for the use of operational and analytical data. Data audience and data orientation
technical differences
differences in the way that operational and analytical data is handled and accessed by the DBMS and applications . Queried amounts of data and frequency of queries, data update, data redundancy
Detailed fact table
each record refers to a single fact
aggregated fact tables
each record summarizes multiple facts
Snow Flake schema
ease of maintenance: easier to change (no redundancy) ease of use: harder to understand (more complex queries) query performance: longer execution time (more foreign keys) when to use: in data warehouses, when dimension tables are big
Star Schema (not normalized)
ease of maintenance: harder to change (redundant data) ease of use: easier to understand (less complex) query performance: faster (less foreign keys) when to use: in data marts, when dimensional tables have a small number of rows
Query optimization
examining multiple ways of executing the same query and choosing the fastest option, making it the most efficient
When should a data warehouse designer consider choosing the inmon or Kimball approach over the other?
inmon - high rate of change sources - large team of specialists - longer start of time -higher start up cost -easy maintenance Kimball -stable source systems -small reams of generalists -urgent start time -lower start up cost -difficult maintenance
What are the main differences between the inmon and kimball approaches to modeling a data warehouse?
inmon: -top down -deliver sound solution -subject oriented -erd relational modeling -many tables no redundant data -complex -indirect end user Kimball -bottom up -easy solution -business process oriented -dimensional model -few tables redundant -simple -direct end user
Linear Search
looking up a particular record based on a value in an unsorted column. Checks elements sequentially and one at a time until its found
Regular View
not an actual table and does not have any physically saved data
data makeup differences operational and analytical
operational -shorter time horizon - fine detail -current time Analytical -longer time horizon -summarize -current and snapshots (time of presentation
technical differences operational analytical
operational (OLTP) -small data -frequent queries -read insert delete update -low redundancy Analytical (OLAP) -large data -infrequent queries - read only -high redundancy
What are the main differences between operational and analytical data?
operational information refers to the information collected and used in support of day to day operational needs analytical information refers to the information collected and used for decision support of tasks requiring data analysis
analytical data
refers to information collected and used for decision support of tasks requiring data analysis
operational data
refers to information collected and used in sport of day to day operational needs, individual tranactions
OLAP (online analytical processing)
refers to querying and presenting data from data warehouses/data marts for analytical purposes, can only engage in querying and presenting data read only
OLTP (online transaction processing)
refers to the updating, querying and presenting data from databases and other operational purposes. updating querying and presenting data routinely performs transactions that insert,modify and delete data from databases
ROLAP
relational online analytical processing translates queries in a into sql statements Pros- can handle large amounts of data cons- slow performance, limited by SQL functions
View Materialization
saving a view as an actual physical table, this is to improve performance of queries on frequently used views.
independent data mart
stand alone data mart, created in the same fashion as the data warehouse, has its own source systems and ETL infrastructure
Production release
the actual deployment of a functioning data warehousing system
beta release
the data warehouse is deployed to a selected group of users to test usability of the system
Type 1
the simplest and most used, simply changes the value in the dimensions record where the new value is the old value
type 2
used in cases that history should be preserved, creates additional dimension records using a new value for the surrogate key every time a value in a dimension record changes
Why does an organization need a separate analytical DB system in addition to an operational DB system?
• Performance of operational databases can be severely diminished if day-to-day tasks have to share computing resources with analytical queries • It is difficult to structure a database which can be efficiently used for both operational and analytical purposes
