OIM 350 #2

Lakukan tugas rumah & ujian kamu dengan baik sekarang menggunakan Quizwiz!

Kimball pros

- first phase of data warehousing is delivered quick - logical model can be easily understood by business users - small team of developers can manage DWH

kimball cons

- may not be able to handle all enterprise reporting needs -integration of legacy data into DWH can be a complex process

inmon cons

- more etl work is needed for data marts - initial set up and delivery takes more time/resources - requires large team of specialists to manage

DB Security Measures

-authentication: users account identifier and password for logon -access privileges: ensuring that only some people have access to certain parts of the database -role-based access control: privileges are associated with certain roles

Kimball methodologies

-bottom up -deliver easy solution that supports data query - business process oriented - dimensional modeling - a few tables with redundant data -2 tier architecture integrated via conformed dimensions -fairly simple - good direct end user - active participation business driven

Inmon Pros

-dwh serves as the single source of truth - data update anomalies are avoided -dwh can easily be updated

How does the development process of a DWH differ from that of an operational DB system?

-the operational db system does not have a step to create an etl infrastruture - they have a different modeling approach -they have different types of indirect use

Inmon methodologies

-top down -deliver sound technical solution based on proven methods -subject oriented -ERD relational modeling -many tables with no redundant data -3 tier architecture integrated by enterprise data model - quite complex - good for indirect end users - passive participation it-driven

What are the main components of a data warehouse system?

1. source systems- operational databases that provide analytically useful information 2. extraction-transformation-load (ETL)- extracts and analyses data from operational data sources 3. Data warehouse itself- repository of the analytical data integrated from the source systems

Data Warehouse VS Data Mart

DATA WAREHOUSE subjects: multiple Data sources: many typical size:very big implementation time: relatively long focus: organization wide DATA MART subjects: single Data sources: fewer typical size: not as big implementation time: not as long focus: narrower than organization wide

ETL (Extraction, Transformation, Load) Process

Data are extracted from external data sources using custom ETL software, maintained in a staging area where transformed, cleansed, and integrated, then loaded into the Data Warehouse and/or data marts.

dependent data mart

Does not have its own source systems, data comes from data warehouse

MOLAP

Multidimensional Online Analytical Processing, data is stored in multidimensional cubes and the complexity is hidden from users Pros - fast analysis cons- not appropriate for holding transaction level detail data or large amounts of data

Binary Search

Non linear method, faster than linear search, divides sorted lists into two parts of the same size and then eliminated either the bottom or top half, this is repeated until it is found

functional differeneces OLTP OLAP

Operational OLTP -all users -tactical -application oriented Analytical OLAP - narrow user group -decision making -subject oriented

Backup and Recovery

The combination of procedures that can restore lost data in the event of hardware or software failure. examples are: recovery log, checkpoints and complete mirrored backups

alpha release

a data warehouse and the associated front end applications are deployed internally to the members of the development team for initial testing of its functionalities

slowly changing dimensions

a dimension that contains attributes whose values can change

materialized view

a table that actually has physically saved data

Data Quality Characteristics

accuracy, uniqueness, completeness, consistency, timeliness, conformity

DW Deployment

allowing the end user to access the newly created and populated data warehouse and its front end applications.

type 3

applicable when theres a fixed number of changes and limited history is needed. Creates a previous and current column in the dimensional table for each column where changes are anticipated

HOLAP

combines both molap and rolap for a summary type information. its leveraged cube technology for faster performance but when details are needed holap can retried the underlying relational data

dimension tables

contain descriptions of the business, organization or enterprise to which the subject of analysis belongs, QUALITATIVE

fact tables

contain measures related to the subject of analysis, typically numeric. QUANTITATIVE

granularity

describes what is depicted by one row in the fact table. Fine granularity: when each record represents a single fact. Coarse granularity when each record represents a summarizations of multiple facts

data makeup differences

differences in the characteristics of the data that comprises of the operational and analytical information. data time-horizon, data level of detail and data time representation

functional differences

differences in the rational for the use of operational and analytical data. Data audience and data orientation

technical differences

differences in the way that operational and analytical data is handled and accessed by the DBMS and applications . Queried amounts of data and frequency of queries, data update, data redundancy

Detailed fact table

each record refers to a single fact

aggregated fact tables

each record summarizes multiple facts

Snow Flake schema

ease of maintenance: easier to change (no redundancy) ease of use: harder to understand (more complex queries) query performance: longer execution time (more foreign keys) when to use: in data warehouses, when dimension tables are big

Star Schema (not normalized)

ease of maintenance: harder to change (redundant data) ease of use: easier to understand (less complex) query performance: faster (less foreign keys) when to use: in data marts, when dimensional tables have a small number of rows

Query optimization

examining multiple ways of executing the same query and choosing the fastest option, making it the most efficient

When should a data warehouse designer consider choosing the inmon or Kimball approach over the other?

inmon - high rate of change sources - large team of specialists - longer start of time -higher start up cost -easy maintenance Kimball -stable source systems -small reams of generalists -urgent start time -lower start up cost -difficult maintenance

What are the main differences between the inmon and kimball approaches to modeling a data warehouse?

inmon: -top down -deliver sound solution -subject oriented -erd relational modeling -many tables no redundant data -complex -indirect end user Kimball -bottom up -easy solution -business process oriented -dimensional model -few tables redundant -simple -direct end user

Linear Search

looking up a particular record based on a value in an unsorted column. Checks elements sequentially and one at a time until its found

Regular View

not an actual table and does not have any physically saved data

data makeup differences operational and analytical

operational -shorter time horizon - fine detail -current time Analytical -longer time horizon -summarize -current and snapshots (time of presentation

technical differences operational analytical

operational (OLTP) -small data -frequent queries -read insert delete update -low redundancy Analytical (OLAP) -large data -infrequent queries - read only -high redundancy

What are the main differences between operational and analytical data?

operational information refers to the information collected and used in support of day to day operational needs analytical information refers to the information collected and used for decision support of tasks requiring data analysis

analytical data

refers to information collected and used for decision support of tasks requiring data analysis

operational data

refers to information collected and used in sport of day to day operational needs, individual tranactions

OLAP (online analytical processing)

refers to querying and presenting data from data warehouses/data marts for analytical purposes, can only engage in querying and presenting data read only

OLTP (online transaction processing)

refers to the updating, querying and presenting data from databases and other operational purposes. updating querying and presenting data routinely performs transactions that insert,modify and delete data from databases

ROLAP

relational online analytical processing translates queries in a into sql statements Pros- can handle large amounts of data cons- slow performance, limited by SQL functions

View Materialization

saving a view as an actual physical table, this is to improve performance of queries on frequently used views.

independent data mart

stand alone data mart, created in the same fashion as the data warehouse, has its own source systems and ETL infrastructure

Production release

the actual deployment of a functioning data warehousing system

beta release

the data warehouse is deployed to a selected group of users to test usability of the system

Type 1

the simplest and most used, simply changes the value in the dimensions record where the new value is the old value

type 2

used in cases that history should be preserved, creates additional dimension records using a new value for the surrogate key every time a value in a dimension record changes

Why does an organization need a separate analytical DB system in addition to an operational DB system?

• Performance of operational databases can be severely diminished if day-to-day tasks have to share computing resources with analytical queries • It is difficult to structure a database which can be efficiently used for both operational and analytical purposes


Set pelajaran terkait

Chapter 21: Nurse Management of Labor and Birth at Risk

View Set

Palpitations 3 yin effulgent, heart yang def

View Set

Chap 2 (part 2): Structure of Interest Rates (Chap 6: Test bank)

View Set

12.2 The Structure of DNA, Chapet 12

View Set

MS2 Exam 2 Practice Questions (GI)

View Set

Chapter 29 & 30 Test (Mental Health Disorders & Delirium and Dementia)

View Set

peds chapter 51: muscular dystrophy and musculoskeletal disorders

View Set

right triangle relationships unit test review

View Set