SQL Database Warehousing

अब Quizwiz के साथ अपने होमवर्क और परीक्षाओं को एस करें!

How would you recognize a degenerate dimension in a star schema?

(DD) or they are just out of place

6. Factless fact table

- contains no measures than can be aggregated

Additive

- facts (or measures) can be summed, averaged (aggregated) across all dimensions

Accumulating snap shot

- snap shot based on attributes such as part numbers or warehouses that can be summed to get total inventories

What is the difference between a fact table and a dimension table?

-Fact table: attributes in fact tables are measurements for analysis or main contents in reports. -Dimension table: attributes in dimension tables are constraints for the measurements or headers in reports.

what is the difference between a fact table and a dimension table

-Fact table: attributes in fact tables are measurements for analysis or main contents in reports. -Dimension table: attributes in dimension tables are constraints for the measurements or headers in reports.

Measures for Data Quality

1. Correctness 2. Uniqueness 3. Completeness 4. Relevance 5. Currency

Causes for Data Errors

1. Data entry errors 2. Correct data not available at time of data entry 3. Inconsistent or incorrect use of "codes" 4. Inconsistent of incorrect interpretation of data entry "fields" 5. Transaction processing errors (errors in programming code) 6. System and recovery errors 7. Data extract/transform errors

How to correct missing values

1. Exclude the record 2. Exclude the attribute/field 3. Replace missing value by a global constant 4. Replace missing value by the attribute mean 5. Replace missing value by the most probable value 6. Apply 4 & 5 by class/segment of records 7. Manual correction 8. Application specific algorithm

Be able to identify errors in the dimensional model

1. Incorrect or missing facts 2. Missing dimensions or missing dimension attributes 3. Incorrect connection between the fact and dimension table 4. Are all the dimension tables represented in the fact table via having their primary key as part of the composite primary key of the fact table? 5. Can the measures in the fact table be better represented to create the calculations desired? 6. Can the fact table be constructed off the source tables? 7. Are the measures at the same grain? 8. Is there only one subject in the star schema?

Slowly changing dimensions

1. Overwrite the value 2. Add a dimension column 3. Add a dimension row

How does data flow out of the OLTP (Source) System and into the Data Warehouse?

>Start with report content >Identify report source data >Design, build, and populate star schema

Outrigger

A "permissible" snowflake; Used when a dimension table has many attributes that get loaded at different times that are related together;

index

A data structure that is added to a database to increase the look up speed of a single value within a large table

roleplaying

A dimension that has its primary key included multiple times as a foreign key in the fact table; A dimension that can mean different things in a schema, i.e. a supplier and a customer

completeness

A measure for data quality that describes the range of the data

currency

A measure for data quality that describes the time sensitivity of the data

uniqueness

A measure for data quality that determines the frequency of the value

null

A missing value in a data record

snowflake

A name for a data warehouse schema that means it has too many relationships in it

override

A strategy of updating a data warehouse with new information that is easy, but loses all history

mini

A type of dimension that is linked to both the fact table and a dimension table; it contains attributes that are prone to change, with a single row representing a unique combination of values;

bitmap

A type of index that consumes very little disk space

B+

A type of index that is very fast when an attribute has high cardinality

degenerate

A word for a dimension attribute that was put into the fact table because it was alone by itself

ETL

Acronym used to describe the process of moving data out of the OLTP system, cleaning it, and loading it into the data warehouse

measures

Additive or semi-additive attributes found in the fact table

Fact table aggregates

Aggregate base-level fact data across one or many dimensions.

dimension

An input in a report that may have some impact on the output or thing being measured

Conformed dimensions

Conformed dimensions are those that are used in multiple star schemas. The dimensions are referenced by the same name and the attributes are also referenced by the same name.

What is the difference between a report dimension and a dimension in the star schema?

DIMENSION IN REPORTS -ONE OF THE PERSPECTIVES THAT CAN BE USED TO ANALYZE THE DATA -THE CATEGORIES BY WHICH THE MEASURE IS GROUPED DIMENSIONS OR DIMENSION TABLES -KEYS -ATTRIBUTES

Know the SQL (DDL) to Create tables and apply constraints

EXAMPLE: CREATE TABLE Domain( DomainKEY DECIMAL(25) IDENTITY PRIMARY KEY, DomainID DECIMAL(35), DomainName VARCHAR(4000), SiteCategoryID DECIMAL(25), SiteCategory VARCHAR(1000) )

Know the SQL for INSERTing records in a table

INSERT INTO ReferralDomain ( ReferralDomainName ) SELECT distinct ReferralDomainName FROM SessionInfoSource

Know the SQL to INSERT data from a SELECT statement

INSERT SessionPurchaseFact (customerKey, domainKey,sessionDateKey,productKey, referralDomainKey,NumPageViews, duration) SELECT cus.customerKey, dom.domainKey,sdate.sessionDateKey, prod.productKey, rdom.referralDomainkey,sinfo.pagesViewed,sinfo.duration FROM Customer cus JOIN SessionInfoSource sinfo on cust.machineID = sinfo.machineID

factless

If a fact table has nothing by primary keys

conformed

In order for a dimension get used in multiple star schemas it must adhere to a strict naming policy, so it is called _____________

Semi-additive or Snap shot fact

Inventories can be averaged but not summed. Can be aggregated across some, but not all dimensions.

Mini Dimension versus Outrigger

Mini dimension has a Primary Key that is part of the composite key of the fact table. Mini dimensions are good for very large rapidly changing dimension tables. Outrigger table's Primary Key is a foreign key in the related dimension table, not the fact table. Mini dimension is more effective when some attributes change frequently. Outrigger saves disk space by having a row for unique demographic data but does not help with rapidly changing data

Multi-valued or many-to-many dimensions

Most dimension-to-fact relationships are one-to-many. In some cases there can be a many-to-many relationship. For example, we discussed a doctor's visit may result in many medications being prescribed. A MedicationGroup table is created along with a "bridge" table to serve as an associative table between the Medication and the MedicationGroup table.

What do measures in a report become in the dimensional model?

REPORT MEASURES -TYPICALLY A NUMERIC VALUE USED TO MEASURE THE OUTPUT VARIABLE • DIMENSION -ONE OF THE PERSPECTIVES THAT CAN BE USED TO ANALYZE THE DATA -THE CATEGORIES BY WHICH THE MEASURE IS GROUPED THE OUTPUT OF DIMENSIONAL MODELING IS A STAR SCHEMA

grain

Represented by one row of a table.

Row

Something that can be added to a slowly changing dimension to preserve all history

Dimension table aggregates

Summarizes dimension data along a single attribute such as CITY or STATE.

Role-playing dimension

The key for this dimension appears several times in the fact table.

star

The name of a data warehouse schema or model

fact

The name of the center table in a star schema

OLTP

The typical source system for businesses

Snapshot

Type of fact table data that is semi-additive meaning that it cannot be summed over time because it represents a value at a point in time

aggregate

When some of the rows in a fact table are summarized in a table to speed processing of certain reports

Know the difference between the b-tree and bitmap indexes and when you would want to use each of these types of indexes.

b-tree index is better for higher cardinality attribute values and is very fast. On the downside, though, it uses more memory. Bitmap index works best for low cardinality attributes, where possible answers are few (boolean is best). The bitmap index takes up very little space.

1. Non-additive

facts cannot be aggregated across any dimension (degenerate)

What is a degenerate Dimension?

•When a dimension table has only 1 attribute, it can be added to the fact table as a degenerate dimension •Attribute can be part of Fact primary key, but is not a foreign key •All of the other tables attributes were absorbed in other dimensions •Attribute (i.e. order Number) is still valuable for its grouping potential


संबंधित स्टडी सेट्स

Fundamentals of Project Management

View Set

ECON 202 Exam 3: Chapter 9, 10, 11, 12 Wiggins Meer

View Set

Nursing Assessment and Care of Patients with Endocrine Disorders > Level- 4: Confident - NUPN 1510

View Set

cob 487: midterm question 15: transaction costs and vertical integration

View Set

All- PRACTICAL Application Case Review

View Set