SQL Database Warehousing
How would you recognize a degenerate dimension in a star schema?
(DD) or they are just out of place
6. Factless fact table
- contains no measures than can be aggregated
Additive
- facts (or measures) can be summed, averaged (aggregated) across all dimensions
Accumulating snap shot
- snap shot based on attributes such as part numbers or warehouses that can be summed to get total inventories
What is the difference between a fact table and a dimension table?
-Fact table: attributes in fact tables are measurements for analysis or main contents in reports. -Dimension table: attributes in dimension tables are constraints for the measurements or headers in reports.
what is the difference between a fact table and a dimension table
-Fact table: attributes in fact tables are measurements for analysis or main contents in reports. -Dimension table: attributes in dimension tables are constraints for the measurements or headers in reports.
Measures for Data Quality
1. Correctness 2. Uniqueness 3. Completeness 4. Relevance 5. Currency
Causes for Data Errors
1. Data entry errors 2. Correct data not available at time of data entry 3. Inconsistent or incorrect use of "codes" 4. Inconsistent of incorrect interpretation of data entry "fields" 5. Transaction processing errors (errors in programming code) 6. System and recovery errors 7. Data extract/transform errors
How to correct missing values
1. Exclude the record 2. Exclude the attribute/field 3. Replace missing value by a global constant 4. Replace missing value by the attribute mean 5. Replace missing value by the most probable value 6. Apply 4 & 5 by class/segment of records 7. Manual correction 8. Application specific algorithm
Be able to identify errors in the dimensional model
1. Incorrect or missing facts 2. Missing dimensions or missing dimension attributes 3. Incorrect connection between the fact and dimension table 4. Are all the dimension tables represented in the fact table via having their primary key as part of the composite primary key of the fact table? 5. Can the measures in the fact table be better represented to create the calculations desired? 6. Can the fact table be constructed off the source tables? 7. Are the measures at the same grain? 8. Is there only one subject in the star schema?
Slowly changing dimensions
1. Overwrite the value 2. Add a dimension column 3. Add a dimension row
How does data flow out of the OLTP (Source) System and into the Data Warehouse?
>Start with report content >Identify report source data >Design, build, and populate star schema
Outrigger
A "permissible" snowflake; Used when a dimension table has many attributes that get loaded at different times that are related together;
index
A data structure that is added to a database to increase the look up speed of a single value within a large table
roleplaying
A dimension that has its primary key included multiple times as a foreign key in the fact table; A dimension that can mean different things in a schema, i.e. a supplier and a customer
completeness
A measure for data quality that describes the range of the data
currency
A measure for data quality that describes the time sensitivity of the data
uniqueness
A measure for data quality that determines the frequency of the value
null
A missing value in a data record
snowflake
A name for a data warehouse schema that means it has too many relationships in it
override
A strategy of updating a data warehouse with new information that is easy, but loses all history
mini
A type of dimension that is linked to both the fact table and a dimension table; it contains attributes that are prone to change, with a single row representing a unique combination of values;
bitmap
A type of index that consumes very little disk space
B+
A type of index that is very fast when an attribute has high cardinality
degenerate
A word for a dimension attribute that was put into the fact table because it was alone by itself
ETL
Acronym used to describe the process of moving data out of the OLTP system, cleaning it, and loading it into the data warehouse
measures
Additive or semi-additive attributes found in the fact table
Fact table aggregates
Aggregate base-level fact data across one or many dimensions.
dimension
An input in a report that may have some impact on the output or thing being measured
Conformed dimensions
Conformed dimensions are those that are used in multiple star schemas. The dimensions are referenced by the same name and the attributes are also referenced by the same name.
What is the difference between a report dimension and a dimension in the star schema?
DIMENSION IN REPORTS -ONE OF THE PERSPECTIVES THAT CAN BE USED TO ANALYZE THE DATA -THE CATEGORIES BY WHICH THE MEASURE IS GROUPED DIMENSIONS OR DIMENSION TABLES -KEYS -ATTRIBUTES
Know the SQL (DDL) to Create tables and apply constraints
EXAMPLE: CREATE TABLE Domain( DomainKEY DECIMAL(25) IDENTITY PRIMARY KEY, DomainID DECIMAL(35), DomainName VARCHAR(4000), SiteCategoryID DECIMAL(25), SiteCategory VARCHAR(1000) )
Know the SQL for INSERTing records in a table
INSERT INTO ReferralDomain ( ReferralDomainName ) SELECT distinct ReferralDomainName FROM SessionInfoSource
Know the SQL to INSERT data from a SELECT statement
INSERT SessionPurchaseFact (customerKey, domainKey,sessionDateKey,productKey, referralDomainKey,NumPageViews, duration) SELECT cus.customerKey, dom.domainKey,sdate.sessionDateKey, prod.productKey, rdom.referralDomainkey,sinfo.pagesViewed,sinfo.duration FROM Customer cus JOIN SessionInfoSource sinfo on cust.machineID = sinfo.machineID
factless
If a fact table has nothing by primary keys
conformed
In order for a dimension get used in multiple star schemas it must adhere to a strict naming policy, so it is called _____________
Semi-additive or Snap shot fact
Inventories can be averaged but not summed. Can be aggregated across some, but not all dimensions.
Mini Dimension versus Outrigger
Mini dimension has a Primary Key that is part of the composite key of the fact table. Mini dimensions are good for very large rapidly changing dimension tables. Outrigger table's Primary Key is a foreign key in the related dimension table, not the fact table. Mini dimension is more effective when some attributes change frequently. Outrigger saves disk space by having a row for unique demographic data but does not help with rapidly changing data
Multi-valued or many-to-many dimensions
Most dimension-to-fact relationships are one-to-many. In some cases there can be a many-to-many relationship. For example, we discussed a doctor's visit may result in many medications being prescribed. A MedicationGroup table is created along with a "bridge" table to serve as an associative table between the Medication and the MedicationGroup table.
What do measures in a report become in the dimensional model?
REPORT MEASURES -TYPICALLY A NUMERIC VALUE USED TO MEASURE THE OUTPUT VARIABLE • DIMENSION -ONE OF THE PERSPECTIVES THAT CAN BE USED TO ANALYZE THE DATA -THE CATEGORIES BY WHICH THE MEASURE IS GROUPED THE OUTPUT OF DIMENSIONAL MODELING IS A STAR SCHEMA
grain
Represented by one row of a table.
Row
Something that can be added to a slowly changing dimension to preserve all history
Dimension table aggregates
Summarizes dimension data along a single attribute such as CITY or STATE.
Role-playing dimension
The key for this dimension appears several times in the fact table.
star
The name of a data warehouse schema or model
fact
The name of the center table in a star schema
OLTP
The typical source system for businesses
Snapshot
Type of fact table data that is semi-additive meaning that it cannot be summed over time because it represents a value at a point in time
aggregate
When some of the rows in a fact table are summarized in a table to speed processing of certain reports
Know the difference between the b-tree and bitmap indexes and when you would want to use each of these types of indexes.
b-tree index is better for higher cardinality attribute values and is very fast. On the downside, though, it uses more memory. Bitmap index works best for low cardinality attributes, where possible answers are few (boolean is best). The bitmap index takes up very little space.
1. Non-additive
facts cannot be aggregated across any dimension (degenerate)
What is a degenerate Dimension?
•When a dimension table has only 1 attribute, it can be added to the fact table as a degenerate dimension •Attribute can be part of Fact primary key, but is not a foreign key •All of the other tables attributes were absorbed in other dimensions •Attribute (i.e. order Number) is still valuable for its grouping potential