Data and Info. Management Chapter 8 (Marist College Eitel Lauria)
Info. Packages: A new idea for what and what information requirements for a data warehouse?
determining and recording
What relationship between dimension tables and fact tables?
1:N
Two general principles when choosing primary keys for dimension tables
Avoid built in meanings in the primary key of the dimension tables Do not use production system keys as primary keys for dimension tables
5 Benefits of Surrogate Keys
Data warehouse insulated from changes to operational systems Easy to integrate data from multiple systems Narrow dimension keys Better handling of exceptional cases Avoids tempting query writers to assume implicit semantics
2 Advantages of the Star Schema
Easy for users to understand More suitable for query processing. All queries are executed/formulated in the same way
Dimension Table: What primary key? Fact Table: What primary key?
Generated Composite (each component contributed by each dimension table
Star Schema Keys: USE what KEYS? DO NOT USE what KEYS?
SURROGATE PRODUCTION SYSTEM
Dimension Tables: Use what Keys? : System generated "meaningless" keys
Surrogate
Business Dimensions: Get a good grasp of the what of the business data?
The dimensional nature
Star schema (Star join)
The dimensional schema containing facts and dimensions that is a result of dimensional modeling
Dimension Table Key
The primary key of the dimension table uniquely identifies each row in the table.
Dimension Table is typically narrow or wide?
Wide
Dimensional modeling
a data design methodology used for designing subject-oriented analytical databases (e.g., data warehouses or data marts)
Dimension Tables: Keep the OLTP PKs as what in the dimension tables?
additional attributes
Textual Attributes: In the dimension table you will seldom find what used for calculations? The attributes in a dimension table are of what format?
any numerical values textual format.
Info. Packages: Measurements shown where?
at the bottom
Info. Packages: Business dimensions along which measurements will be taken are shown where?
at the top as column headings
Managers think of business in terms of what?
business dimensions
Info. Packages: An artifact for determining requirements for a data warehouse system is based on what?
business dimensions.
primary goal of requirements definition phase
compile information packages for all the subjects for the data warehouse.
OLTP systems are primarily what systems?
data capture systems.
The approach of including a transaction identifier in the fact table is often referred to in the literature and practice as a what? Where what?
degenerate dimension, where the term "degenerate" signifies "mathematically simpler than."
Dimension tables are what to maximize performance?
denormalized
Star schema: The dimension tables are purposely what?
denormalized
For efficient query performance, choose an attribute from the dimension table and go directly to the what?
fact table
In the star schema, the chosen subject of analysis is represented by a what?
fact table.
Info. Packages: Rows represent the what?
hierarchy
Drilling Down, Rolling Up: The attributes in a dimension table provide the ability to get to the details from what to what?
higher levels of aggregation to lower levels of details.
Values for the surrogate keys are typically what?
simple auto increment integer values
Surrogate key: Typically, in a star schema all dimension tables are given a what also called a surrogate key
simple, non composite system generated key
Business dimensions are relevant to what?
the business to subject of analysis
Dimension tables (dimensions) contain descriptions of what to which what?
the business, organization, or enterprise to which the subject of analysis belongs
Fact tables contain measures related to what and what?
the subject of analysis the foreign keys (associating fact tables with dimension tables)
Two of the most typical additional attributes that can appear in the fact table are what?
the transaction identifier and the transaction time.
Info. Packages: This concept helps us to give a concrete form to what 3 things?
various insights, unclear thoughts, opinions expressed during the process of collecting requirements.
STAR schema: Normalized or not a normalized model?
Not a normalized
What systems are needed to run the day-to-day business and returns are seen when?
OLTP immediately
surrogate key
a simple, non-composite, system-generated key
Business Dimensions: three dimensions form a collection of what?
cubes.
Dimension tables (dimensions) contain what?
descriptions of the business, organization, or enterprise to which the subject of analysis belongs.
Data warehouse systems are what systems?
information delivery systems
The actual proposed usage of a data warehouse could be unclear, the business dimensions used by the managers for decision making are what?
quite clear.
Planning DW is about what and what?
solving users' problems and providing strategic information to the user
Columns in dimension tables contain descriptive information that is often what, but can also be what? This information provides a what?
that is often textual (e.g., product brand, product color, customer gender, customer education level) numeric (e.g., product weight, customer income level) basis for analysis of the subject
5 Design Considerations in Dimensional Modeling
Choosing the Process Choosing the Grain Identifying and Conforming the Dimensions Choosing the Facts Choosing the Duration of the Database
Star Schema: Use what to select rows from dimension tables?
Filtering conditions
Fact table: For the PK, if use a single compound primary key whose length is the total length of the keys of the individual dimension tables or if use a generated primary key independent of the keys of the dimension tables. What must also be kept in the fact table as additional attributes. Increases the size of the fact table.
Foreign keys
Are all the attributes in a dimension table directly related to the other attributes in the table?
No
Narrow dimension keys → what fact table → Better or worse performance
Thinner Better
Star Schema (or Star Joint Schema) according to Kimball Ralph: A specific organization of a database in which a fact table with a what is joined to a number of what, each with a what?
composite key single level dimension tables single, primary key
The STAR schema defines the what in exactly the same way users normally visualize the relationships?
join paths
The Fact table is on the what side of 1:M relationships with dimension tables?
many side
If there are more than three dimensions, visualize what?
multidimensional cubes, also called hypercubes
Surrogate key values have no meaning or purpose except to give each dimension a what that serves as a what within the dimensional model instead of the operational key?
new column primary key
Info. Packages: Useful for taking data warehouse development to what?
next phases (relational DB star schemas)
Fact tables: The measures in the fact tables are typically what and are intended for what and what?
numeric mathematical computation and quantitative analysis
Business Dimensions: analysis of sales units along the three business dimensions of what, what, and what?
product, time, and geography.
Star schema maps the dimensional model to the what?
the relational model
In each of the small dimensional cubes, you will find the what for what?
the sales units for that particular slice of time, product, and geographical division.
Users can tell you how they think about the business: Need to look at what 3 things?
what measurement units are important for them How they measure success how they combine the various pieces of information for strategic decision making.
What should be the PK of the Fact Table? Three alternatives:
A single compound primary key whose length is the total length of the keys of the individual dimension tables. A concatenated primary key that is the concatenation of all the primary keys of the dimension tables (most typical) A generated primary key independent of the keys of the dimension tables.
The result of dimensional modeling is a what containing what and what?
dimensional schema containing facts and dimensions