WEEKS 5 & 6
Mapping entities with multivalued attributes into relational database constructs
- double circle - multi valued attribute has another separate table with the primary key of the entity as the foreign key. - the column of the FK and the multi valued attribute form a composite primary key in this new relation
Primary key
column (or a set of columns) whose value is unique for each row. Each relation must have a primary key. The name of the primary key column is underlined in order to distinguish it from the other columns in the relation
Foreign key
column in a relation that refers to a primary key column in another (referred) relation. A mechanism that is used to depict relationships in the relational database model. For every occurrence of a foreign key, the relational schema contains a line pointing from the foreign key to the corresponding primary key
logical data warehouse modeling
creation of the data warehouse data model that is implementable by the DBMS software
Developing front end (BI) applications -
designing and creating applications for indirect use by the end-users. Front-end applications are included in most data warehousing systems and are often referred to as business intelligence (BI) applications. Front-end applications contain interfaces (such as forms and reports) accessible via a navigation mechanism (such as a menu)
Primary key constraint
each relation must have a primary key, which is a column (or a set of columns) whose value is unique for each row
Autonumber data type option
enables automatic generation of consecutive numeric data values in a column
Entity integrity constraint
in a relational table, no primary key column can have null (empty) values. A rule stating that no primary key column can be optional. Every RDBMS enforces this rule
operational data sources
include the databases and other data repositories which are used to support the organization's day-to-day operations. A data warehouse is created within an organization as a separate data store whose primary purpose is data analysis
Creation of relational schema straight from requirements
is not advisable requirements should be accompanied by the ER modeling and then followed by mapping the ER model into a subsequent relational schema
Relational database model
logical database model that represents a database as a collection of related tables. collection of related relations within which each relation has a unique name.
Column
attribute / field
Mapping 1:1 relationships
1:1 relationships are mapped in the same way as 1:M relationships. One of the resulting relations will have a foreign key pointing to the primary key of another resulting relation. One of the mapped relations is chosen to have a foreign key referring to the primary key of the other mapped relation. In cases when there is no particular advantage in choosing which resulting relation will include a foreign key, the choice can be arbitrary. In other cases one choice can be more efficient than the other
Data mart
A data store based on the same principles as a data warehouse, but with a more limited scope
Retrieval of analytical information
A data warehouse is developed for the retrieval of analytical information, and it is not meant for direct data entry by the users. The only functionality available to the users of the data warehouse is retrieval. The data in the data warehouse is not subject to changes. The data in the data warehouse is referred to as non-volatile, static, or read-only
Detailed and/or summarized data
A data warehouse, depending on its purpose, may include detailed data or summary data or both. A data warehouse that contains the data at the finest level of detail is the most powerful
Data warehouse components
Source systems, Extraction-transformation-load (ETL) infrastructure, Data warehouse, Front-end applications
Creating ETL infrastructure
Creating necessary procedures and code for: Automatic extraction of relevant data from the operational data sources & Transformation of the extracted data, so that its quality is assured and its structure conforms to the structure of the modeled and implemented data warehouse. The seamless load of the transformed data into the data warehouse. Due to the amount of details that have to be considered, creating ETL infrastructure is often the most time and resource consuming part of the data warehouse development process
User defined constraints
Database constraints that are added by the database designer
Mapping derived attributes
Derived attributes are not mapped as a part of the relational schema. They are implemented as a part of the database front-end application
Granularity of the table
Describes what is depicted by one row in the table
Dependent data mart
Does not have its own source systems. The data comes from the data warehouse
relational schema
ER diagram. collection of relations.
Mapping entities with composite attributes into relations
Each component of a composite attribute is mapped as a column of a relation. The composite attribute itself does not appear in the mapped relation
Independent data mart
Stand-alone data mart, created in the same fashion as the data warehouse. Independent data mart has its own source systems and ETL infrastructure
Mapping multiple relationships between the same entities
Each relationship is mapped
Relation
relational table / table
Data warehouse use
the retrieval of the data in the data warehouse
Mapping entities with unique composite attributes into relations
An entity whose only unique attribute is a composite attribute is mapped as a relation with a composite primary key
Mapping associative entities
Associative entities are mapped into relational database constructs in the identical way as M:N relationships
Referential integrity constraint
In each row of a relation containing a foreign key, the value of the foreign key EITHER matches one of the values in the primary key column of the referred relation OR the value of the foreign key is null (empty). A rule that defines values that are valid for use in foreign keys. In a relational schema lines pointing from the foreign key to the corresponding primary key are referred to as referential integrity constraint lines
mapping entities into relations
Each regular entity becomes a relation Each regular attribute of a regular entity becomes a column of the newly created relation. If an entity has a single unique attribute, then that attribute becomes the primary key in the resulting mapped relation
Mapping M:N unary relationships
In addition to the relation representing the entity involved in a unary M:N relationship, another relation is created to represent the M:N relationship itself. This new relation has two foreign keys, both of them corresponding to the single primary key of the relation representing the entity involved in the unary M:N relationship. Each of the foreign keys is used as a part of the composite primary key of the new relation
Mapping M:N relationships
In addition to the two relations representing the two entities involved in the M:N relationship, another relation is created to represent the M:N relationship itself. This new relation has two foreign keys, corresponding to the primary keys of the two relations representing the two entities involved in the M:N relationship. The two foreign keys form the composite primary key of the new relation
Source systems
In the context of data warehousing, source systems are operational databases and other operational data repositories (in other words, any sets of data used for operational purposes) that provide analytically useful information for the data warehouse's subjects of analysis. Every operational data store that is used as a source system for the data warehouse has two purposes: 1. The original operational purpose 2. As a source system for the data warehouse. Source systems can include external data sources
Indirect and Direct use
Indirect use (Via the front end (BI) applications) & Direct use (Via the DBMS, Via the OLAP (BI) tools)
Mapping 1:1 unary relationships
Mapped in the same way as 1:M unary relationships
Mapping entities with candidate keys (multiple unique attributes) into relations
One of the candidate keys is chosen by the database designer as the primary key during the mapping process. Other candidate keys are mapped as non-primary key columns
Mapping entities with optional attributes into relations
Optional attribute of an entity is mapped as an optional column
Steps in the development of a DW
Requirements collection, definition, and visualization - results in the requirements specifying the desired capabilities and functionalities of the future data warehouse. The requirements are based on the analytical needs that can be met by the data in the internal data source systems and available external data sources. The requirements are collected through interviewing various stakeholders of the data warehouse. In addition to interviews, additional methods for eliciting requirements from the stakeholders can be used
Mapping ternary relationships
Ternary relationships are used as many-to-many-to-many relationships. A new relation is created with foreign keys from the participating entities forming a composite primary key of the new relation - in a new table, 3 FKs of the 3 table's PKs
Requirements collection, definition, and visualization
The collected requirements should be clearly defined and stated in a written document, and then visualized as a conceptual data model
Integrated
The data warehouse integrates the analytically useful data from the various operational databases (and possibly other sources). Integration refers to this process of bringing the data from multiple data sources into a singular data warehouse.
Structured repository
The data warehouse is a database containing analytically useful information. Any database is a structured repository with its structure represented in its metadata
Data Warehouse
The data warehouse is sometimes referred to as the target system, to indicate the fact that it is a destination for the data from the source systems. A typical data warehouse periodically retrieves selected analytically useful data from the operational data sources
Implicit constraints
The implicit relational database model rules that a relational database must satisfy in order to be valid. Each relation in a relational schema must have a different name. Each row must be unique, Domain constraint, In each row, each value in each column must be single valued, The order of columns is irrelevant, The order of rows is irrelevant, Primary key constraint, Entity integrity constraint, Referential integrity constraint
ETL infrastructure
The infrastructure that facilitates the retrieval of data from operational databases into the data warehouses. Extracting analytically useful data from the operational data sources. Transforming such data so that it conforms to the structure of the subject-oriented target data warehouse model (while ensuring the quality of the transformed data). Loading the transformed and quality assured data into the target data warehouse
Mapping 1:M unary relationships
The relation mapped from an entity involved in a 1:M unary relationship contains a foreign key that corresponds to its own primary key - its own PK is an FK in the same table. (ClientID (PK) is referred by ClientID (FK))
Mapping 1:M relationships
The relation mapped from the entity on the M side of the 1:M relationship has a foreign key that corresponds to the primary key of the relation mapped from the 1 side of the 1:M relationship.
Enterprise wide
The term enterprise-wide refers to the fact that the data warehouse provides an organization-wide view of the analytically useful information it contains
Historical
The term historical refers to the larger time horizon in the data warehouse than in the operational databases
Subject oriented
The term subject-oriented refers to the fundamental difference in the purpose of an operational database system and a data warehouse. An operational database system is developed in order to support a specific business operation. A data warehouse is developed to analyze specific business subject areas
Time variant
The term time variant refers to the fact that a data warehouse contains slices or snapshots of data from different periods of time across its time horizon. With the data slices, the user can create reports for various periods of time within the time horizon
Mapping unary relationships
Unary relationships in ER diagrams are mapped in the same way as binary relationships
Data warehouse frontend (BI) applications-
Used to provide access to the data warehouse for users who are engaging in indirect use
Business Rules
User defined constraints that specify restrictions on databases that are not a part of the standard notation for creating ER diagrams
Mapping weak entities
Weak entities are mapped in a same way as regular entities with one addition. The resulting relation has a composite primary key that is composed of the partial identifier and the foreign key corresponding to the primary key of the owner entity
in order for a table to be relational
Within one table, each column must have a unique name. Within one table, each row must be unique. All values in each column must be from the same (predefined) domain. Within each row, each value in each column must be single valued (one value from a predefined domain, within each row in each column).
Composite primary key
a primary key that is composed of multiple columns. Column names of a composite primary key are underlined, because combined together they form the primary key
Domain constraint
all values in each column must be from the same predefined domain
Data warehouse administration and maintenance
performing activities that support the data warehouse end user, including dealing with technical issues, such as: Providing security for the information contained in the data warehouse & Ensuring sufficient hard-drive space for the data warehouse content. Implementing the backup and recovery procedures
Designer created primary key
primary key column, not called for by the original requirements, added to a table by the database designer. Often used in conjunction with the autonumber data type option
Data warehouse deployment
releasing the data warehouse and its front-end (BI) applications for use by the end users
Relational database constraints
rules that a relational database has to satisfy in order to be valid
Data warehouse
separate analytical database. The performance of operational day-to-day tasks involving data use can be severely diminished if such tasks have to compete for computing resources with analytical queries. It is often impossible to structure a database which can be used in an efficient manner for both operational and analytical purposes. The data warehouse is a structured repository of integrated, subject-oriented, enterprise-wide, historical, and time-variant data. The purpose of the data warehouse is the retrieval of analytical information. A data warehouse can store detailed and/or summarized data.
Relation
table in a relational database. A table containing rows and columns . The main construct in the relational database model. Every relation is a table, not every table is a relation. Order of columns is irrelevant. Order of rows is irrelevant.
Analytical information
the information collected and used in support of analytical tasks. Analytical information is based on operational (transactional) information
Operational information (transactional information)
the information collected and used in support of day to day operational needs in businesses and other organizations
row
tuple / record
Creating the data warehouse
using a DBMS to implement the data warehouse data model as an actual data warehouse. Typically, data warehouses are implemented using a relational DBMS (RDBMS) software
Relational schema
visual depiction of the relational database model