Database Design
Map Dependent (Weak) Entities
Becomes a separate table with a foreign key taken from the primary key of the table for the strong entity Primary key is composed of the partial identifier of the dependent entity plus the primary key from the table for the strong entity (thus, creating a composite PK)
Data Integrity
Business rules may be defined that maintain the integrity of the data that is manipulated
Business rules
Can be used to define or constrain some aspect of a business's structure or processes Database systems should support business rules that are relevant to the specific functions supported by the system
Two Cases When you Must Use Associative Entities
Case 1: If it is necessary to assign an identifier to uniquely identify each occurrence of the M:M relationship between the original entities, then an associative entity must be created. (Entities have identifiers; relationships do not.) Case 2: In a ternary relationship, if the optionalities at all three entities are not identical, then an associative entity must be created.Otherwise, the ERD cannot depict the interactions between the entities without ambiguity.
Object-Relational Database Model
Combine: Ability of object technology to handle advanced relationship types Data integrity, reliability, and recovery features of the traditional relational model Represented by the most popular and powerful of modern database management systems E.g., Oracle, IBM DB2, Microsoft SQL Server
Map Regular Entities to Tables
Composite Attributes: Use only their simple, component attributes Multivalued Attributes: Become a separate table with a foreign key taken from the table for the original entity Derived Attributes: Are not included in a relational schema (since, by definition, they represent data that are not stored, only calculated as needed)
Hardware and software characteristics
Computer, network, and DBMS characteristics
Map Supertype/Subtype Structures
Create a separate table for the supertype and each of the subtypes Assign common attributes, including subtype discriminator, to the supertype table Assign to the subtype tables those attributes unique to each subtype Assign to the subtype tables the primary key of the supertype table (which also functions as a FK referencing the supertype)
Relational Database Model Components
Data Structure, Data Manipulation, and Data Integrity
Data Structure
Data are organized in two-dimensional tables (also called relations) with columns and rows
Information
Data processed to increase the knowledge of the people who use it
Application characteristics
Data requirements and application priorities
Operational requirements
Data security and backup & recovery concerns
Data Manipulation
Data stored in the tables may be manipulated through the use of a command language (Structured Query Language - SQL - was developed expressly for this purpose)
Physical database design inputs include
Data tables described in logical database design, Business environment requirements, Data characteristics, Application characteristics, Operational requirements, and Hardware and software characteristics
Metadata
Data that describes the properties or charac-teristics of other data (a means of providing context for data)
Data characteristics
Data volume and volatility assessments
Foreign keys
Define relationships between tables
Transaction Processing Systems (TPS)
Designed to support the processing of everyday operational business transactions (i.e., retrieving, adding, modifying, and deleting data) Represent the vast majority of database systems Are the primary focus of this course
Relational Database Model Constraints
Domain Integrity, Policy Integrity, Entity Integrity, and Referential Integrity
Primary key
Enforces uniqueness for each record
Entity-Relationship Diagram (ERD)
Entities: named as singular nouns Attributes: named as singular nouns Relationships: named as verbs or verb phrases
E-R Model
Entity, Attribute, and Instance
Data
Facts that can be recorded and stored using computer media
Removing Transitive Dependencies
For each nonkey attribute that is a determinant in a table, create a new table; that attribute becomes the primary key of the new table Move all of the attributes that are functionally dependent on that determinant attribute from the old table to the new table Leave the attribute that serves as the primary key in the new table in the old table to serve as a foreign key to allow the tables to be related
Referential Integrity
For every value of a foreign key there must be a primary key with that value For every value of CustomerID in the Order table there must be a matching value of CustomerID in the Customer table The primary key must exist before the foreign key can be defined
Database Management System (DBMS)
General-purpose software system that facilitates the processes of defining, creating, using, and maintaining databases
Map Associative Entities
Identifier Not Assigned Default primary key for the table formed for the associative entity is typically a composite PK composed of (at least) the primary keys of the two entities Identifier Assigned May use if one exists that is natural and familiar to end-users Must use if the composite PK can not be made unique by adding intersection data
candidate key
If a table has more than one column that provides a way of uniquely identifying the rows of the table When there is more than one candidate key, one of them must be chosen to be the primary key of the table
Third Normal Form
Important points about the 3NF structure are: It is free of unnecessary data redundancy All foreign keys appear where needed to logically tie together related tables It is the same structure that would have been derived from a properly drawn ERD of the same business environment Normalization can be used to check the structures of any relational tables, even those created through ERD conversions, for 3NF (e.g., to check for the possible existence of previously unidentified transitive dependencies)
Functional Dependency
A constraint between two attributes in which the value of one attribute is determined by the value of another attribute. The value of one attribute (the determinant) determines the value of another attribute A -----> B "A determines B" or "B is functionally dependent on A"
Attribute
A discrete data element Describes an entity (i.e., is a characteristic) Meaningful (for the system being modeled) Value may be required or optional
Data Normalization
A formal process for grouping attributes into tables A tool to validate and improve logical designs so that they satisfy certain constraints to avoid unnecessary duplication of data The process of decomposing a table with anomalies into two or more, smaller, well-structured tables
Inheritance Rule
An instance of a subtype is also an instance of the supertype Subtype entities inherit values of all attributes of the supertype
Database
An organized collection of logically related data. Includes data and metadata.
Physical Database Design
Physical design involves enhancing and/or modifying a logical database design to improve the performance of the run-time environment This may involve adding additional structures (e.g., indexes) to the logical database design, and/or modifying the 3NF tables produced by the logical database design Thus, the well-structured relational tables produced either by the conversion from ERDs or by the data normalization process form the starting point for physical database design
Supertype/Subtype Relationships Rule
Relationships shown at the supertype level indicate that all subtypes will participate in the relationship The instances of a subtype may participate in a relationship unique to that subtype; in this situation, the relationship is shown at the subtype level
Business environment requirements
Response time and throughput requirements
Decision Support Systems (DSS)
Specifically designed to aid managers in decision-making tasks Contain data that has been accumulated over time to aid in trend analysis and forecasting Data about subjects must be organized in such a way that it provides a unified, overall picture of all the important details about the subjects over time Data warehouses are the most common example Data marts are subsets of data warehouses designed to support a part of an organization
Relational Model
Table (Relation), Column (Attribute), and Row (Tuple)
First Normal Form
Table has no multivalued attributes a table that has multivalued attributes is unnormalized in this context, a multivalued attribute is sometimes referred to as a repeating group
Second Normal Form
Table is in 1NF and has no partial functional dependencies (that is, every nonkey attribute is fully functionally dependent on the entire primary key) This means that every nonkey attribute must be determined by the entire primary key, not by only part of the primary key
Third Normal Form
Table is in 2NF and no transitive dependencies (functional dependencies between nonkey attributes) This means that no nonkey attribute should be able to determine another nonkey attribute
Well-Structured Tables
Tables that contains minimal redundancy and that allow users to insert, delete, and modify table rows without errors or inconsistencies are considered to be well-structured When designing relational database tables, we seek to eliminate anomalies through the use of normalization
Logical Database Design
The process of arranging the entities and attributes of the conceptual data model of the business environment into the tables and columns of a relational database structure to serve that business in an information system The goal is to create well-structured tables (i.e., free of anomalies) that properly reflect the organization's business environment (see the following slides for an explanation of anomalies)
Candidate Keys as Determinants
There is one exception to the rule that, in third normal form, nonkey attributes are not allowed to define other nonkey attributes The rule does not hold if the defining nonkey attribute is a candidate key of the table Thus, candidate keys in a table may define other nonkey attributes without violating third normal form
Rule Restricting the Use of Supertype/Subtypes
You may use this type of structure only when either (or both) of the following are present: 1. When there are attributes that apply to some (but not all) of the instances of an entity type 2. When the instances of a subtype participate in a relationship unique to that subtype
Overlap
a composite attribute whose subparts pertain to different subtypes; each subpart contains a boolean value (yes or no) to indicate whether or not the instance belongs to the associated subtype
Composite
a related group of attributes example: Address (Street, City, State, Zip)
Disjoint
a simple attribute with alternative values to indicate the possible subtypes
Subtype
a subgrouping of a supertype entity that is meaningful to an organization (not just possible) shares all attributes of its supertype, but also has unique attributes of its own and/or : has relationships with other entities distinct from those of other subtypes
Enhanced Entity-Relationship Model
addressed shortcomings of the basic E-R model not universally agreed upon in some respects introduced the supertype/subtype structure
Identifier
an attribute that uniquely identifies each entity instance (e.g., Social Security Number) can be one or more attributes (e.g., First Name, Middle Name, and Last Name) Create an identifier if there is no obvious identifying attribute (e.g., Part Number) Identifier of a dependent (weak) entity is its partial identifier combined with identifier of its owner
candidate key
any attribute that would qualify as an identifier (e.g., SSN and Employee ID)
Deletion Anomalies
are experienced when a value for a column we wish to keep is unexpectedly removed when a value for another column is deleted e.g., cannot delete the sole order for a customer without deleting the only copy of the customer's information also
Update Anomalies
are experienced when changes to multiple rows of a table are needed to effect an update to a single value of a column e.g., cannot completely update a customer's address without changing it for every order placed by that customer
Insertion Anomalies
are experienced when we attempt to store a value for a column but cannot because the value of another column is unknown e.g., cannot add a new customer's information until an order number is ready to be entered (because OrderID column serves as the primary key for the table and cannot have null values)
Simple
at the atomic, most basic level
Derived
calculated, but not stored (e.g., Total)
Referential Integrity
constrains a foreign key value to match a primary key value in a related table more on this concept to come
Domain Integrity
constrains allowable values for columns(e.g., data type, column size, maximum value, etc.)
Entity Integrity
constrains data operations to business rules (e.g., only managers may place vendor orders)
Dependent (Weak) Entity
dependent on a strong entity; cannot exist on its own does not have a unique identifier, only a partial identifier (shown w/double underline) represented with rectangle symbol with lines in corners
Strong Entity
exists independently of other entities has its own unique identifier (shown w/single underline) represented with regular rectangle symbol
Conceptual Data Modeling
goal is to arrive at an understanding of the principal data sources and data elements of interest to the business or organization, and the relationships between the data sources, in order to satisfy requirements for information tool is the Entity-Relationship Diagram (ERD)
Logical Data Modeling
goal is to convert the conceptual model into a form that can be utilized to create an IS (e.g., a relational database) tool is the Relational Schema
Physical Data Modeling
goal is to specify all identifying & operational characteristics of the data that will be recorded in the information system tools include the Data Dictionary and others
Each table (relation)
in a given database has a unique name
Every row (tuple)
in a table is unique
Referential Integrity
involves the circumstance of referring to a specific data row in one table in the database based on the value stored in a column in another table This constraint states that a foreign key value can not be stored in a table unless a matching value can be found in the primary key of the related table
A relationship:
is an association between entities: The degree of a relationship indicates the number of entities involved in the relationship specified as unary, binary, ternary, or higher (n-ary) The cardinality of a relationship describes the number of instances of one entity that may be associated with one instance of another entity specified as either one or many (many = one or more) The optionality (modality) indicates whether partici-pation in the relationship is required or not specified as either mandatory or optional
referential integrity arrows
linkages between the tables used to connect the FK of one table to the PK of another table
Identifying Relationship
links strong entities to dependent (weak) entities represented with diamond symbol with lines in corners
Data dictionary
location where definitions of data (metadata) are maintained
Multivalued
multiple values per entity instance are possible (e.g., Degree, Club, Skill)
unary relationship
only one entity is involved.
Single Valued
only one value per entity instance (e.g., Last Name, Date Of Birth)
Anomalies
problems that are experienced when attempting to manipulate stored data
E. F. Codd
published an article titled "A Relational Model of Data for Large Shared Data Banks" in the Communications of the ACM journal.
Data access
read, insert, update, and delete
derived attributes
represent values that are calculated when needed and not stored(e.g., values generated by a query for use on a report) These are only optionally shown on an ERD When shown, diagrammed within square bracketsExample: Acquisition Cost * (1 + Percent Markup) = Selling Price
Foreign Keys
serves as a primary key in another table means by which tables are linked together to repre-sent unary, binary, ternary, etc. relationships
Entity
something of interest in the environment (e.g., person, place, object, event, concept) that is described by attributes and that will have numerous instances we wish to track
Columns
store attribute data
data
the actual values of interest
ternary relationship
three entities are simultaneously involved.
binary relationship
two entities are involved.
Rows
unique instances or records of entity
Each column (attribute)
within a given table has a unique name Every column (attribute) is single-valued Thus, multivalued attributes require special treatment when designing relational tables
Map Ternary (and n-ary) Relationships
One table for each original entity and one for the common relationship (associative entity) (i.e., a ternary relationship maps to a total of four tables) Table representing the associative entity has foreign keys to each entity in the relationship PK of the table formed for the associative entity is typically a composite PK composed of (at least) the primary keys of the three entities
Map Binary Relationships
One-to-Many - Primary key on the one side becomes a foreign key on the many side Many-to-Many - Create a new table; the primary key of the new table is typically a CPK comprised of (at least) the primary keys of the two entities involved in the relationship One-to-One - Primary key on the mandatory side becomes a foreign key on the optional side (if optionalities are asymmetric)
Map Unary (Recursive) Relationships
One-to-Many: Recursive foreign key in the same table (also true for unary One-to-One) Many-to-Many (e.g., bill of materials): Two tables result: One for the entity type One for an associative relation in which the primary key has two fields, both taken from the identifier of the original entity
Supertype
(example: Employee) a generic entity that has a relationship with one or more subtypes
Supertype/Subtype Constraints
1. Completeness Constraint: Addresses the question of whether an instance of a supertype must also be a member of at least one subtype 2. Disjointness Constraint: Addresses the question of whether an instance of a supertype may simultaneously be a member of two (or more) subtypes
Disjointness Constraint:Two Possible Rules
1. Disjoint Rule (Letter "d" notation) Specifies that if an entity instance is a member of one subtype, it cannot simultaneously be a member of any other subtype (Example: all PERSONS are either MALE or FEMALE) 2. Overlap Rule (Letter "o" notation) Specifies that an entity instance can simultaneously be a member of two (or more) subtypes (Example: an ATHLETE can be both a RUNNER and a JUMPER)
Two Processes to Develop Supertype/Subtypes
1. Generalization: The process of defining a more general entity type from a set of more specialized entity types (A "bottom-up" approach) 2. Specialization: The process of defining one or more subtypes of a general entity based on distinguishing attributes or relationships (A "top-down" approach)
Completeness Constraint:Two Possible Rules
1. Total Specialization Rule (Double-line notation) Specifies that each entity instance of the super-type must be a member of some subtype in the relationship(Example: all STUDENTS are either UNDERGRADUATE or GRADUATE students) 2. Partial Specialization Rule (Single-line notation) Specifies that an entity instance of the super-type is allowed to not belong to any subtype (Example: FACULTY and STAFF are not the only possible types of EMPLOYEE)
alternate key
A candidate key that is not chosen to be the primary key
Primary Key (PK)
A column (or columns) whose value uniquely identifies or differentiates each row in a table(e.g., EmployeeID) Required for every table in a relational database Composite Key - a primary key made up of more than one column (e.g., FirstName + MiddleName + LastName)
Foreign Key (FK)
A column in one table that serves as the primary key of another table in the same database (thus serving as a link between the two tables)