Database Design Quiz 2
Maps to a Foreign Key
(1 : *) binary - define FK on "many" side, points to a CK on "1" side, all relationship attributes stored in "many" relationship, no relation necessary (* : *) binary - create a relation including all relationship attributes, each entity is a FK (1 : 1) binary - optional = define FK for relation associated with mandatory, both optional = either can have FK Multi-valued attributes - define relation for multi-valued attribute, create FK to relation representing the containing entity
Steps of Physical Database Design
1. Define base relations for the chosen DBMS - specify unique name for each relation, specify list of simple attributes and domains, default values, NULLs permitted, specify PK and FK, specify referential integrity constraints 2. Design representation of derived data - ensure data needed to derive field is present, some designers store derived field as a simple field and define rules for it to be updated if difficult to derive 3. Design general constraints for the table, not enforced in MySQL 4. Design file organizations and indexes, MySQL uses InnoDB structure a version of B trees, indexes: storage mechanism used to speed up data retrieval, estimate disk space requirements 5. Design user views 6. Design security mechanism
Steps of Logical Database Design
1. Derive relations for logical data model - strong entity types, weak entity types, (1 : *) binary relationship, (1 : 1) binary relationship, (1 : 1) recursive relationship, superclass/subclass relationship, (* : *) binary relationship, complex relationship, multi-valued attributes 2. Validate relations using normalization 3. Validate relations against user transactions 4. Check integrity constraints 5. Review logical data model with user
Steps of Conceptual Database Design
1. Identify the entity types - Identify nouns in the user requirement specification, entities should be major objects NOT properties of other objects, objects that have existence in their own right, look for entity types that may be synonyms of each other, all entity names should be well descriptive 2. Identify the relationship types - verbs in the user requirement specification, classify relationships as complex, binary, or recursive, determine the multiplicity of each relationship, check for fan and chasm traps, document and assign meaningful names to the relationships 3. Identify the entity and relationship attributes - properties of qualities of entity types, classify each as simple vs composite, single vs multi-valued, derived (ensure attribute can be derived from given attributes) 4. Identify the attribute domains 5. Identify the candidate keys and primary keys - candidate keys, choose primary key from candidate key that has minimal set of attributes, least likely to be updated, fewest number of bytes, lowest max value, easiest to manipulate for a user 6. Apply generalization (is-a), aggregation (has-a), and composition (part-of) - generalization to represent super and subclasses or an entity type (participation - all members of superclass must fall into a subclass, mandatory or optional) (disjoint - subclasses do not share members, and or or), compositions represents an entity type that composes another entity type, aggregation represents an entity type that has a collection of another entity type 7. Check model for dependency - ensure entity types are really different and not synonyms, remove relationships that provide the same info as another relationship, consider time and its effect on each relationship 8. Validate conceptual model against user transactions - must provide a response for all user defined transactions, if model can't provide answer then conceptual model is not complete 9. Review model with user GOAL : produce an unambiguous representation of the data domain and its processes, communicate between a technical team and a nontechnical team to ensure the proposed technical solution fulfills the needs of the enterprise
1NF to 2NF Process
1. Identify the primary key for the 1NF relation 2. Identify the functional dependencies in the relation 3. If partial dependencies exist on the primary key remove them by placing them in a new relation along with a copy of their determinant
2NF to 3NF Process
1. Identify the primary key in the 2NF relation 2. Identify functional dependencies in the relation 3. If transitive dependencies exist on the primary key remove them by placing them in a new relation along with a copy of their dominant
UNF to 1NF Process
1. Nominate an attribute or group of attributes to act as the key for unnormalized table 2. Identify the repeating group in the unnormalized table which repeats for the key attributes 3. Remove repeating groups by entering appropriate data into the empty columns of rows containing the repeating data or by placing the repeating data along with a copy of the original key attribute into a separate relation
Relational Database
A collection of normalized relations with distinct relation names
Subclass
A distinct sub grouping of occurrences of an entity type
Attribute
A named column of a relation
Tuple
A row of a relation
Relation
A table with columns and rows, only applies to logical structure of the database, not the physical structure
General Constraints
Additional rules specified by users or database administrators that define or constrain some aspect of the enterprise
Updating Views
All updates to a base relation should be immediately reflected in all views that reference that base relation, if view is updated, underlying base relation should reflect change Restrictions of types of modifications: - Updates are allowed if query involves a single base relation and contains a candidate key of base relation - Updates are not allowed involving multiple base relations - Updates are not allowed involving aggregation or grouping operations Classes of views are defined as theoretically not updatable, theoretically updatable, partially updatable
Superkey
An attribute or set of attributes that uniquely identifies a tuple within a relation
Attribute Inheritance
An entity in a subclass represents same 'real world' object as in superclass, and may possess subclass-specific attributes, as well as those associated with the superclass
Superclass
An entity type that includes one or more distinct sub groupings of its occurrences
Simple Attribute
Attribute composed of a single component with an independent existence
Composite Attribute
Attribute composed of multiple components, each with an independent existence
Foreign Key
Attribute or set of attributes within one relation that matches candidate key of some (possibly same) relation
Multi-valued Attribute
Attribute that hold multiple values for each occurrence of an entity type
Single-valued Attribute
Attribute that holds a single value for each occurrence of an entity type
Derived Attribute
Attribute that represents a value that is derivable from value of a related attribute, or set of attributes, not necessarily in the same entity type
Functional Dependencies
Can be used to identify schemas with such problems and to suggest schema refinements, each relation is dependent on the primary key since the primary key identifies the values for the other attributes, if no attributes are dependent on another (not including primary key) then there is no redundancy
Primary Key
Candidate key selected to identify tuples uniquely within relation
Alternate Keys
Candidate keys that are not selected to be primary key
MySQL Data Type Categories
Character, Numeric, Date and Time, Large Object, Spatial
Referential Integrity defines DB behavior
Define database behavior to ensure a child relation NEVER references a parent relation instance that doesn't exist Changes in child relation: CREATE a new record in the child relation if all FK attributes are NULL, if not NULL ensure parent tuple exists UPDATE a FK attribute in child relation DELETE a record from the child relation, operation can't violate referential integrity Changes in parent relation: UPDATE PK attribute in relation, identify child tuples in other table, may choose to not allow update (ON UPDATE RESTRICT), may choose to allow UPDATE to parent relation to propagate to child (ON UPDATE CASCADE), may choose to remove the link between 2 entities (ON UPDATE SET NULL or ON UPDATE SET DEFAULT) DELETE a record from parent relation, same as above except DELETE not UPDATE CREATE a record in the parent relation, no check to be done
Transitive Dependencies
Describes a conditions where A, B, and C are attributes of a relation such that if A -> B and B -> C then C is transitively dependent on A via B, can potentially cause update anomalies
Physical Database Design
Describes the base relations, file organizations, and indexes and any associated integrity constraints and security measures, once the logical model is created we need to choose the targeted database management system and determine the best method for physically implementing the logical model, each vendor provides different storage mechanisms, constraint representation
Full Functional Dependency
Determinants should have the minimal number of attributes necessary to maintain the functional dependency with the attributes on the right hand-side, indicates that if A and B are attributes of a relation, B is fully functionally dependent on A, if B is functionally dependent on A, but not on any proper subset of A
Relation Name
Distinct from all other relation names in relational schema, each cell of relation contains exactly one atomic value, each attribute has a distinct name, values of an attribute are all from the same domain, each tuple is distinct, order of attributes or tuples has no significance
View
Dynamic result of one or more relational operations operating on base relations to produce another relation, a virtual relation that does not necessarily actually exists in the database but is produced upon request, at time of request, contents of a view are defined as a query on one or more base relations
Benefits of Normalization
Easier for the user to access and maintain the data, take up minimal storage space on the computer
Conceptual Database Design
Entity relationship modeling (ER) creates a diagrammatic representation of a conceptual model, build a conceptual representation of the data domain, process must identify the necessary entities, relationships and attributes
Weak Entity Type
Entity type that is existence-dependent on some other entity type
Strong Entity Type
Entity type that is not existence-dependent on some other entity type
Normal Form Addresses Dependencies
Free the collection of relations from undesirable insertion, modification and deletion dependencies; if schema has duplicated data in multiple rows, forced to update/delete all copies of a piece of data Reduce the need for restructuring the collection of relations, build an extensible design now as opposed to later Make the relational model more informative to users, cleaner model should be easier to understand Make the collection of relations neutral to the query statistics, designed for general purpose querying
Entity Type
Group of objects with same properties, identified by enterprise as having an independent existence
Normalization
Identifies redundancy that leads to functional dependency, use the process as a validation technique for the defined relations, crucial step in the logical database design process, a series of tests that help identify the optimal grouping of attributes to relations, each step corresponds to a specific normal form Goal = reduce data redundancy
Types of Integrity Constraints
Identifying attributes that are required (for each column decide if it needs to have a value) Attribute domain constraints (list or describe the legal values for each attribute) Multiplicity (ensure the relationship constraints are properly represented) Entity integrity (PK attributes can't hold a NULL value) Referential integrity (FK created in the child tuple linking to existing parent tuple) General constraints
Referential Integrity
If foreign key exists in a relation, either foreign key value must match a candidate key value of some tuple in its home relation or foreign key value must be wholly null
Entity Integrity
In a base relation, no attribute of a primary key can be null
Superclass/Subclass Conversion
Mandatory Nondisjoint - single relationship with 1 or more attributes acting as a discriminator for the subclass Mandatory Disjoint - many relations one for each subclass/superclass combination Optional Nondisjoint - 2 relations, 1 relation for the superclass and 1 relation for all of the subclasses, subclass needs a discriminating attribute to differentiate type of subclass Optional Disjoint - Many relations, one relation for the superclass, one relation for each subclass
Binary Relationship Degree Multiplicity Constraints
Most common one-to-one (1 : 1) one-to-many (1 : *) many-to-many(* : *) ex. 1 : 1 - Each branch is managed by one member of staff 0 : 1 - A member of staff can manage zero or one branch 0 : * - Each member of staff oversees zero or more properties for rent 1 : * - Each newspaper advertises one or more properties for rent 0 : 1 - Zero or one entity occurrence 1 : 1 - Exactly one entity occurrence 0 : * - Zero or many entity occurrences 1 : * - One or many entity occurrences 5 : 10 - Minimum of 5 up to a max of 10 entity occurrences 0, 3, 6 -8 - Zero, three, six, seven, or eight entity occurrence
Base Relation
Name relation corresponding to an entity in conceptual schema, whose tuples are physically stored in database
Relation Schema
Named relation defined by a set of attribute and domain name pairs
Unnormalized Form
No primary key or NULL values in the primary key fields, a table that contains an attribute with one or more repeating groups (set), attributes need not be atomic
Degree
Number of attributes in a relation
Degree of a Relationship
Number of participating entities in relationship, two is binary (private owner owns property for rent), three is ternary (staff registers a client at a branch), four is quaternary (A solicitor arranges a bid on behalf of a buyer supported by a financial institution)
Multiplicity for Complex Relationships
Number of possible occurrences of an entity type in an n-ary relationship when other (n - 1) values are fixed
Constraints on Specialization/Generalization
Participation constraint - determines whether every member in superclass must participate as a member of a subclass, may be mandatory or optional Disjoint constraint - describes relationship between members of the subclasses and indicates whether member of a superclass can be a member of one, or more than one, subclass, may be disjoint or nondisjoint Four categories of constraints: 1. Mandatory and disjoint 2. Optional and disjoint 3. Mandatory and nondisjoint 4. Optional and nondisjoint
Connection Traps
Problems that arise when designing a conceptual data model, often due to a misinterpretation of the meaning of certain relationships, Fan Trap - where a model represents a relationships between entity types but pathway between certain entity occurrences is ambiguous Chasm Trap - where a model suggests the existence of a relationship between entity types, but pathway does not exist between certain entity occurrences
Specialization
Process of maximizing differences between members of an entity by identifying their distinguishing characteristics
Generalization
Process of minimizing differences between entities by identifying their common characteristics
Attribute
Property of an entity or a relationship type
Purpose of Views
Provides powerful and flexible security mechanism by hiding parts of database from certain users, permits users to access data in a customized way so that same data can be seen by different users in different ways at same time, can simplify complex operations of base relations
Recursive Relationship
Relationship type where same entity type participates more than once in different roles, relationships may be given role names to indicate purpose that each participating entity type plays in a relationship
Null
Represents value for an attribute that is currently unknown or not applicable for tuple, deals with incomplete or exceptional data, represents the absence of a value and is not the same as zero or spaces which are values
Second Normal Form
Requirement for tables that have composite key, table is in first normal form, every non-primary key attribute is fully functionally dependent on the (entire) primary key, a table in first normal form and having a primary key with only one field is also in 2nd normal form, to get a table in 1st normal form into 2nd normal form, remove partial key dependencies by table decomposition A relation that is in 1NF and every non-primary-key attribute is fully functionally dependent on the primary key
Enhanced Entity-Relationship model (EER)
Semantic concepts are incorporated into the original ER model Specialization/generalization = examples of additional concept of EER model, superclass/subclass relationship is one-to-one (1 : 1), superclass may contain overlapping or distinct subclasses, not all members of a superclass need be a member of a subclass
Attribute Domain
Set of allowable values for one or more attributes
Relationship Type
Set of meaningful associations among entity types
Relational Database Schema
Set of relation schemas, each with a distinct name
Maps to a Relation
Strong entity - create a relation that contains all simple attributes Weak entity - create a relation that contains all simple attributes (PK must take into account the owner entity's key) (* : *) binary - include all relationship attributes, each entity in relation is a foreign key (1 : 1) binary - mandatory = combine entities into one relation, optional = define a foreign key for relation associated with mandatory participation, both optional = either can have FK Multi-valued attributes - define relation and create foreign key to relation representing the containing entity Complex relationship - each entity in relation is a foreign key
Candidate Key
Superkey (K) such that no proper subset is a super key within the relation, in each tuple of R, values of K uniquely identify that tuple (uniqueness), no proper subset of K has the uniqueness property (irreducibility)
Third Normal Form
Table is in first and second normal form, no dependencies between 2 non-key attributes, no non-primary-key attribute is transitively dependent on the primary key, solution: decompose the table so that the offending attribute is in a separate table Attribute is fully functionally dependent on the primary key A relation that is in 1NF and 2NF and in which no non-primary-key attribute is transitively dependent on the primary key
Characteristics of a suitable set of relations
The minimal number of attributes necessary to support the data requirements of the enterprise; attributes with a close logical relationship are found in the same relation; minimal redundancy with each attribute represented only once with the important exception of attributes that form all or part of foreign keys
Cardinality
The number of tuples in a relation
Domain
The set of allowable values for one or more attributes
Logical Database Design
Translates the conceptual model to the logical structure of the database, for the relational model we must design the relations for the schema, Once the conceptual model is created we need to target a specific data model
First Normal Form
Tuples in a relation must contain the same number of fields, the domain of each attribute contains atomic values, the value of each attribute contains only a single value, no attributes are sets A relation in which the intersection of each row and column contains one and only one value
Relationship occurrence
Uniquely identifiable association, which includes one occurrence from each participating entity type
Entity Occurrence
Uniquely identifiable object of an entity type
Update Anomalies
What relations that contain redundant information may potentially suffer from Insertion anomaly - tuple being inserted may contain data fields that are inconsistent with data in other tuples in the table Deletion anomaly - deleting a tuple leads to loss of information other than the tuple Modification anomaly - modification of one tuple is dependent on the modifications of other tuples
Multiplicity
main type of constraint on relationships, number (or range) of possible occurrences of an entity type that may relate to a single occurrence of an associated entity type through a particular relationship, represents policies established by user or company Made up of two types of restrictions on relationships: cardinality (describes max number of possible relationship occurrences for an entity participating in a given relationship type) participation (determines whether all or only some entity occurrences participate in a relationship)