Week 2
Entity supertypes and subtypes Supertype can have ______________
many subtypes
Entity instance or entity occurrence they are ______ in the _________
Rows Relational Table
Data Models
Simple representation of complex real-data structure
Dara models are important for supporting a ______________
Specific domain problem
Entity Supertypes and Subtypes Criteria to determine the usage The different kinds of instances should each have _______ or _______ that are unique to that_____________
one or more attributes kind of instance
Allows designers to Develop appropriate relationship_________ n ________
participation rules and constraints
The most important characteristic of an entity is its ___________.
primary key
Weak Entity The entity has a___________ that is _________ or _______from the _________ in the _____________.
primary key partially or totally derived parent entity relationship
Composite Keys they are ___________ with more than _________
primary keys one attribute
Connectivity they are terms used to label the ___________
relationship types
Entity supertypes and subtypes Subtype exists within the context of a __________
supertype
Entity they are _________ and _________ used to _______ and _______
unique and distinct object collect and store data
Entity subtype contains _________ of each______
unique characteristics entity subtype
Codd originally established three normal forms: ___, ___, n ___. There are now three others that are generally accepted, but ___ is sufficient for most_________
1NF, 2NF and 3NF 3NF database applications.
BCNF To be in the BCNF, your database must already comply with the _____
3NF
BCNF ·BCNF is considered to be the stronger than ________
3NF
Most tables when reaching ___are also in ______
3NF BCNF (Boyce-Codd Normal Form).
De-Normalization If your database is in higher than _____and suffers from ________ you should first try to improve ___________
3NF performance issues, hardware
·Never de-normalize below ________
3NF.
Specialization Hierarchy Provides the means to:
<Support attribute inheritance <Define a special supertype attribute known as the subtype discriminator <Define disjoint/overlapping constraints and complete/partial constraints
Multi-valued Attribute
An attribute that can have many values for a single occurrence
BCNF Multiple candidate keys in a table will violate the ________.
BCNF
BCNF ·Also known as _____________
Boyce-Codd Normal Form
Disjoint Constraint Assume that one of the business rules dictates that an employee cannot belong to more than one subtype at a time; that is, an employee cannot be a pilot and a mechanic at the same time.
Disjoint Constraint Assume that one of the business rules dictates that an employee cannot belong to more than one subtype at a time; that is, an employee cannot be a pilot and a mechanic at the same time.
__________ first proposed the process of normalization and what came to be known as the ________ in his paper __________
Edgar F. Codd 1st normal form A Relational Model of Data for Large Shared Data Banks.
De-Normalization
In some situations, higher normal forms can potentially slow database performance without reducing data redundancy any further. If your database is in higher than 3NF and suffers from performance issues, you should first try to improve hardware. De-normalization should be the last option to improve performance. Never de-normalize below 3NF.
Data Modeling and Data Models
Models: they are abstraction of real-world object or event Data Models: simple representations of complex real world data structures. They are useful supporting a specific problem domain Data Modeling: iterative and progressive process of creating a specific data model for a determined problem domain
Activity 3 Using the above data model, list all of the attributes of a movie.
Recall that the subtype inherits all of the attributes and relationships of the supertype. Therefore, all of the attributes of a subtype include the common attributes from the supertype plus the unique (unique to that subtype) attributes from the subtype. All of the attributes of a movie would be: ·Prod_Num ·Prod_Title ·Prod_ReleaseDate ·Prod_Price ·Prod_Type ·Movie_Rating ·Movie_Director
Primary Keys
The most important characteristic of an entity is its primary key. A primary key is a single attribute or some contribution of attributes which uniquely identifies each entity instance Its main function is to uniquely identify an entity instance or row within a table Primary key and foreign keys are used to implement relationships among entities
Allows designers to create an _____________
accurate data model
Disjoint subtypes
also known as nonoverlapping subtypes, are subtypes that contain a unique subset of the supertype entity set; in other words, each entity instance of the supertype can appear in only one of the subtypes.
Relationship it describes an________________
association among entities
Specialization hierarchy provides the means to support __________________
attribute inheritance
BCNF There should not be more than one_________
candidate key
BCNF When a table contains only one _________, the ____and the _____are _________.
candidate key 3NF BCNF equivalent
·Entity supertype: Contains __________
common characteristics
Composite primary keys are particularly useful in two cases: 1. As identifiers of ____________, where each primary key combination is allowed only once in the _______________. 2. As identifiers of___________, where the weak entity has a ____________ with the__________
composite entities M:N relationship weak entities strong identifying relationship parent entity
Normalization it reduces __________
data anomalies
Constraints are sets of rules to ensure _____________
data integrity
A surrogate key is a primary key created by the __________________ to simplify the ______________
database designer identification of entity instances
Conceptual Schema: it is the basis for the __________ and ______ of the __________
definition and high level description main data objects
Entity Supertypes and Subtypes Criteria to determine the usage There must be _______, identifiable kinds of the _______in the _________
different entity users environment
Disjoint Constraint it is an entity supertype can have _________ or _________
disjoint or overlapping entity subtype
Specialization Hierarchy Provides the means to define ________________ n _______________
disjoint/overlapping constraints and complete/partial constraints
Second Normal Form (2NF) To apply 2NF, you should move columns that don't depend on the ___________/_________ to ____________
entire primary key (or composite key) another table.
The Entity Relationship Model Graphical representation of ________and their relationship in_____________
entities database structure
Weak Entity The entity is ___________; it cannot exist without the entity with which it has a __________.
existence-dependent relationship
Entity Supertype they are _________ type related to ______ or ______
generic entity one or more entity subtypes
The Conceptual Model it represents a _________ of the _______ by the ______________
global over view entire database entire organization
Entity Relationship Diagram (ERD) they use___________ to model __________
graphic representation database components
A specialization hierarchy depicts the arrangement of ____________ supertypes/parent entities and ___________/____________
higher-level entity lower-level entity subtypes/child entities
Entity Supertypes and subtypes depicts arrangement of_____________ and _______________
higher-level entity supertypes lower-level entity subtypes.
·De-normalization should be the last option to
improve performance.
Overlapping subtypes Example
in a university environment, a person may be an employee, a student, or both. In turn, an employee may be a professor as well as an administrator.
The conceptual method it is software and hardware _______
independent
One practical advantage of a surrogate key is that because it has no ___________, values for it can be generated by the _________to ensure that unique values are always _________
intrinsic meaning DBMS provided
Primary Key
it is a single attribute or some combination of attributes which uniquely identifies each entity instances
Model
it is abstraction of the real word object or event
Data Modeling ________ and ________ of creating a specific data model for a determined _____________
iterative and progressive process problem domain
The conceptual method it has a _________ of data environment
macro-level view
Business Rules 3. Describe ___________ and _________
main and distinguishing characterstics of the data
First Normal Form (1NF) Requirements A table must not have __________
multi-valued attributes (one value per cell).
Allows designer to Understand the ___________, __________, __________ n ___________
nature, role, scope of data and business processes
Second Normal Form (2NF) In 2NF, every _________ must depend on the _____________
non-key column entire (not partial) primary key.
Overlapping Subtypes are subtypes that contain ___________ of the supertype entity set; that is each ______of the supertype may appear in more than ___________
non-unique subsets entity instance one subtype
Disjoint subtypes, also known as ______________ subtypes
nonoverlapping
De-Normalization In some situations, higher _____________ can potentially_________ without reducing ________ any further.
normal forms slow database performance data redundancy
Preferably single-attribute Single-attribute primary keys are desirable but_________. Single-attribute primary keys simplify the ________of __________.
not required implementation foreign keys
Recursive Relationship ·A recursive relationship is one in which a relationship can exist between __________________. For example, a 1:M unary relationship can be expressed by
occurrences of the same entity set an EMPLOYEE may manage many EMPLOYEEs, and each EMPLOYEE is managed by one EMPLOYEE."
First Normal Form (1NF) Requirements A unique______ is identified.
primary key
A Candidate key has the same characteristics of a _________but it was not chosen to be a _____________
primary key primary key
Second Normal Form (2NF) the normal forms are ______________. In order to achieve ______, the tables must already be in _________.
progressive 2NF 1NF
Normalization it is a process of ____________ from ______to improve _____, _____ n _____
removing redundant data tables storage efficiency, data integrity, and scalability.
First Normal Form (1NF) Requirements A table must not contain__________
repeating columns.
Primary keys need to be secure. The selected primary key must not be composed of any attributes that might be considered a ___________ Eg: __________
security risk. SSN
Disjoint subtypes are subtypes that contain a _______ of the supertype entity set; in other words, each entity instance of the ______can appear in only one of the _______
unique subset supertype subtypes
Preferably numeric: _____________ can be better managed when they are ___________
unique values numeric
Reasons for identifying and documenting business rules 1. helps standardize company's _______
view of data
Normalization Process objective is to ensure that each table conforms to the concept of ________________
well-formed relations
Specialization Hierarchy
·A specialization hierarchy depicts the arrangement of higher-level entity supertypes (parent entities) and lower-level entity subtypes (child entities). ·Provides the means to: <Support attribute inheritance <Define a special supertype attribute known as the subtype discriminator <Define disjoint/overlapping constraints and complete/partial constraints
First Normal Form (1NF) Requirements
·A unique primary key is identified. ·A table must not contain repeating columns. ·A table must not have multi-valued attributes (one value per cell).
BCNF
·Also known as Boyce-Codd Normal Form ·BCNF is considered to be the stronger than 3NF ·To be in the BCNF, your database must already comply with the 3NF ·There should not be more than one candidate key ( a candidate key has the same characteristics of a primary key but it was not chosen to be a primary key). ·When a table contains only one candidate key, the 3NF and the BCNF are equivalent. ·Multiple candidate keys in a table will violate the BCNF.
Disjoint Constraint
·An entity supertype can have disjoint or overlapping entity subtypes. ·Disjoint subtypes, also known as nonoverlapping subtypes, are subtypes that contain a unique subset of the supertype entity set; in other words, each entity instance of the supertype can appear in only one of the subtypes. ·Example: Assume that one of the business rules dictates that an employee cannot belong to more than one subtype at a time; that is, an employee cannot be a pilot and a mechanic at the same time.
·The following table is a violation of the BCNF. CLASS (Class_Code, Course_Code, Class_Section, Class_Time, Room_Code) It has three possible candidate keys:
·Class_code ·Course_Code + Class_Section ·Class_Time + Room_Code
Normalization
·Edgar F. Codd first proposed the process of normalization and what came to be known as the 1st normal form in his paper A Relational Model of Data for Large Shared Data Banks. ·Codd originally established three normal forms: 1NF, 2NF and 3NF. There are now three others that are generally accepted, but 3NF is sufficient for most database applications. Most tables when reaching 3NF are also in BCNF (Boyce-Codd Normal Form).
Third Normal Form (3NF)
·Every non-key column must depend only on the primary key. There should not be any transitive dependencies. An attribute that depends on another attribute that is not the primary key of the relation is said to be transitively dependent. ·In other words, when a non-key attribute (or column) depends on another non-key attribute (or column), it is said to be transitively dependent. ·In order to achieve 3NF, the tables must already be in 2NF.
Normalization it involves splitting existing tables into multiple ones, which must be re-joined or linked each time a query is issued.
·It is a process of removing redundant data from tables to improve storage efficiency, data integrity, and scalability. ·It reduces data anomalies. ·Each table can contain information about a single entity, and each piece of information is stored in exactly one place (no duplication). ·It involves splitting existing tables into multiple ones, which must be re-joined or linked each time a query is issued.
Normalization Process
·Objective is to ensure that each table conforms to the concept of well-formed relations <Each table represents a single subject <No data item will be unnecessarily stored in more than one table <All nonprime attributes in a table are dependent on the primary key <Each table is void of insertion, update, and deletion anomalies
Overlapping Constraint
·Overlapping subtypes are subtypes that contain non-unique subsets of the supertype entity set; that is, each entity instance of the supertype may appear in more than one subtype. For example, in a university environment, a person may be an employee, a student, or both. In turn, an employee may be a professor as well as an administrator
Second Normal Form (2NF)
·The normal forms are progressive. In order to achieve 2NF, the tables must already be in 1NF. ·In 2NF, every non-key column must depend on the entire (not partial) primary key. ·To apply 2NF, you should move columns that don't depend on the entire primary key (or composite key) to another table.
Activity 4 •Does the above data model have a partial or total constraint? •Is it a disjoint or overlapping constraint? •What is the subtype discriminator?
·Total constraint ·Disjoint ·Prod_Type
Example 1 (UNF)
•More than one class field (repeating columns)
Entity supertypes and subtypes relationships are described in terms of _______________
"is-a" relationships
Composite primary keys are particularly useful in two cases:
1. As identifiers of composite entities, where each primary key combination is allowed only once in the M:N relationship. 2. As identifiers of weak entities, where the weak entity has a strong identifying relationship with the parent entity.
Steps for designing a data structure
1.Identify the data elements •Interview stakeholders, past databases, similar databases, use cases 2.Subdivide each element into its smallest useful components 3.Identify tables and assign columns (attributes) 4.Identify primary key, foreign key, and relationships 5.Create the ERDs 6.Normalize your data structure 7.Revise the ERDs if needed 8.Create the database in DBMS 9.Identify the indexes 10.Test the database and its functionality
Criteria to determine the usage
<There must be different, identifiable kinds of the entity in the user's environment <The different kinds of instances should each have one or more attributes that are unique to that kind of instance
Allows designers to
<Understand the nature, role, scope of data, and business processes <Develop appropriate relationship participation rules and constraints <Create an accurate data model
Weak Relationship
A weak relationship exists when the PK of the related entity does not contain at least one of the PK attributes of the parent entity. A weak relationship is indicated by a dashed line in the ERD.
Business Rules 2. Enable defining the ____________
basic building blocks
Business Rules 1. they are _______, ________, and ___________________ of _______, _______or_____
brief, precise and unambiguous description of policy Policy, Procedure or Principle
A weak relationship is indicated by a ________ in the __________
dashed line ERD
Specialization Hierarchy provides the means to define a _______________attribute known as the ______________
special supertype subtype discriminator
Entity supertypes and subtypes Every subtype has one ______to which it is___________
supertype directly related
Selecting Primary Keys
they should be 1. Unique 2. No change over time 3. Preferably single attribute 4. Preferably numeric 5. Security
Primary keys have ________. A primary key must be able to guarantee ______. It cannot contain ________.
unique values unique values nulls
Surrogate primary key is an "artificial" primary key that is used to ____________ each occurrence when there is ______________
uniquely identify no good natural key available.
Reasons for identifying and documenting business rules 2. Communication tool between______ and ____
user and designer
The surrogate key has no meaning in the __________ it exists only to distinguish __________ from ________. It is just like any other ___________
users environment one entity instance another primary key
Answer 2 Write the business rules reflected in this ERD.
·A store may place many orders. (Note the use of "may" - which is reflected in the ORDER optionality.) ·An order must be placed by a store. (Note that STORE is mandatory to ORDER) ·An order contains at least one order line. ·Each order line is contained in one and only one order. ·Each order line has a specific product written in it. ·A store may employ many employees. ·An employee may have one or more dependents. ·A dependent must be related to an employee.
Surrogate Key
·A surrogate primary key is an "artificial" PK that is used to uniquely identify each entity occurrence when there is no good natural key available ·A surrogate key is a primary key created by the database designer to simplify the identification of entity instances. ·The surrogate key has no meaning in the user's environment—it exists only to distinguish one entity instance from another (just like any other primary key). ·One practical advantage of a surrogate key is that because it has no intrinsic meaning, values for it can be generated by the DBMS to ensure that unique values are always provided.
Entity Supertypes and Subtypes
·Depicts arrangement of higher-level entity supertypes and lower-level entity subtypes ·Relationships are described in terms of "is-a" relationships ·Subtype exists within the context of a supertype ·Every subtype has one supertype to which it is directly related ·Supertype can have many subtypes
Entity Supertypes and Subtypes
·Entity supertype: Generic entity type related to one or more entity subtypes <Contains common characteristics ·Entity subtype: Contains unique characteristics of each entity subtype ·Criteria to determine the usage <There must be different, identifiable kinds of the entity in the user's environment <The different kinds of instances should each have one or more attributes that are unique to that kind of instance
Data Model Basic Building Blocks
·Entity: Unique and distinct object used to collect and store data <Attribute: Characteristic of an entity ·Relationship: Describes an association among entities <One-to-many (1:M) <Many-to-many (M:N or M:M) <One-to-one (1:1) Constraint: Set of rules to ensure data integrity
The Entity Relationship Model
·Graphical representation of entities and their relationships in a database structure ·Entity relationship diagram (ERD) <Uses graphic representations to model database components ·Entity instance or entity occurrence <Rows in the relational table ·Connectivity: Term used to label the relationship types
Reasons for Identifying and Documenting Business Rules
·Help standardize company's view of data ·Communications tool between users and designers ·Allow designer to: <Understand the nature, role, scope of data, and business processes <Develop appropriate relationship participation rules and constraints <Create an accurate data model
The Conceptual Model
·Represents a global view of the entire database by the entire organization ·Conceptual schema: Basis for the identification and high-level description of the main data objects ·Has a macro-level view of data environment ·Is software and hardware independent ·Logical design: Task of creating a conceptual data model
Selecting a Primary Key
·Unique values: A primary key must be able to guarantee unique values. It cannot contain nulls. ·No change over time: If Vickie Smith is the primary key, what happens if she changes her name when she gets married? ·Preferably single-attribute: Single-attribute primary keys are desirable but not required. Single-attribute primary keys simplify the implementation of foreign keys. ·Preferably numeric: Unique values can be better managed when they are numeric. ·Security: The selected primary key must not be composed of any attribute(s) that might be considered a security risk. Example: SSN
_________ is a characteristic of entity
Attribute
Sources of Business Rules
Company managers Department Managers Written Documentation Direct interviews with end users
Surrogate Key Example
DATE, TIME_START, ROOM) or (DATE, TIME_END, ROOM)
Derived attributes
Derived attributes are attributes whose value can be calculated from related attribute values.
Eg of Multi-Valued Attribute
EMP_ DEGREE attribute might store the string "BBA, MBA, PHD" to indicate three different degrees held
What are the basic building blocks of Data
Entity Relationship Constraint
Primary key main function is to uniquely identify an________ or _____within a table
Entity instance Raw
Primary keys cannot change overtime for example
If Sarah smith is a primary key what is going to happen when she gets married and changes her name to Sara Hagos?
What makes an entity weak?
It has two conditions 1. The entity is existence-dependent; it cannot exist without the entity with which it has a relationship. 2. The entity has a primary key that is partially or totally derived from the parent entity in the relationship.
_____________ it is a task of creating a conceptual data model
Logical Design
Relationship includes
One to many (1:M Many to many (M:N or M:N) One to one (1:1)
Weak Relationship exists when the_______ of the related entity does not contain at least ___________ of the __________
PK one of the PK attributes parent entity
_______ n ___________ are used to implement relationships among entities
Primary keys and foreign keys
Importance of Data Models
They are communication tool They give an overall view of the database Organize data for various users They are abstraction for the creation of good database