MIS 331
Why study data management?
"at the heart of any organization that wants to survive in the global arena" it's trendy (Knowledge management, data mining, CRM, ERP) know what to ask for and whom to ask make information decisions Design GREAT databases that are accessed by GREAT systems!
What is a "Well-Structured Relation"? (Goal of Logical Database Design)
1. Minimum redundancy (ideally only controlled) 2. Can insert records without creating errors or inconsistencies (no insertion anomalies) 3. Can delete records without creating errors or inconsistencies (no deletion anomalies) 4. Can modify records without creating errors or inconsistencies (no modification anomalies)
Implementation Models
A conceptual model formatted for the particular technology (e.g., relational database). Half-way between conceptual and logical design. Key differences from Preliminary: 1. Associative entities are converted - there are NO M:N relationships in an implementation model. 2. All composite attributes are represented at finest level 3. All multivalued attributes are converted.
Associative Entities
A relationship can also be an entity
Advantages and Disadvantages of Database System
Advantages: Program-data independence Minimal data redundancy Improved data consistency Improved data sharing Enforcement of standards Improved data quality Increased productivity of application development Disadvantages: Needs new specialized personnel Need for explicit backups Interference due to shared data Organizational conflict
The Components of An ER Diagram: Multi-Valued Attributes
Assume an employee may have multiple phone numbers (double circle)
Subtype Discriminator
Attribute of supertype whose value determines the target subtypes
The Components of An ER Diagram: Derived Attributes
Attribute that can be calculated from other attributes. Need to note it, but it will not take up space in the database
The Components of An ER Diagram: Candidate Keys and Primary Keys
Attributes that uniquely identify an entity instance (i.e., row, record) Also referred to as "identifier" (if you see the word identifier on HW and stuff it's talking about primary key) Candidate keys are ALL attributes that uniquely define the entity instance Primary keys are the chosen ones How to choose: -Should not change over time - must have unique, non-null values (non null- you need to have a value in there, can't use names of first pets because some people might not have a first pet) -use as few attributes as possible Attribute that is underlined is the primary key Often Used: SSN, ID numbers
Translating the E-R Diagram: 2. Represent relationships
Binary M:N - create a third relation (associative entity) using primary key of each Binary 1:M - add the primary key of the entity on the 1 side as a foreign key of the entity on the M side. Binary 1:1 - add the primary key for either one as a foreign key to either. Note: if one side is optional, it should be the primary key to the other (the mandatory). Unary 1:N - recursive foreign key, referencing the primary key (using a different name)
Example: Data Redundancy
Bottom Line: Difficult to maintain Data Quality
Example: Program-Data Dependence
Bottom Line: Excessive Program Maintenance
Example: Data Duplication
Bottom Line: Wastage of Storage Space
Hardware
Central processing unit (CPU) is the hardware within a computer that carries out the instructions of a computer program by performing the basic arithmetical, logical, and input/output operations of the system Memory: primary, secondary, and Tertiary
Physical Data Independence Implications
Change the file structure, e.g., the sequence of fields: FS: change code DS: If the DBMS changes the physical storage structure, the user is not aware of the change Add a B-tree index for, e.g., empx, to speed up queries: FS: Write code DS: Application programs do not need to change at all.
Attributes =
Columns = Fields
Constraints (Two Types)
Completeness constraint - total or partial Disjointness constraint -Disjoint rule or Overlap rule
Network (CODASYL) Database Model
Created to improve database performance and to impose standards Advantages -Continued data independence -Data access and flexibility are better -Enforces database integrity Disadvantages -Difficult to design and use properly -Difficult to make changes in the database -Yields a complex structure
Translating the E-R Diagram: 1. Represent entities as relations
Customer (Custnum (underline), CustLName), (CustFName), PhoneNum) ideally primary key is listed first
Transform to a Set of Relations
Customer (Custnum [underline], CustLName), (CustFName), PhoneNum) Product (ProdCode [underline], ProdName) Orders (Quantity, Custnum [dotted underline], ProdCode [dotted underline], OrderNum [underline])
Metadata
Data about data What is the data about?
Software
Database Management System (DBMS) -Define: specify datatypes -Populate: enter and store data -Manipulate: query and update data -Control: Grant access Application programs
Example of a File Based System
Each row is a record
Subtypes and Supertypes Examples
Employees (super type) and part time, full time, seasonal, managerial, intern (subtypes) Products (super types) consumer goods, industrial, personal use, commercial use (subtypes)
EER and Implementation Models
Enhanced ER model -Subtype/Supertype -Generalization/Specialization -Disjoint/Overlap Implementation model -Translate associative entities -Fine-tune composite entities -Translate multi-valued attributes
The Components of An ER Diagram: Entities and Relationships
Entities- A distinguishable thing (go in a rectangle) Ex. Entities: student and residence hall Relationship- Serves to interconnect two or more other entities; they are directional (goes into a diamond) Ex. Relationship: lives in
Integrity Constraints
Entity Referential
When/why entity relationship diagrams are used
Ex. Designing a database for a school. You need to establish relationships and entities and stuff. Once you know how to diagram something you can start writing code. If something changes in the diagram you have to change the code, so it's a lot easier if you have it drawn out first.
Three Schema Architecture
External Level (individual user views) Conceptual Level (community user view) Internal View (storage view)
Supertype
Generic entity type that has relationship with one or more subtypes
The five basic database types are ...
Hierarchical Network Relational (current standard) Object-oriented Object-relational
Information Systems Architecture (ISA) and Information Engineering
Information Systems Architecture (ISA): Blueprint for the information systems in an organization Information Engineering: -Formal methodology for developing an ISA -Includes four steps: planning, analysis, design and implementation
Subclass
Inherits attributes and relationships of superclass
The Importance of Foreign Keys
Link Tables An attribute in one relation that serves as the primary key of another relation. Relationship Participation (of the PARENT or the ONE side) is captured using the foreign key constraints -Three components need to be specified NOT NULL or NULL ALLOWED ON DELETE {RESTRICT or CASCADE or SET NULL} ON UPDATE CASCADE <--Default for all cases
What is Modeled in an ER Diagram
Mini-world (the scope of our modeling), also known as the Universe of Discourse (UOD): Some part of the real world about which information will be stored in the database.
What do you need to know?
Modeling Normalization- how to prepare the data model to be used by a specific database SQL- how to access the data Data in Organizations- learning data warehousing, ERP, CRM, etc. how to "talk the talk"
Database System
Need: Database, software, hardware, and people
Hierarchal Database Model
North American Rockwell + IBM = IMS Advantages -Data independence -Promotes database integrity and data sharing -Efficient for large volume of transactions Disadvantages -Changes in structure require changes in programs -Complex to manage and lacks flexibility -No precise set of standards -Requires extensive programming activities
Entity Type: Strong vs. Weak
Not every entity type has a primary key. Such an entity type is referred to as a weak entity type. A weak entity is existence-dependent on some other entity, called its identifying entity. A weak entity cannot exist if its identifying entity does not also exist. (double box it in the diagram with a double diamond to show weak relationship) e.g., an employee's dependents might be weak entities. They cannot exist (in the database) if the relevant employee does not exist. If an employee is deleted, her dependents are deleted too. Has a primary key that is derived from the parent entity in the relationship - in notation, we designate the "partial identifier" with a double-underline Note that the entity and its identifying relationship are double-lined.
The Components of An ER Diagram: Cardinality
Number of instances of one entity that are associated with another (Minimum and maximum - lower and upper bounds on the number of instances)
The Entity-Relationship Diagram
Picture of the people, places, objects, things, events, or concepts, their characteristics and relationships, for an organization or business area. Visual Representation Communication Tool Capture the user data requirements using entities, the properties of entities (i.e., attributes) and relationships between entities In ERD, we do not capture Process, i.e., flow of data Reports and queries, i.e., how the data will be used
Evolution of Database Systems
Pre-1960's -1945: Magnetic tape developed -1959: IBM introduced the Random Access Method of Accounting and Control 1960's -IMS developed by IBM -SABRE developed by IBM and American Airlines 1970's - Relational model developed by Ted Codd -ER diagram introduced by Chen 1980's -DBMS developed for personal computers -Preliminary SQL standards published 1990's -Parallel processors -OODB - Standards for data query and exchange
Keys:
Primary Composite - create in the weak entity using primary key of strong as part of PK; strong primary and weak primary make up composite key of weak entity Foreign - uniquely relates two entities (dotted underline)
File Systems - A non-database way of capturing data is; each application is designed with its own set of files
Problems: limited data sharing program-data dependence duplication of data uncontrolled data redundancy inconsistent data inflexibility lengthy development times excessive program maintenance changes in files require changes in programs poor enforcement of standards
The Components of An ER Diagram: Attributes
Properties or characteristics of entities; A piece of information that describes an entity; Actual data items we collect
Attribute Inheritance
Property by which subtype entities inherit values of all attributes of the supertype
Reading ER diagram with Cardinality
Read the entity, then relationship, then cardinality on the other side, then the other entity I (crow's foot)=one to many O (crow's foot) =zero to many OI= zero to one II=one and only one Ex. A student takes one to many courses or a course is taken by one to many students
Entities =
Relations = Tables
Steps in Logical Design - When Translating the Data Model
Represent Entities as Relations -Each entity is represented as a relation -A relation will become a table; an identifier will be a key Represent Relationships -Depends on the nature of the relationship... -Develop foreign key constraints Merge the Relations -If needed
Data =
Rows = Records
Subtypes and Supertypes
Some entities are subtypes of other entities -Share some attributes, but not others -They are all entities just the same! Attribute Inheritance -Subtypes inherit values of all attributes of the supertype When to use? -Share some -- but NOT all -- attributes -Participate in a unique relationship
Generalization/Specialization
Specialization Top-down Identify "sub-sets" of the supertype Generalization Bottom-up Identify the supertype from the "sub-sets"
Subtype
Subgrouping of entities in an entity type that is meaningful to the organization
Conceptual Design
The Entity-Relationship (ER) model -Introduced by Peter Chen in 1976 -Most popular conceptual model used in database design. Basic idea: A database can be modeled as: -a collection of entities (things), and relationships among entities. The result is an ER Diagram (or Schema) -straightforward picture to explain to the users and therefore can be used as a communication tool between the designer and the users. Other reasons for conceptual modeling: -Independent of technology -Helps elicit business rules -Strong foundation for development process
Logical Data Independence
The capability to change the conceptual schema (logical model of data) without having to change external schemas Implications: Add a field, e.g., SSN to employee information: FS: Change code DS: Application programs still work
Identifying relationship
The relationship between a weak entity and its identifying entity
Converting the Composite Attribute from Enhanced ER to Implementation Model
This makes it easier to for sorting and record lookups
Converting the Associative Entity from Enhanced ER to Implementation Model
To uniquely indentify Cardinality moves diagonally (take from one side and drop it to the other) and the outside cordiality is always one and only one
Essentially Two Paths of Logical Database Design:
Transform the Data Model -Process of transforming the conceptual model into a logical model. -Taking the picture and getting it ready for a particular class of DBMS. -Bridging the user and analyst views. Transform the Data (tables, files) -Process of converting complex data structures into simple, stable data structures -Process of moving through the 'normal forms' -Creating smaller, 'cleaner' tables
Modeling Hints and Tips
Use different names for different things If an entity has only one attribute, ask yourself if it is really necessary. Model the data - not the physical artifacts (reports) in which the data reside Do NOT try to write code or think about the final format for the data. Don't worry about IFs, or how to connect entities except through relationships.
Without Database we ask how do we...
Use same data file (e.g., customer.txt) in two programs? Make sure the values of ContactPhone are valid? Make sure each customer's Name is not stored more than once? Allow multiple users to run the program at the same time? Authorize users controlled access (e.g., insert, query)? Standardize the data representation across systems? the answer... WRITE CODE
Transform to a Set of Relations: Subtype
Use the key of the supertype as both primary and foreign (gets solid and dotted underline)
Logical data independence says that any changes in the conceptual schema does not require any changes in the external schema. If we change the conceptual schema, do we need to make any changes to the initial schema?
Yes we do. Like saying I want to store something in my pockets, but I only have two that are full, so I need to have another pocket. If you add anything to conceptual you have to add it to internal or at least make space for it in internal. If nobody calls it from external that's ok.
People
administrators, designers, developers, end users
Overlapping Constraint (Type of Disjointness Constraint)
an entity instance can belong to more than one subtype
Disjoint Constraint (Type of Disjointness Constraint)
an entity instance cannot belong to more than one subtype
Schema
blueprint, design, how things look, how things work
Record
collection of fields (row in file based system)
File
collection of records
First Level of Design is
conceptual
Upside down Triangle
data becomes information which becomes knowledge
Total Specialization Constraint (Type of Completeness Constraint)
every entity instance in the super type must belong to a subtype also
End Users
get very little access compared to designers and developers
Field
group of characters with meaning
Knowledge
how you use information in your business
Data
largely means nothing to humans (in a human readable format)
Object Oriented Database Model
late 1980's (Yourdon) - data first, logic second Advantages - can handle complex data types (e.g., CAD, GIS, multimedia) - reuse Disadvantages -difficult to use - expensive to develop (steep learning curve) - too costly to convert leggy systems
Information
means something to humans can we use this data? if so it becomes information
Partial Specialization Constraint (Type of Completeness Constraint)
not every entity instance in the super type must belong to a subtype also if something is not required to be recorded, it's partial
Database
organized collection of logically related persistent data includes data and metadata
Internal Schema
pulls data from the hard disk
Physical Data Independence
the capability to change the physical storage structure or access methods (e.g., index), i.e., internal schema without having to change conceptual schemas
Relational Database
the current standard (what we use today) Developed by Codd in 1970 Entity relationship model - current design standard Advantages -Structural independence (AND data independency) Disadvantages -large overhead Poorly planned user development systems (due to ease of use) (think of it like a club where there's a bouncer vs. no bouncer)
Benefits of database approach
the data can be shared Sharing data is straightforward. FS: Write program. Hard to share old data. DB: Existing data can easily be shared by many applications. redundancy can be reduced Facts are recorded in one place in database. FS: Write program. Hard to control. DB: Redundancy can be controlled. inconsistency can be avoided (to some extent) Make sure Symbol of each company is valid. Make sure each company name is not stored more than once. FS: Write program. DB: DBMS can enforce some types of "integrity." transaction support can be provided Allow multiple users to run the program at the same time. FS: Write programs, maybe using locks. DB: DBMS supports transaction management. security can be enforced Authorize each user controlled access (e.g., insert, update). FS: Write programs, maybe using passwords. DB: DBMS enforces security. standards can be enforced Standardize the data representation across systems? FS: All programs must agree on the data representation. DB: The DBA can ensure that all applicable standards are observed in the representation of data. increased productivity of application development data accessibility and responsiveness integration of data
The user in the external level generally doesn't know what's being added
to the internal
Composite Keys
when there are two primary keys, you double underline Ex. Employee ID and employee phone number are both PKs for the entity, phone