MIS 331

Réussis tes devoirs et examens dès maintenant avec Quizwiz!

Why study data management?

"at the heart of any organization that wants to survive in the global arena" it's trendy (Knowledge management, data mining, CRM, ERP) know what to ask for and whom to ask make information decisions Design GREAT databases that are accessed by GREAT systems!

What is a "Well-Structured Relation"? (Goal of Logical Database Design)

1. Minimum redundancy (ideally only controlled) 2. Can insert records without creating errors or inconsistencies (no insertion anomalies) 3. Can delete records without creating errors or inconsistencies (no deletion anomalies) 4. Can modify records without creating errors or inconsistencies (no modification anomalies)

Implementation Models

A conceptual model formatted for the particular technology (e.g., relational database). Half-way between conceptual and logical design. Key differences from Preliminary: 1. Associative entities are converted - there are NO M:N relationships in an implementation model. 2. All composite attributes are represented at finest level 3. All multivalued attributes are converted.

Associative Entities

A relationship can also be an entity

Advantages and Disadvantages of Database System

Advantages: Program-data independence Minimal data redundancy Improved data consistency Improved data sharing Enforcement of standards Improved data quality Increased productivity of application development Disadvantages: Needs new specialized personnel Need for explicit backups Interference due to shared data Organizational conflict

The Components of An ER Diagram: Multi-Valued Attributes

Assume an employee may have multiple phone numbers (double circle)

Subtype Discriminator

Attribute of supertype whose value determines the target subtypes

The Components of An ER Diagram: Derived Attributes

Attribute that can be calculated from other attributes. Need to note it, but it will not take up space in the database

The Components of An ER Diagram: Candidate Keys and Primary Keys

Attributes that uniquely identify an entity instance (i.e., row, record) Also referred to as "identifier" (if you see the word identifier on HW and stuff it's talking about primary key) Candidate keys are ALL attributes that uniquely define the entity instance Primary keys are the chosen ones How to choose: -Should not change over time - must have unique, non-null values (non null- you need to have a value in there, can't use names of first pets because some people might not have a first pet) -use as few attributes as possible Attribute that is underlined is the primary key Often Used: SSN, ID numbers

Translating the E-R Diagram: 2. Represent relationships

Binary M:N - create a third relation (associative entity) using primary key of each Binary 1:M - add the primary key of the entity on the 1 side as a foreign key of the entity on the M side. Binary 1:1 - add the primary key for either one as a foreign key to either. Note: if one side is optional, it should be the primary key to the other (the mandatory). Unary 1:N - recursive foreign key, referencing the primary key (using a different name)

Example: Data Redundancy

Bottom Line: Difficult to maintain Data Quality

Example: Program-Data Dependence

Bottom Line: Excessive Program Maintenance

Example: Data Duplication

Bottom Line: Wastage of Storage Space

Hardware

Central processing unit (CPU) is the hardware within a computer that carries out the instructions of a computer program by performing the basic arithmetical, logical, and input/output operations of the system Memory: primary, secondary, and Tertiary

Physical Data Independence Implications

Change the file structure, e.g., the sequence of fields: FS: change code DS: If the DBMS changes the physical storage structure, the user is not aware of the change Add a B-tree index for, e.g., empx, to speed up queries: FS: Write code DS: Application programs do not need to change at all.

Attributes =

Columns = Fields

Constraints (Two Types)

Completeness constraint - total or partial Disjointness constraint -Disjoint rule or Overlap rule

Network (CODASYL) Database Model

Created to improve database performance and to impose standards Advantages -Continued data independence -Data access and flexibility are better -Enforces database integrity Disadvantages -Difficult to design and use properly -Difficult to make changes in the database -Yields a complex structure

Translating the E-R Diagram: 1. Represent entities as relations

Customer (Custnum (underline), CustLName), (CustFName), PhoneNum) ideally primary key is listed first

Transform to a Set of Relations

Customer (Custnum [underline], CustLName), (CustFName), PhoneNum) Product (ProdCode [underline], ProdName) Orders (Quantity, Custnum [dotted underline], ProdCode [dotted underline], OrderNum [underline])

Metadata

Data about data What is the data about?

Software

Database Management System (DBMS) -Define: specify datatypes -Populate: enter and store data -Manipulate: query and update data -Control: Grant access Application programs

Example of a File Based System

Each row is a record

Subtypes and Supertypes Examples

Employees (super type) and part time, full time, seasonal, managerial, intern (subtypes) Products (super types) consumer goods, industrial, personal use, commercial use (subtypes)

EER and Implementation Models

Enhanced ER model -Subtype/Supertype -Generalization/Specialization -Disjoint/Overlap Implementation model -Translate associative entities -Fine-tune composite entities -Translate multi-valued attributes

The Components of An ER Diagram: Entities and Relationships

Entities- A distinguishable thing (go in a rectangle) Ex. Entities: student and residence hall Relationship- Serves to interconnect two or more other entities; they are directional (goes into a diamond) Ex. Relationship: lives in

Integrity Constraints

Entity Referential

When/why entity relationship diagrams are used

Ex. Designing a database for a school. You need to establish relationships and entities and stuff. Once you know how to diagram something you can start writing code. If something changes in the diagram you have to change the code, so it's a lot easier if you have it drawn out first.

Three Schema Architecture

External Level (individual user views) Conceptual Level (community user view) Internal View (storage view)

Supertype

Generic entity type that has relationship with one or more subtypes

The five basic database types are ...

Hierarchical Network Relational (current standard) Object-oriented Object-relational

Information Systems Architecture (ISA) and Information Engineering

Information Systems Architecture (ISA): Blueprint for the information systems in an organization Information Engineering: -Formal methodology for developing an ISA -Includes four steps: planning, analysis, design and implementation

Subclass

Inherits attributes and relationships of superclass

The Importance of Foreign Keys

Link Tables An attribute in one relation that serves as the primary key of another relation. Relationship Participation (of the PARENT or the ONE side) is captured using the foreign key constraints -Three components need to be specified NOT NULL or NULL ALLOWED ON DELETE {RESTRICT or CASCADE or SET NULL} ON UPDATE CASCADE <--Default for all cases

What is Modeled in an ER Diagram

Mini-world (the scope of our modeling), also known as the Universe of Discourse (UOD): Some part of the real world about which information will be stored in the database.

What do you need to know?

Modeling Normalization- how to prepare the data model to be used by a specific database SQL- how to access the data Data in Organizations- learning data warehousing, ERP, CRM, etc. how to "talk the talk"

Database System

Need: Database, software, hardware, and people

Hierarchal Database Model

North American Rockwell + IBM = IMS Advantages -Data independence -Promotes database integrity and data sharing -Efficient for large volume of transactions Disadvantages -Changes in structure require changes in programs -Complex to manage and lacks flexibility -No precise set of standards -Requires extensive programming activities

Entity Type: Strong vs. Weak

Not every entity type has a primary key. Such an entity type is referred to as a weak entity type. A weak entity is existence-dependent on some other entity, called its identifying entity. A weak entity cannot exist if its identifying entity does not also exist. (double box it in the diagram with a double diamond to show weak relationship) e.g., an employee's dependents might be weak entities. They cannot exist (in the database) if the relevant employee does not exist. If an employee is deleted, her dependents are deleted too. Has a primary key that is derived from the parent entity in the relationship - in notation, we designate the "partial identifier" with a double-underline Note that the entity and its identifying relationship are double-lined.

The Components of An ER Diagram: Cardinality

Number of instances of one entity that are associated with another (Minimum and maximum - lower and upper bounds on the number of instances)

The Entity-Relationship Diagram

Picture of the people, places, objects, things, events, or concepts, their characteristics and relationships, for an organization or business area. Visual Representation Communication Tool Capture the user data requirements using entities, the properties of entities (i.e., attributes) and relationships between entities In ERD, we do not capture Process, i.e., flow of data Reports and queries, i.e., how the data will be used

Evolution of Database Systems

Pre-1960's -1945: Magnetic tape developed -1959: IBM introduced the Random Access Method of Accounting and Control 1960's -IMS developed by IBM -SABRE developed by IBM and American Airlines 1970's - Relational model developed by Ted Codd -ER diagram introduced by Chen 1980's -DBMS developed for personal computers -Preliminary SQL standards published 1990's -Parallel processors -OODB - Standards for data query and exchange

Keys:

Primary Composite - create in the weak entity using primary key of strong as part of PK; strong primary and weak primary make up composite key of weak entity Foreign - uniquely relates two entities (dotted underline)

File Systems - A non-database way of capturing data is; each application is designed with its own set of files

Problems: limited data sharing program-data dependence duplication of data uncontrolled data redundancy inconsistent data inflexibility lengthy development times excessive program maintenance changes in files require changes in programs poor enforcement of standards

The Components of An ER Diagram: Attributes

Properties or characteristics of entities; A piece of information that describes an entity; Actual data items we collect

Attribute Inheritance

Property by which subtype entities inherit values of all attributes of the supertype

Reading ER diagram with Cardinality

Read the entity, then relationship, then cardinality on the other side, then the other entity I (crow's foot)=one to many O (crow's foot) =zero to many OI= zero to one II=one and only one Ex. A student takes one to many courses or a course is taken by one to many students

Entities =

Relations = Tables

Steps in Logical Design - When Translating the Data Model

Represent Entities as Relations -Each entity is represented as a relation -A relation will become a table; an identifier will be a key Represent Relationships -Depends on the nature of the relationship... -Develop foreign key constraints Merge the Relations -If needed

Data =

Rows = Records

Subtypes and Supertypes

Some entities are subtypes of other entities -Share some attributes, but not others -They are all entities just the same! Attribute Inheritance -Subtypes inherit values of all attributes of the supertype When to use? -Share some -- but NOT all -- attributes -Participate in a unique relationship

Generalization/Specialization

Specialization Top-down Identify "sub-sets" of the supertype Generalization Bottom-up Identify the supertype from the "sub-sets"

Subtype

Subgrouping of entities in an entity type that is meaningful to the organization

Conceptual Design

The Entity-Relationship (ER) model -Introduced by Peter Chen in 1976 -Most popular conceptual model used in database design. Basic idea: A database can be modeled as: -a collection of entities (things), and relationships among entities. The result is an ER Diagram (or Schema) -straightforward picture to explain to the users and therefore can be used as a communication tool between the designer and the users. Other reasons for conceptual modeling: -Independent of technology -Helps elicit business rules -Strong foundation for development process

Logical Data Independence

The capability to change the conceptual schema (logical model of data) without having to change external schemas Implications: Add a field, e.g., SSN to employee information: FS: Change code DS: Application programs still work

Identifying relationship

The relationship between a weak entity and its identifying entity

Converting the Composite Attribute from Enhanced ER to Implementation Model

This makes it easier to for sorting and record lookups

Converting the Associative Entity from Enhanced ER to Implementation Model

To uniquely indentify Cardinality moves diagonally (take from one side and drop it to the other) and the outside cordiality is always one and only one

Essentially Two Paths of Logical Database Design:

Transform the Data Model -Process of transforming the conceptual model into a logical model. -Taking the picture and getting it ready for a particular class of DBMS. -Bridging the user and analyst views. Transform the Data (tables, files) -Process of converting complex data structures into simple, stable data structures -Process of moving through the 'normal forms' -Creating smaller, 'cleaner' tables

Modeling Hints and Tips

Use different names for different things If an entity has only one attribute, ask yourself if it is really necessary. Model the data - not the physical artifacts (reports) in which the data reside Do NOT try to write code or think about the final format for the data. Don't worry about IFs, or how to connect entities except through relationships.

Without Database we ask how do we...

Use same data file (e.g., customer.txt) in two programs? Make sure the values of ContactPhone are valid? Make sure each customer's Name is not stored more than once? Allow multiple users to run the program at the same time? Authorize users controlled access (e.g., insert, query)? Standardize the data representation across systems? the answer... WRITE CODE

Transform to a Set of Relations: Subtype

Use the key of the supertype as both primary and foreign (gets solid and dotted underline)

Logical data independence says that any changes in the conceptual schema does not require any changes in the external schema. If we change the conceptual schema, do we need to make any changes to the initial schema?

Yes we do. Like saying I want to store something in my pockets, but I only have two that are full, so I need to have another pocket. If you add anything to conceptual you have to add it to internal or at least make space for it in internal. If nobody calls it from external that's ok.

People

administrators, designers, developers, end users

Overlapping Constraint (Type of Disjointness Constraint)

an entity instance can belong to more than one subtype

Disjoint Constraint (Type of Disjointness Constraint)

an entity instance cannot belong to more than one subtype

Schema

blueprint, design, how things look, how things work

Record

collection of fields (row in file based system)

File

collection of records

First Level of Design is

conceptual

Upside down Triangle

data becomes information which becomes knowledge

Total Specialization Constraint (Type of Completeness Constraint)

every entity instance in the super type must belong to a subtype also

End Users

get very little access compared to designers and developers

Field

group of characters with meaning

Knowledge

how you use information in your business

Data

largely means nothing to humans (in a human readable format)

Object Oriented Database Model

late 1980's (Yourdon) - data first, logic second Advantages - can handle complex data types (e.g., CAD, GIS, multimedia) - reuse Disadvantages -difficult to use - expensive to develop (steep learning curve) - too costly to convert leggy systems

Information

means something to humans can we use this data? if so it becomes information

Partial Specialization Constraint (Type of Completeness Constraint)

not every entity instance in the super type must belong to a subtype also if something is not required to be recorded, it's partial

Database

organized collection of logically related persistent data includes data and metadata

Internal Schema

pulls data from the hard disk

Physical Data Independence

the capability to change the physical storage structure or access methods (e.g., index), i.e., internal schema without having to change conceptual schemas

Relational Database

the current standard (what we use today) Developed by Codd in 1970 Entity relationship model - current design standard Advantages -Structural independence (AND data independency) Disadvantages -large overhead Poorly planned user development systems (due to ease of use) (think of it like a club where there's a bouncer vs. no bouncer)

Benefits of database approach

the data can be shared Sharing data is straightforward. FS: Write program. Hard to share old data. DB: Existing data can easily be shared by many applications. redundancy can be reduced Facts are recorded in one place in database. FS: Write program. Hard to control. DB: Redundancy can be controlled. inconsistency can be avoided (to some extent) Make sure Symbol of each company is valid. Make sure each company name is not stored more than once. FS: Write program. DB: DBMS can enforce some types of "integrity." transaction support can be provided Allow multiple users to run the program at the same time. FS: Write programs, maybe using locks. DB: DBMS supports transaction management. security can be enforced Authorize each user controlled access (e.g., insert, update). FS: Write programs, maybe using passwords. DB: DBMS enforces security. standards can be enforced Standardize the data representation across systems? FS: All programs must agree on the data representation. DB: The DBA can ensure that all applicable standards are observed in the representation of data. increased productivity of application development data accessibility and responsiveness integration of data

The user in the external level generally doesn't know what's being added

to the internal

Composite Keys

when there are two primary keys, you double underline Ex. Employee ID and employee phone number are both PKs for the entity, phone


Ensembles d'études connexes

Biology 201 Exam 2- Evolution and Ecology

View Set

Chapter 10 Health Problems of Infants

View Set

Biology 107, study guide questions: EXAM 1

View Set

Regulation of gene expression in eukaryotes; lecture 20

View Set

HW 05 - Ch 5: Glacial and Arid Landscapes

View Set