Database Design Quiz 1
What should an entity type represent?
An item or noun
Logical Data Independence
The capacity to change the conceptual schema without having to change external schemas or application programs Change of a particular user's view will not affect views of other user groups. Transform the schema differences by various mappings Examples: Age, temperature, price
Physical Data Independence
The capacity to change the internal schema without having to change the conceptual schema. Hence, the external schemas need not be changed as well Examples: Learning function is programmed by humans
Can ER diagrams define the database scope?
Yes
Mappings
between levels support data independences
Is there a systematic process for conceptual data modeling?
yes
Conceptual Data Schema
Abstraction of the reality of data elements and their relationships Conceptual view of the data requirements Pictorial representation of the data requirements It is easier to communicate with end users by using conceptual schemas.
Bottom-Up
Analysis of existing files and applications Synthesis programs and algorithms
Fundamental Modeling structures:
Attributes Entities Relationships
Why Model Data?
Conduct a system development problem from different perspectives by using different tools Every modeling tool has its purpose: Based on certain assumptions about the aspect of the problem it is intended to solve. Help designers to collect input and seek feedbacks from users Data modeling is considered as the most important modeling technique and skill for information system development.
Conceptual Level
Core of the ANSI/SPARC framework Represents the global view of the structure of the entire database: the model of the whole database Describes all data items and data relationships together with data integrity constraints Separates data from the program (or views from the physical storage structure) Captures data specification (metadata) Technology independent Relatively stable over long period of time Visible to DBA and database designers
Database Environment
Database hardware Database Management Systems (DBMS): software package for managing databases Database users Database administrator Database: Schema (meta-data) and Instances (data contents) Database models: logical and physical Database languages (DDL and DML) Data dictionary and directory Database design tools (e.g., ERWin)
Notation of ER Diagrams
Entity types: rectangles Weak entity types: double rectangles Relationship types: diamonds Connecting lines: roles entities play in the relationships Cardinalities and connectivities
Database Design Problem
Given: a data management request and its data and application requirements Objective: develop a database system Such that: Adequately manage the data of application domains Effectively supports applications' information needs
Inaccurate data
Having $1000 instead of $10000 for customer's credit limit Having $1.05 instead of $1.50 for an item
Characteristics of Business Data
Inaccuracies: Data do not reflect the reality. E.g.: wrong address, wrong number of products on shelf Inconsistency: Over time duplicate records do not agree. E.g.: school recording two different grades for same student
Cardinalities
Indicates the number of each object type that may participate in an association.
1: Requirements Collection and Analysis
Input: Application knowledge and users' information needs Task: The database designers interview users to understand and document their data requirements. Output: concisely written set of users' data requirements
4: Physical Database Design
Input: the logical database schema Task: specify the internal storage structure indexes, access paths, and file organizations for database files Output: physical implementation structures It is fully dependent on the target DBMS.
Evolution of Data Management
Manual systems (before computer age) File-based processing: sequential, indexed and direct data access
Lost data
Missing the record of a product item for sales Losing an invoice transaction
Mappings from real world primitives to ER modeling constructs: See Table 2.1
Object -> Entity Association -> Relationship Property -> Attribute
Real World Primitives
Objects Object types Properties Fact Property value set Associations Object classes
For binary associations:
One-to-one (1:1) One-to-many (1:N) Many-to-many (N:M)
Database Development Life Cycle
Preliminary Planning Database Initial Study (Feasibility Study) Database Design
Values of Properties
Properties for object types and associations Property value set: a set of possible values for a particular property.
Entity-Relationship (ER) Model
Proposed by Peter Chen in 1976 The most widely accepted data model for conceptual design Easy to use and understand by end users Capture and represent data requirements Doesn't imply how data should be organized and implemented Chosen by ANSI in 1988 as the standard model for Information Resource Directory Systems (IRDSs)
Database Evolution
Redesign and Reengineering Database Migration Database Integration IS30
Database Design
Requirements collection and analysis Conceptual design using a conceptual model Logical design using an implementation model Physical design using different file structures
Data Independences
The capacity to change the schema at one level of a database system without having to change the schema at the next higher level Two types: Logical data independence Physical data independence
External Level
The external schemas (usually many) represent different user views, each describing portions of the database for a particular application. All external schemas are generated exclusively from the conceptual schema. Each external schema (also called view) describes the database of one application such as inventory control, customer relationship, etc. Visible to the applications and end users Technology independent
Focus on Processing Functions
The program logic and functions will be specified first. Data structure is quite simple.
Database Management System (DBMS)
A collection of general-purpose software that facilitates the process of defining, constructing, and manipulating a database for various applications E.g.,: Microsoft Access, ORACLE RDBMS, IBM DB2, Microsoft SQL Server
Databases
A single, integrated set of files
The Basic Structure: Relation
A two dimensional table Rows correspond to individual records and data relationships Rows are called "tuples" Columns correspond to attributes (properties) Each attribute has a data domain specified
Objects and Object Types
An object is a thing in the real world Objects may be tangible or intangible Object type: refers to a group of objects. E.g., Employee is an object type. Each instance of an object type has certain characteristics. E.g., Employees have name, titles, salary, etc.
Top -Down
Analysis of organizational functions Determine their information requirements Synthesis these requirements
Why should we capture and represent subtype relationships?
Because the account holder could possibly do more than one action (buy or sell)
Database Development Life Cycle
Construction and Loading Testing and Evaluation Implementation Performance Maintenance and Evaluation Database Evolution
How to make sure I capture and represent all the data requirements?
Consult a partner and by prototyping
Advantages of Database Systems
Control data redundancy Eliminate data inconsistencies Facilitate data sharing Facilitate better data management Increase application reliability Increase data accessibility Better data control and protection Improve end-user interfaces Reduce maintenance work Increase security, privacy and integrity control
Large volume of data
Customer data for a bank: e.g., 5/3 Items and customer orders: e.g., Amazon, eBay Personal data recorded by government: e.g., social security
Why maintain organizational memory
Daily transaction processing Decision making Competitive advantages
Why Database Approach?
Data is the valuable resource, not the hardware. Data should be shared and accessed as needed. Data should occupy the central position. One representation for each piece of data. Control data redundancy Separate data from programs Easy to establish relationships among data Provide data security and control Provide multiple interfaces Enforce integrity constraints The most popular tool for managing data.
Problems of File-based Data Processing
Data redundancy and inconsistency Lack of Data Integration: It is the data isolation problem (Islands of Data). Each application defines and processes its own universe of data. Difficult to locate and access data Difficult to ensure data consistency Lack of control over data Lack of flexibility Ad hoc data queries to the applications are impossible. Cost of software development is high: excessive programming effort. Dependence between programs and data Inconsistent naming of terms: Homonyms and Synonyms
Data should be shared.
Data should not be owned by one entity and used for one application. Existing and new applications share the same data.
Internal Level
Describes the physical structure of the stored data (how the data is actually laid out on storage devices) Describes the methods used to implement data access (e.g., indexes, hashed addresses, etc.) Concerns with the efficiency of physical data storage and access methods Technology dependent Visible to DBMS and particularly to DBA
Relational Data Model
Developed by E. F. Codd at IBM in 1970 Conceptual simplicity The database is perceived by users as tables. Basic structure: relation (logical table) Data are related through the logical relationships according to the data values. Users are isolated from complex implementation details inside the database systems.
Inconsistent data
Different inventory levels of an item Different grades for the same course taken
Physical Models (Physical data structures)
E.g.: Indexed files (ISAM, VSAM, etc.)
File-based Data Processing Issues
Each application has its own input and output files designed to meet its particular needs. Certain data items will appear in multiple files. Each application developed independently that will result in different representations and formats for the same data items.
'Account' & 'Category': attributes or entity types?
Entity
Is 'Bid" an attribute, an entity type or a relationship type?
Entity
Conceptual Models
Entity-Relationship (ER) model Object-oriented model such as UML Semantic data models
Three-schema architecture
External level Conceptual level Internal level
ANSI/SPARC Framework
Foundation for database development Proposal for DBMS architecture to support data independencies Very influential to DBMS's design by vendors, especially Relational DBMSs such as ORACLE
Logical Models (Implementation Models)
Hierarchical Network Relational
Databases
Hierarchical (legacy database systems) Network (legacy database systems) Relational (most popular commercial databases)
Extensions to ER Model
Incorporate a set of data abstractions to capture data requirements: Aggregation Generalization Specification Categorization
Value of Digital Data for Businesses
Increase production efficiency Enhance management control Support decision making Act as the competitive source Generate strategic value (e.g., Big Data)
3: Logical Design (Data Model Mapping)
Input: conceptual schema Task: Transform the conceptual schema according to an implementation data model Output: the logical database schema Relational data model is the most popular implementation data model of commercial DBMSs.
2: Conceptual Design
Input: users' data requirements Task: analyze users' requirements (business rules) and express them by using a conceptual data model Output: the conceptual schema The conceptual schema doesn't include implementation details. It is easier to understand and be used to communicate with nontechnical users. ER is the most popular model for this phase.
Associates
Interactions and relationships among objects. Associates can generate their own properties: E.g., A list of students register IS3030 taught by Professor Chiang in Fall 2017.
How to draw a 'good' and 'simple' ER diagram?
It can vary there is usually not one right answer
Is it necessary to design a database starting from the conceptual modeling?
It is up to the designer's preference
How detailed should an ER diagram be?
It should contain entities, attributes, and relationships
Human Information Processing
Memory is limited Ability to process data is restricted Use a variety of external tools
Evolution of Data Management
Multimedia database: organize a variety of information sources such as voice, audio, photos, video, etc. Object-oriented (OO) databases Client/Server databases Distributed databases (e.g., Blockchain) Data warehouses (Data Lake)
Can attributes have relationships with other attributes or entity types?
No
Can or should we combine account and customer as one entity type?
No
Can relationship types have directions?
No
Can we have an entity type without any attribute (e.g., Auction, Web site)?
No
Should a relationship type represent multiple associations (i.e., buy and sell)?
No
Should we consider implementation details during the conceptual modeling?
No
Should we model Buyer, Seller, and Bidder as separate entity types? If not, why not?
No
Should we model control flows such as "Customers create Accounts to offer items", and "Customers assign categories to items"?
No
Should we specify the 'primary keys' for the entity types?
No
Should we specify the sequence of relationship types (e.g., bid before buy)?
No
Conceptual Data Modeling Using Entity-Relationship Model
Objective: represent the data requirements using ER model
Why Conceptual Data Modeling for Database Design?
Opportunity for enhanced user participation in the design process Merits that accrue from DBMS-independent modeling Ease of understanding of the big picture and the consequent facilitation of maintenance of schemas and applications in the long run
Organizational Information Processing
Processors: humans and computers Use external memory Adopt external sources as processors
Common Problems of Organizational Memory
Redundancy Data control issues Poor user interface Poor data security Poor information quality Long delays in responding to information needs Unable to answer questions about the real world: Lack of reality Lack of memory (data) integration
Bid, Buy & Sell: Relationship or Entity?
Relationship
Conceptual Data Model:
Shows the data elements and how they are related without considering how they should be implemented.
Why Relational Databases?
Simple structure, easy to design Support data independences Powerful and flexible data manipulation capability 4GL query language (SQL) View data logically rather than physically Better control data redundancies
Organizational Memory
Storage medium: most data is stored in memory Storage structure: File and Database Data entry and information retrieval methods Trade-off between access speed and cost Essential component of modern organizations Key competitive resources
Needs of Data Independences
Support independent end users' and applications' views of data Support program-data independence Minimize the ripple effort of changes: E.g.: application programs immune to changes in storage structure and access methods But, how to achieve them?
Components of a DBMS
The major components of this include one or more query languages; tools for generating reports; facilities for providing security, integrity, backup and recovery; a data manipulation language for accessing the database; and a data definition language used to define the structure of data.
Integrated
The responsibility for 'integrating' data items as needed is assumed by the DBMS instead of the programmer.
Self-describing
The structure of the database (metadata) is recorded within the database system - not in the application programs.
File-based Data Processing Advantages
Very efficient use of machine resource, based on the view that computing power is the most valuable resource Relatively easy to design and implement for simple applications
Data Models
What is a model? A representation of something real for design purposes Examples: blueprint, prototype, simulation What is the purpose of models? So designers know what they need to change What are data models and their purposes? Abstraction of reality of data items A conceptual way in organizing data items
Can relationship types have their own attributes?
Yes
Can we have more than one relationship type between two entity types? (e.g., Customers and Items)
Yes
Should (can) I make assumptions for the conceptual data modeling?
Yes
Should we model the 'derived' attribute such as '# of bidder', 'the highest bid'?
Yes
Should we model the business rules such as "A bidder cannot bid his/her own items."
Yes
Can we model process requirements into the ER diagram? If we can, do we want to do so?
Yes, but we don't want to
Can attributes have their own attributes?
Yes, they are known as composite attributes
Information
a collective of data that holds meaning
What is the purpose of conceptual modeling?
comprehension, abstraction, representation
Data
material that can be accessed that has no specific meaning
What to do and how to start the conceptual data modeling process?
start with an ER diagram