bmgt 402
• Database planning should also include development of standards that govern: - how data will be collected - how the format should be specified - what necessary documentation will be needed - how design and implementation should proceed
Database Planning - Development Standards
of a functional dependency refers to the attribute or group of attributes on the left-hand side of the arrow -Describes the attributes and how they relate to one another
The determinant
• Develop a data model by starting with attributes • Analysis of associations of attributes with other attributes creates relations and relationships between relations • The normalization process uses a bottom-up approach
- Bottom-up (simple databases)
• Similar to bottom-up approach by identifying a small number of entities and then the attributes of those entities • Further entities are identified by spreading out to consider other entities tied to the first ones identified
- Inside-out
• Uses aspects of top-down and bottom-up approaches before combining all parts together
- Mixed strategy
is where a model suggests the existence of a relationship between entity types, but the pathway between certain entity occurrences is missing • Example: A single branch has one or more staff members who oversee zero or more properties for rent - Note: Not all staff oversee properties for rent, and not all properties are overseen by a staff member
A chasm trap
is a data inconsistency or problem caused by an non-normalized database (e.g. a data problem not expected) - Types • Insertion anomaly • Modification (Update) anomaly • Deletion anomaly
A data anomaly
entity and referential integrity that describe requirements for primary and foreign keys
A properly designed relational schema exhibits what kind of integrity?
- It is already in 1NF and - It does not contain any partial dependencies • Every non-key attribute is fully functionally dependent on the primary key
A relation is in 2NF when:
- It is already in 2NF and - It does not contain any transitive dependencies • Every non-key attribute is fully functionally dependent on the primary key
A relation is in 3NF when:
a named relation followed by the attribute names in parentheses-Examples (note the primary key is underlined and the foreign key is italicized):•Branch(branchNo, street, city, postcode)•Staff(staffNo,fName,lName,position,sex,DOB,salary,branchNo)
A relation schema is
is one or more attributes that determine other attributes-If a key consists of two or more attributes, it is referred to as a composite key
A relational key
•External Level •Conceptual Level •Internal Level
ANSI-SPARC Three-Level Architecture
- A minimal (irreducible) super key - A superkey that does not contain a subset of attributes that is itself a super key - May have more than one attribute (composite key) - A relation may have multiple candidate keys
Candidate Key
-Describes data at the lowest level -Example: •Frame memory -data stored in frames or blocks of data
Categories of Data Models•Physical Data Models
-Relational -tables -Network -collection of records represented by sets -Hierarchical -records and sets, tree structure
Categories of Data Models•Record-Based Data Models
• Requirements for each user view are merged into a single set of requirements
Centralized Approach to Managing Multiple User Views
• The process of constructing a model of the data used in an enterprise, independent of all physical considerations - Avoid "IDs" - Avoid foreign keys - Model should be business friendly
Conceptual Design Phase
is a process of creating a design for a database that will support an enterprise's mission statement and mission objectives for the required database system
Database Design
• It is critical to capture necessary facts to build the required database application • Facts about systems, requirements, and preferences are captured using fact-finding techniques • Examples of fact-finding techniques: - Existing documentation - Interviews - Job shadowing - Research - Questionnaires
Fact-Finding Techniques
Process of constructing a model of the data used in an enterprise based • Based on: - a specific data model (e.g. relational model) - still independent of a particular DBMS and other physical considerations • Conceptual data model is refined and mapped onto a logical data model
Logical Database Design
•Collection of records represented by sets are used in the model •Relies on pointers to go between nodes
Network Data Model
• An effective technique for understanding a system • Possible to either participate in, or watch, a person perform activities to learn about the system • Useful when validity of data collected is in question or when the complexity of certain aspects of the system prevents a clear explanation by end-users
Observing the Organization in Operation
• Conceptual database design • Logical database design • Physical database design
Phases of Database Design
• Process of producing a description of the database implementation on secondary storage - Base relations - File organization - Indexes - Integrity constraints - Security measures • Must be tailored to a specific DBMS
Physical Database Design
In order for a relation to exist, all of the following properties must exist 1.The relation is perceived as a two-dimensional structure composed of rows and columns 2.The relation's name is distinct from all other relation names in the relational schema 3.Each relation has an attribute or combination of attributes that uniquely identifies each tuple (each tuple is distinct; there are no duplicate tuples) 4.Each tuple represents a single entity occurrence within the relation 5.Each cell (row/column intersection) of the relation contains exactly one, atomic (single) value 6.Each attribute has a distinct name 7. The values of an attribute are all from the same domain 8.The order of tuples and attributes has no significance •Practically, the order of tuples may affect efficiency of record retrieval in a database
Properties of Relations
•One computer with a single CPU and a number of terminals •Processing performed within same physical computer •User terminals are "dumb" incapable of functioning on their own, and are cabled to the central computer
Teleprocessing
- Late - Over budget - Unreliable - Difficult to maintain - Performing poorly • Structural approaches to development are available as a Software Development Lifecycle (SDLC)
The Importance of Analysis - As a result, many major software projects are:
- Lack of a complete requirements specification - Lack of appropriate development methodology - Poor decomposition of design into manageable components
The Importance of Analysis - Major reasons for failure of software projects:
•Three Tiers-Client (tier 1) manages user interface-Application Server (tier 2) manages application logic -Server (tier 3) holds database and DBMS; includes database access and server-side validation •Relies on "thin" client with less expensive hardware •Easier to modify or replace one tier without impacting others•Easier to load balance traffic •A standard for some Web environments
Three-Tier Client-Server
•Two Tiers-Client (tier 1) manages user interface; runs application logic -Server (tier 2) holds database and DBMS; includes database access and server-side validation •Client and server do not need to reside on the same server •Increased performance, possible reduction in hardware costs
Traditional Two-Tier Client-Server
• Unified Modeling Language (UML) • Chen • Crow's Foot In Chen and Crow's foot notation, an entity type is represented by a rectangle and is labeled with a singular noun and typically all capital letters (e.g. STUDENT, COURSE, etc.)
entity type Examples
•Three-tier architecture can be expanded to ntiers, with more flexibility and scalability •Application servers host programming interfaces (APIs) to expose business logic and processes for use by other applications •Examples of application servers-Java Platform Enterprise Edition (J2EE)-.NET Framework-Oracle Application Server
n-Tier Client-Server
is a set of all relation schemas for the database
relational database schema (or conceptual model or conceptual schema)
illustrates how two or more entity types are associated to each other
relationship type
•A shared collection of logically related data (and a description of this data), designed to meet the information needs of an organization •System catalog (metadata) provides description of data to enable program-data independence •Logically related data contains details about the organization's entities, attributes, and relationships of an organization's information
what is a Database?
-Physical representation of the database on the computer-Describes how data is stored in the database
•Internal Level
-Community view of database -Describes what data is stored in database and relationships among the data
•Conceptual Level
-Users' view of database -Describes part of database relevant to a particular user or application
•External Level
-If you know the value of attribute(s) A, you can determine the value of attribute(s) B-Example:•Knowing a student's UID number, you are able to look up/determine that student's last name, first name, etc.•The reverse is not definitively true. If you know a student's last name, you cannot definitively determine the UID number because several students might have the same last name
A key's role is based on the principle of determination
exists when one or more instances of duplicate data is updated, but not all - Potential source of data inconsistency - Must locate all tuples associated with the update
A modification (update) anomaly
means no data entry, a missing value, or not applicable. It does not mean a blank space, the value "n/a", the value "null", or any other set of characters - Null values should be avoided if possible because they can create problems when tables are joined or when mathematical functions are used for calculation (e.g. AVG, SUM, etc.)
A null value
describes a condition where a non-determinant attribute determines another non-determinant attribute • Consider that A, B, and C are (sets of) attributes of a relation, such that: - If A -> B, and B -> C - Then C is transitively dependent on A via B • Example: - staffNo -> sName, position, salary, branchNo, bAddress - branchNo -> bAddress - staffNo -> bAddress via branchNo
A transitive dependency
- a particular job role (e.g. manager, supervisor, etc.) or - an enterprise application area (e.g. marketing, personnel, stock control, etc.) • Database applications likely will have more than one user view • Identifying user views helps ensure no major users of the database are forgotten when developing requirements for the new system • User views help in development of a complex database system by breaking down requirements into more manageable pieces
A user view defines what is required of a database system from the perspective of:
is a uniquely identifiable object of an entity type - Entity occurrences will correspond to a tuple in the relational model and a row in a relational database
An entity occurrence (or entity instance)
is a group of objects with the same properties, identified through business rules - Entity types will correspond to a relation in the relational model and a table in a relational database
An entity type
when the entity type exists in the database only when it is associated with another related entity occurrence - Example: DEPENDENT entity type is existence-dependent on EMPLOYEE - No independent primary key - Often referred to as a weak entity
An entity type exhibits existence dependence
when the entity type can exist apart from one or more related entity types - Often referred to as a strong entity
An entity type exhibits existence independence
occurs when certain attributes cannot be inserted without the presence of other attributes - Potential source of data inconsistency • Example - How could a staff person be added? - How could a branch be added?
An insertion anomaly
• Design of user interface and application programs that use and process the database • Database design and application design are typically parallel activities • Includes two activities: - transaction design • An action or a series of actions, carried out by a single user or application program, which accesses or changes database content • Define and document high-level characteristics of required transactions - user interface design
Application Design
- Chen notation: attributes are represented by ovals connected to entity type rectangle with a line - Crow's Foot notation: attributes are written in the entity type rectangle
Attributes are properties or characteristics of entity types
-Entity-Relationship -entities and the relationships between them -Semantic -more expressive data model with relationships, inheritance, and data abstraction •No single accepted practice -Functional -functions and the transactions between them -Object-Oriented -objects and the relationships between them
Categories of Data Models•Object-Based Data Models
•Hardware -Can range from a single PC to a network of computers •Software -DBMS, operating system, network software, and application programs •Data -Used by the organization and metadata (description of the data) •Procedures -Instructions and rules that should be applied to the design and use of the database and DBMSs •People
Components of a DBMS Environment
• Process of constructing a model of the data used in an enterprise, independent of all physical considerations and implementation details • Data model is built, tested, and validated using the information in users' requirements specification - Entity Relationship Diagram shows the conceptual design's entities, attributes, and relationships between entities • Conceptual data model is source of information for logical design phase
Conceptual Database Design
-Result is a conceptual data model
Conceptual Modeling is a process of developing a model of information use that is independent of implementation details
-A schema is a blueprint of the database construction -Identifies the core of a database system supporting all user views -Should be a complete and accurate representation of an organization's data requirements
Conceptual Modeling•Conceptual Schema
is a problem arising when creating an E-R model, typically due to misrepresentation of the meaning of relationships - Incorrect modeling may not adequately represent the "real world" • Types - Fan Trap - Chasm Trap
Connection Traps
Structural validity Consistency with the way the enterprise defines and organizes information. Simplicity Ease of understanding by IS professionals and nontechnical users. Expressibility Ability to distinguish between different data, relationships between data, and constraints. Non redundancy Exclusion of extraneous information; in particular, the representation of any one piece of information exactly once. Shareability Not specific to any particular application or technology and thereby usable by many. Extensibility Ability to evolve to support new requirements with minimal effect on existing users. Integrity Consistency with the way the enterprise uses and manages information. Diagrammatic representation Ability to represent a model using an easily understood diagrammatic notation.
Criteria for Optimal Data Modeling
•Control of data redundancy •Data consistency •Sharing of data •More information from integrated data •Improved data integrity •Improved security •Enforcement of standards •Cost savings/economy of scale •Balanced conflicting requirements •Improved data accessibility and responsiveness •Increased productivity •Improved maintenance through data independence •Increased concurrency •Improved backup and recovery services
DBMS Advantages
•Partitioned into several software components (or modules), each with its own operation •DBMS interfaces with other software components, such as user queries and application programs
DBMS Components
•Complexity•Size •DBMS cost •Additional hardware costs •Conversion cost -training, DB specialists, ... •Performance -general vs. specific •Greater impact of a failure
DBMS Disadvantages/Challenges
•Common architectures used to implement multi-user database management systems -Teleprocessing -File-Server -Client-Server
DBMS Functions
•A mechanism must allow only authorized users to perform database activities
DBMS Functions•Authorization Services
•A mechanism must allow multiple users to update the database concurrently
DBMS Functions•Concurrency Control Services
•A mechanism must allow network connectivity and communication
DBMS Functions•Data Communication Support
•A mechanism must allow data and changes follow specified rules
DBMS Functions•Integrity Services
A mechanism must allow users to create, read, update, and delete data in the database
DBMS Functions•List of services derived from E.F. Codd of what all DBMS should include-Data Storage, Retrieval, and Update•
•A mechanism must allow database recovery or rollback in the event of database damage
DBMS Functions•Recovery Services
•A mechanism must support the independence of program functionality from database structure
DBMS Functions•Services to Promote Data Independence
•A mechanism must exist to enable all-or-nothing transaction support
DBMS Functions•Transaction Support
•A mechanism must allow users to access a catalog of metadata
DBMS Functions•User-Accessible System Catalog
•Mechanisms should exist to aid the DBA in administering the database, such as importing/exporting data, monitoring performance, and generating statistics
DBMS Functions•Utility Services
-Refers to immunity of external schemas to changes in conceptual schema •Example: Conceptual schema changes (e.g. addition/removal of entities) -Should not require changes to external schema or rewrites of application programs
Data Independence and the Three-Level Architecture•Logical Data Independence
-Refers to immunity of conceptual schema to changes in the internal schema •Example: Internal schema changes, such as using different file organizations, storage structures, devices, etc.) -Should not require change to conceptual or external schemas
Data Independence and the Three-Level Architecture•Physical Data Independence
-General querying language of the data -Allows the user to create, read, update, and delete database data •Provides basic data manipulation operations on data held in the database -Think CRUD (create, read, update, and delete)
Data Manipulation Language (DML)
serves the purposes of: - to assist in understanding the meaning (semantics) of the data - to facilitate communication about the information requirements • Building a data model requires answering questions about entities, attributes, and relationships • A resulting data model will help ensure understanding of: - Each user's perspective of the data - Nature of the data itself, independent of its physical representations - Use of data across user views
Data Modeling
•Interfaces with application programs and queries •Examines the external and conceptual schemas to determine which records meet the query request •Places a call to the file manager to perform the request
Database Manager Components
• Management activities that allow stages of database of the DBLC to be realized as efficiently and effectively as possible • Must be integrated with overall IS organizational strategy
Database Planning
• Once a mission statement is defined, mission objectives are defined • Each objective should identify a particular task that the database must support • Examples: - To maintain (create, read, update, and delete) data on students - To perform searches on students - To track the courses each student completes for a transcript - To report on students • May be accompanied by some additional information - the work to be done - the resources with which to do it - the location of the money to pay for it all • Mission objectives are created by conducting interviews or having end users complete questionnaires • Make sure you include a wide variety of end users • Sample Questions: - What is your job description? - What kinds of tasks do you perform in a typical day? - What kinds of data do you work with? - How does the data relate to each other? - What types of reports do you use? - What types of things do you need to keep track of? • Alternative: Job shadowing is useful when the end user may not have the time for an interview or questionnaire
Database Planning - Mission Objectives
• A mission statement for a database project succinctly defines the major aims of the database and its associated applications • Helps clarify the purpose of the database project and provides a clear path towards the efficient and effective creation of the required database system • Examples: - The purpose of the Internship Talent database is to track intern data to supply information that supports students looking for internships and the services - The purpose of the Donations4Terps database is to maintain donor and donation event data used by the institution to increase donations for scholarships to UMD students
Database Planning - Mission Statement
• Part of the software development lifecycle • A phased approach from planning to implementation of a database - Database planning - System definition - Requirements collection and analysis - Database Design - DBMS selection (if needed) - Application Design - Prototyping (sometimes) - Implementation - Data Conversion and Loading - Testing - Operational Maintenance
Database System (Development) Lifecycle - DBLC
• Creation of normalized relations is an important database goal • Processing requirements should also be a goal • If relations are decomposed to conform to normalization requirements, the number of relations expands • Joining a larger number of physical tables reduces system speed • Conflicts are often resolved via compromise that may include denormalization • Defects of unnormalized database tables - Data updates are less efficient - Indexing is more cumbersome
Denormalization
- Increased performance, but greater data redundancy - Often used in data warehousing/historical databases
Denormalization produces a lower normal form
is a structured approach that uses procedures, techniques, tools, and documentation aids to support and facilitate the process of design - Conceptual design phase - Logical design phase - Physical design phase
Design Methodology
•The principle of determination is used to identify functional dependencies-Attribute B is functionally dependent on attribute A if each value in attribute A determines one and only one value in attribute B-Written as: A -> B
Functional Dependence
• Physical realization of the database and application designs - Use DDL to create database schemas and any specified user views - Use DML to populate data within the database schemas - Use 3GL or 4GL languages to create the application programs
Implementation
determinants have the minimal number of attributes necessary to maintain the functional dependency with attribute(s) on the right hand side - If A and B are attributes of a relation, B is fully functionally dependent on A, if B is functionally dependent on A, but not on any proper subset of A - Example: staffNo -> sName, position, salary, branchNo • Each value of staffNo is associated with a single value of sName, position, salary, and branchNo
In a full functional dependency,
attributes can be removed from the determinant and the dependency still holds - A partial dependency can only occur when the determinant has multiple attributes - Attributes are dependent on only part of the determinant - Example: staffNo, sName -> branchNo • Each value of (staffNo, sName) is associated with a single value of branchNo • branchNo can be determined solely by staffNo
In a partial dependency
• Requirement - In a base relation, no value of a primary key can be null and all primary key entries are unique • Purpose - Each tuple will have a unique identity and foreign key values can properly reference primary key values • Example - No invoice can have a duplicate number, nor can it be null - All invoice details are uniquely identified by their invoice number
Integrity Constraint Rules - Entity Integrity
• Requirement - Additional rules specified by users or database administrators that define or constrain some aspect of the enterprise - Usually based on business rules • Purpose - To maintain data integrity, it is important to enforce rules on the data being stored in the database - "Garbage in, garbage out" • Examples - All states must be specified by their two letter abbreviation - A student's date of birth must be between 1/1/1970 and today's date
Integrity Constraint Rules - General (Other) Constraints
exists when certain attributes are lost because of the deletion of other attributes - Potential source of data loss
A deletion anomaly
can show the primary key and all dependencies - Helpful in getting a bird's eye view of all relationships among a relation's attributes - Makes it less likely you will overlook an important dependency - "Good" dependencies are shown above the diagram - "Bad"/problematic dependencies are shown below the diagram • Partial dependencies • Transitive dependencies
A dependency diagram
is where a model represents a relationship between entity types, but the pathway between certain entity occurrences is ambiguous • Example: A single division operates one or more branches and has one or more staff
A fan trap
describes the relationship between attributes when one attribute uniquely determines another attribute • Important concept with normalization • If A and B are attributes of R: - B is functionally dependent on A (denoted A -> B) if each value of A in R is associated with exactly one value of B in R - A and B may each consist of one or more attributes
A functional dependency
- Candidate keys that are not selected to be primary keys - May be used for data retrieval purposes
Alternate Key
is the repetition of the same data in multiple places
Data redundancy
-top-down -bottom-up -inside out -mixed strategy
Database Design Approaches • Main approaches include:
• Create a relation for each strong entity type - Include all simple attributes - Flatten composite attributes into simple attributes - Include the primary key (bolded and underlined)
E-R Schema to Relational Schema: Strong Entity Type
•Processing is distributed about the network, typically using a local area network (LAN) •Application and DBMS run on each workstation •File server acts as a shared hard disk drive •Performance problems may occur with large network traffic generated
File-Server Architecture
•Records and sets in a tree structure are used in the model •Only one parent is allowed
Hierarchical Data Model
Limitations of File-Based Systems•Separation and isolation of data-Each program maintains its own data set-Users of one program may be unaware of potentially useful data held by other programs•
Limitations of File-Based Systems•Separation and isolation of data
• The process of constructing a model of the data used in an enterprise based on a specific data model (e.g. relational) - Independent of a particular DBMS - Do not care about any physical considerations, such as file storage
Logical Design Phase
- Centralized approach - View integration approach - Combination of centralized and view integration approaches
Managing Database System Requirements • Three main approaches
People may knowingly or unknowingly perform differently when being observed May miss observing tasks involving different levels of difficulty or volume normally experienced during that time period Some tasks may not always be performed in the manner in which they are observed May be impractical
Observing the Organization Disadvantages
• The process of producing a description of the implementation of the database on secondary storage - Describes the base relations, file organizations, and index design to achieve efficient data access - Integrity constraints and security measures put into place
Physical Database Design
-Responsible for management of data resource
Roles in the Database Environment•Data Administrator (DA)
-Clients of the database-May be naïve or sophisticated
Roles in the Database Environment•End Users
•Databases work best by hiding the details of how data is stored and manipulated (abstraction) •Multiple views are provided into the database based on the user• American National Standards Institute (ANSI) Standards Planning and Requirements Committee (SPARC) created an architectural structure used by most commercial databases today
The Three-Level Architecture
•Allows a user to state what data is needed rather than how it is retrieved •Used in accessing data in the relational mode -Example: Structured Query Language (SQL) •Constitutes fourth generation languages (4GL) •Example: SELECT address FROM customer WHRE name = 'Henry';
Types of DML-Non-Procedural DML
• Requirements for each user view remain as separate lists • Data models representing each user view are created and then merged later during the database design stage
View Integration Approach to Managing Multiple User Views
• Develop a data model with a few high-level entities and relationships • Apply successive top-down refinements to identify lower-level entities • Creating an entity-relationship (ER) diagram uses a top-down approach
- Top-down (complex databases)
• Requirement - If a foreign key exists in a relation, one of the following must be true: • The foreign key value must match a candidate key value of some tuple in its home relation • Foreign key value must be null and not be part of the relation's primary key • Purpose - It is possible for an attribute NOT to have a corresponding value, but it is impossible for an invalid entry - Referential integrity makes it impossible to delete a tuple in a relation whose primary key has mandatory matching foreign key values in another relation • Example - A customer might not yet have an assigned sales representative (number) - It is impossible to be assigned to a non-existent/invalid sales representative (number)
Integrity Constraint Rules - Referential Integrity
• Mostly commonly used and normally most useful to obtain information face-to-face • Objectives include finding out facts, verifying facts, clarifying facts, generating enthusiasm, getting the enduser involved, identifying requirements, and gathering ideas and opinions • Types of interviews: - Structured, closed-ended questions restrict answers to either specific choices or short, direct responses - Unstructured, open-ended questions allow the interviee to respond in any way that seems appropriate
Interviewing
-Same data is held by different programs -Wasted space and potentially different values and/or different formats for the same item
Limitations of File-Based Systems-Duplication of data-
-Programs are written to satisfy particular functions-Any new requirement needs a new program
Limitations of File-Based Systems-Fixed queries/lots of additional application programs
-File structure is defined in the program code
Limitations of File-Based Systems•Data dependence
-Programs are written in different languages....cannot easily access each other's files•
Limitations of File-Based Systems•Incompatible file formats
is a design process for evaluating and correcting data structures to reduce data anomalies - It produces a suitable set of relations that support the data requirements of an enterprise - It is a formal technique for analyzing relations based on their primary key (or candidate keys) and functional dependencies • Minimizes data redundancies • Reduces data inconsistencies
Normalization
• Easier for the user to access and maintain data • Reduction in file storage space required by the relations, minimizing costs • Updates to data stored are achieved with minimal operations reducing opportunities for data inconsistencies • Produces better information from querying because anomalies are removed Creation of normalized relations should be an important design goa
Normalization Benefits
- Each relation represents a single subject/theme - No data attribute will be unnecessarily stored in more than one relation - All nonprime attributes in a relation are dependent on the primary key - Each relation is void of insertion, modification, and deletion anomalies
Normalization Success - A database design is successfully normalized if:
•All users should be able to access the same data •A user's view is immune to changes made in other views •Users should not need to know the physical details of data storage •DBA should be able to change database storage structures (internal schema) without affecting users' views (conceptual schema) •Internal database structure should be unaffected by changes to physical storage aspects •DBA should be able to change all conceptual database structures without affecting all users
Objectives of the Three-Level Architecture
Advantages Allows the validity of facts and data to be Checked Observer can see exactly what is being done Observer can also obtain data describing the physical environment of the task Relatively inexpensive Observer can do work measurements
Observing the Organization Advantages
- A candidate key selected to uniquely identify tuples within a relation • Cannot contain null entries • In a relational schema, the primary key is underlined
Primary Key
• Building a working model of a database system • Purpose - Identify features of a system that work well or are inadequate - Suggest improvements or new features - Clarify user requirements - Evaluate feasibility of a particular system design
Prototyping
People can complete and return questionnaires at their convenience Relatively inexpensive way to gather data from a large number of people People more likely to provide the real facts as responses can be kept confidential Responses can be tabulated and analyzed quickly
Questionnaire Advantages
Number of respondents can be low, possibly only 5% to 10% Questionnaires may be returned incomplete May not provide an opportunity to adapt or reword questions that have been misinterpreted Cannot observe and analyze the respondent's body language
Questionnaire Disadvantages
• Conduct surveys through questionnaires, which are special-purpose documents that allow facts to be gathered from a large number of people while maintaining some control over their responses. • There are two types of questions, namely free-format and fixed-format
Questionnaires
-First man to walk on the moon -The first Woodstock music festival was held -E.F. Codd proposes the relational model for database management
Relational Databases: Historical Perspective•Back in 1969, all of these things happened:
•Database processing is the heart of almost all applications and information systems •There is a demand for people knowledgeable with databases in several areas: -Application development -Data analytics/Data warehousing -Data privacy and security
Relational Databases: Today
•A relation is a table with columns and rows -Only applies to logical structure of the database, not the physical structure •An attribute is a named column of a relation •The domain is the set of allowable values for one or more attributes •A tuple is a row of a relation •Degree is the number of attributes in a relation •Cardinality is the number of tuples in a relation •A relational database is a collection of normalized relations with distinct relation names
Relational Model Terminology
is a process of collecting and analyzing information about the part of the organization to be supported by the database system • Information is gathered for each major user view including: - a description of data used or generated - details of how data is to be used/generated - any additional requirements for the new database system
Requirements collection and analysis
• Useful to research the application and problem • Use computer trade journals, reference books, and the Internet • Provide information on how others have solved similar problems, plus whether or not software packages exist to solve or even partially solve the problem
Research
Can save time if solution already exists Researcher can see how others have solved similar problems or met similar requirements Keeps researcher up to date with current developments
Research Advantages
Requires access to appropriate sources of information May ultimately not help in solving problem because problem is not documented elsewhere
Research Disadvantages
-Responsible for creating application programs to interact with the database
Roles in the Database Environment•Application Developers
-Responsible for physical realization of database
Roles in the Database Environment•Database Administrator (DBA)
-Responsible for modeling the database design (logical) and physically implementing it using database tables, constraints and other constructs (physical)•
Roles in the Database Environment•Database Designers
ensures that any instance of the original relation can be found from corresponding instances in the smaller relation
Solution to Avoid Anomalies • Decompose StaffBranch into Staff and Branch • A decomposition must satisfy two properties: - A lossless-join property
ensures the enforcement of a constraint on the original relation by enforcing constraints on each of the smaller relations
Solution to Avoid Anomalies • Decompose StaffBranch into Staff and Branch • A decomposition must satisfy two properties: - A dependency preservation property
- An attribute (or combination of attributes) that uniquely identifies a tuple within a relation
Superkey
•Repository of information (metadata) describing data in the database •One of the fundamental components of a DBMS •Typically stores -Names, types, and sizes of data items -Constraints on the data -Names of authorized users -Data items accessible by a user and the type of access -Usage statistics •Used by modules such as Authorization Control and Integrity Checker
System Catalog
• Used throughout the database application lifecycle • Most crucial in the early stages of database planning, system definition, and requirements collection and analysis stages • Enables developer to learn about terminology, problems, opportunities, constraints, requirements, and priorities of the organization and users of the system
Timing of Fact-Finding Techniques
•Allows user to tell system exactly how to manipulate data •Used in accessing data in the network and hierarchical models •Example: GET FIRST customer WHERE customer.name := "Henry";
Types of DML-Procedural DML
• A table that contains one or more repeating groups • To create an unnormalized table - Transform data from an information source (e.g. form) into a tabular format with rows and columns
Unnormalized Form (UNF)
•Integrated collection of concepts for describing data, relationships between data, and constraints on the data within an organization •Purpose: To represent data in an understandable way •Data model components: -A structural part -how the database will be constructed -A manipulative part -the types of operations allowed on the data -A set of integrity rules -ensures the data remains accurate
What is a Data Model?
• The minimal number of attributes necessary to support the data requirements of the enterprise • Attributes with a close logical relationship are found within the same relation • Minimal data redundancy with each attribute represented only once - Foreign keys are excluded from this
What is a Suitable Set of Relations?
-In society, we are becoming more dependent on data •Data drives business decisions •We frequently use websites such as Google and Wikipedia to search for knowledge
Why Do We Need Databases?•"Data is the commodity"
-According to Cisco, we entered the Zettabyte Era back in 2016 and are continuing to grow at a rapid pace•Zettabyte = 1,000,000,000,000,000,000,000 bytes! -There are massive amounts of data that need to be catalogued for retrieval
Why Do We Need Databases?•We have more data than we know what to do with
•Definition of data was embedded in application programs, rather than being stored separately and independently •No control over access and manipulation of data beyond that imposed by application programs •Result: A database and a database management system (DBMS)
Why is a Database Approach Needed?
Allows interviewee to respond freely and openly to questions \Allows interviewee to feel part of project / Allows interviewer to follow up on interesting comments made by interviewee/ Allows interviewer to adapt or reword questions during interview Blank/ Allows interviewer to observe interviewee's body language Blank
interviewing Advantages
Very time-consuming and costly, and therefore may be impractical/ Success is dependent on communication skills of interviewer/ Success can be dependent on willingness of interviewees to participate in interviews
interviewing Disadvantages
is a uniquely identifiable association which includes one occurrence from each participating type
relationships occurrence
Information is analyzed to identify requirements which are described in a requirements specification - The cost of missing requirements can be huge!
requirements specification
•A software system that enables users to define, create, maintain, and control access to the database -Controlled access may include systems of: security, integrity, concurrency control, recovery control, and cataloging•Relies on a database application program -A computer program that interacts with the database by issuing appropriate requests (queries) to the DBMS
what is a Database Management System (DBMS)?
has a value that may be calculated from other attributes or known data - Need not be physically stored in the database - Example: age can be derived from the date of birth and current date
• A derived attribute
is created representing all user views during the database design stage
• A global data model
- Attribute, or set of attributes within one relation that matches a candidate key of some (possibly the same) relation, or be null • Foreign key references primary key - In a relational schema, the foreign key is italicized
• Foreign Key
are created each representing a single user view (or subset of all user views) • Each model includes diagrams and requirements documentation for one or more, but not all views of the database • Local data models are merged at a later stage during database design to produce the global data model
• Local data models
describes scope and boundaries of database system and the major user views
• The system definition
-Permits specification of data types, structures, and any data constraints -All specifications are stored in the database •Creates and defines the data structure (metadata) within a system catalog -Allows the DBA or user to describe and name entities, attributes, and relationships required for the application -Data integrity and security constraints may also be defined
•Data Definition Language (DDL)