BIS 324 Test 1
Relational Database Model
In 1970, E. F. Codd published "A Relational Model of Data for Large Shared Data Banks" in CACM. In the early 1980s, commercially viable relational database management systems became available. While relational database was very tempting in concept in the 1970s, it was not easily applicable in a real-world environment for reasons related to performance. The earlier hierarchical and network database management systems were just coming onto the commercial scene and were the focus of intense marketing efforts by the software and hardware vendors.
Comparisons
In addition to equal (=), the standard comparison operators can be used in the WHERE clause. Greater than (>) Less than (<) Greater than or equal to (>=) Less than or equal to (<=) Not equal to (<>)
Ternary Relationships
Involves three different entity types.
ORDER BY
Orders the results of an SQL query by one or more specified attributes. The default order for ORDER BY is ascending. The clause can include the term ASC at the end to make ascending explicit. The clause can include DESC for descending order.
Data Integrity
Refers to the accuracy of the data. Inaccurate data leaves the whole information system of limited value.
Data
Representations of meaningful objects/events
Punched cards and 2000 Pres. Election
There were problems in florida Hanging Door Chad Swinging Door Chad Tri Chad Dimpled Chad Pregnant Chad Made cards hard to read ---- Butterfly ballot: people used wrong hole or two holes, messing up data
1-M Binary Relationship
Use "crow's foot" to represent the multiple association. "many" = the maximum number of occurrences that can be involved, means a number that can be 1, 2, 3, ... n.
Database
a highly organized collection of assembled data (don't have to be electronic)
LIKE
"%" character used as a "wildcard" to represent any string of characters. The single "_" character in the operator LIKE "_a%" specifies that there will be one character followed by "a."
M-M Binary Relationship
"many" can be either an exact number or have a known maximum Has its own relation in the database. Can have its own attributes. It is a kind of entity -- an Associative Entity
The Database Concept
1. Data as a Manageable Resource: The creation of a data-centric environment in which a company's data can truly be thought of as a significant corporate resource. A key feature of this environment is the ability to share data among those inside and outside of the company who require access to it. Information systems environment: -Hardware -Networks -Applications software -Systems software -People -Data Historically, data has not been the highest priority concern. As the operational, tactical, and strategic corporate levels became more dependent on information systems, data increasingly became recognized as an important corporate resource. The corporate community became increasingly convinced that a firm's data could provide a significant competitive advantage to the firm. It became clear that data would have to be managed in an organized way. Needed a software utility that could manage and protect data. Data could be a critical corporate resource. Out of this need was born the database management system. 2. Data Integration and Data Redundancy: Data integration - the ability to tie together pieces of related data within an information system. Data redundancy - the same fact about the business environment is stored more than once within an information system. There are some problems: -Redundant data takes up a great deal of extra disk space. -If the redundant data has to be updated, it takes additional time to do so. This can be a major performance issue. -There is the potential for data integrity problems. 3. Multiple Relationships 4. Data Control Issues 5. Data Independence: Data Dependence - if for any reason the storage characteristics of the data had to be changed, the application program itself had to be modified, often extensively. Data Independence - to have a data storage and programming environment in which as many types of changes in the data structure as possible would not require changes in the application programs that use them.
Entity-Relationship (ER) Model
A diagramming technique Diagrams entities (with attributes) and the relationship between the entities. There are many variations of E-R diagrams in use.
Primary Key
A relation always has a unique primary key. A primary key (also called "the key") is an attribute or a group of attributes whose values are unique throughout all of the rows of the relation. The number of attributes involved in the primary key is always the minimum number of attributes that provide the uniqueness quality. In the worst case, all of the relation's attributes combined could serve as the primary key.
1-1 Binary Relationship
A single occurrence of one entity type can be associated with a single occurrence of the other entity type and vice versa.
What is Data?
A single piece of data is a single fact about something that interests us A fact can be any characteristic of an object
Reasons to Draw Super and Subtypes
A subtype has unique attributes that do not apply to all the instances of the entity -Note that only one or two attributes may not be enough - evaluate the number of nulls resulting in the database -Evaluate whether the attribute will only be null sometimes (STREETADDR2) or won't exist at all A subtype has a unique relationship that does not apply to all instances of the entity
Blaise Pascal
Adding machine that was an early version of today's mechanical automobile odometers (1642)
BETWEEN
Allows you to specify a range of numeric values in a search.
Foreign Key
An attribute or group of attributes that serves as the primary key of one relation and also appears in another relation (foreign key in this relation). Crucial in relational database, because the foreign key is the mechanism that ties relations together to represent unary, binary, and ternary relationships. Foreign key attribute must have same domain of values as Primary key attribute in other relation.
Unary Relationships
Associate occurrences of an entity type with other occurrences of the same entity type
Unary Relationships
Associate occurrences of an entity type with other occurrences of the same entity type.
Relationships
Association between entities Different kinds: -Binary -Unary -Ternary
Current Data Storage media
Compact Disc (CD) - introduced as a data storage medium in 1985. -Plastic disk with reflective surface -Data encoded by pits in surface -Read by laser -Store more data -More durable (last longer) Solid-state technology - Flash drives (1999 and later) -Charge carriers (electrons) in a solid material (like an integrated circuit) -No moving parts -Smaller, faster -More capacity DVD (Digital Versatile Disc) -Smaller laser, smaller pits -More capacity
Constraints in Super/Subtype Relationships
Completeness Constraints: Whether an instance of a supertype must also be a member of at least one subtype -Total Specialization Rule: Yes (double line) -Partial Specialization Rule: No (single line) Disjointness Constraints: Whether an instance of a supertype may simultaneously be a member of two (or more) subtypes -Disjoint Rule: An instance of the supertype can be only ONE of the subtypes -Overlap Rule: An instance of the supertype could be more than one of the subtypes Subtype Discriminator: An attribute of the supertype whose values determine the target subtype(s) -Disjoint - a simple attribute with alternative values to indicate the possible subtypes -Overlapping - a composite attribute whose subparts pertain to different subtypes. Each subpart contains a Boolean value to indicate whether or not the instance belongs to the associated subtype
Data Management
Data Definition: Operationalized with a data definition language (DDL). Instructs the DBMS software on what tables will be in the database, what attributes will be in the tables, which attributes will be indexed, etc. --------- Data Manipulation: Refers to the four basic operations that can and must be performed on data stored in any DBMS. -Data retrieval -Data update -Insertion of new records -Deletion of existing records Requires data manipulation language (DML)
Data Control Issues
Data Security: A very broad topic Protecting the physical hardware environment Defending against hacker attacks Encrypting data transmitted over networks Educating employees on the importance of protecting the company's data ... and many more -------- Backup and Recovery Data can be lost or corrupted in any of a variety of ways: -a disaster such as a fire, a hurricane, or an earthquake -hackers -computer viruses -poorly written application programs unintentional error -------------- Concurrency Control Updates to a database can interfere with each other in such a way that the resulting data values will be incorrect. A database management system must be designed to protect its databases from such an eventuality.
Relational Database Concept
Data appears to be stored in what we have been referring to as simple, linear files. Relational databases are based on mathematics. A relational database is a collection of relations that, as a group, contain the data that describes a particular business environment.
Information
Data processed to increase a person's knowledge
Intersection Data
Describes the relationship between two entities. Used with many-to-many relationships. Represented on E-R diagram as an "associative entity" For example, we know not only that salesperson 137 sold some of product 24013 but also how many units of that product that salesperson sold.
Data as a Corporate Resource
Difficult corporate resource to manage Tremendous volume, billions, trillions, and more individual pieces of data Large diversity of data Need: -Software to manage data (DBMS) -Faster hardware with more capacity -Data mgmt specialists
Problems in Storing and Accessing Data
Difficult to store Difficult to provide efficient, accurate Volume is massive. Larger number of people want access to data: -Employees -Customers -Supply Chain partners Data Security:Involves a company protecting its data from theft, malicious destruction, deliberate attempts at making phony changes to the data. ex. someone trying to increase his own bank account balance. Data Privacy: Ensuring that even employees who normally have access to the company's data are given access only to the specific data that they need in their work. Backup and Recovery: The ability to reconstruct data if it is lost or corrupted. Wal-Mart estimates its data warehouse contains hundreds of terabytes (trillions of characters) of data.
Electronic Computers
ENIAC world's first electronic general purpose computer (1946) Commercial introduction of electronic computers in mid 1950s Witnessed a boom in economic development.
DMBS and Referential Integrity
Early relational DBMSs did not provide any control mechanisms for referential integrity. Modern relational DBMSs provide sophisticated control mechanisms for referential integrity: -Delete rules -Insert rules -Update rules
DISTINCT
Eliminate duplicate rows in a query result.
The Database Environment
Encourages data sharing Helps control data redundancy Has important improvements in data accuracy Permits storage of vast volumes of data with acceptable access. Allows database queries Provides tools to control: -data security -data privacy -backup and recovery
Super and Sub types
Enhanced ER model: extends original ER model with new modeling constructs Subtype: A subgrouping of the entities in an entity type that has attributes distinct from those in other subgroupings Supertype: A generic entity type that has a relationship with one or more subtypes Attribute Inheritance: -Subtype entities inherit values of all attributes of the supertype -An instance of a subtype is also an instance of the supertype
Associative Entity
Entities can have attributes; many-to-many relationships can have attributes. Many-to-many relationship may be treated similarly to entities in an E-R diagram. The unique identifier of the associative entity is usually the combination of the unique identifiers of the two entities in the many-to-many relationship.
Things to Note about Super and Subtypes
Entities generally have attributes (if you draw an entity with no attributes, double check if you really have a subtype) The relationship between the supertype and subtype is ONE to ONE with the connector circle inbetween The PK of the supertype goes into the subtype as a PK FK Don't forget the subtype discriminator in the supertype
Data Before Database MGMT--Records and Files
Entity - a "thing" or "object" in our environment that we want to keep track of. Entity set - A collection of entities of the same type (e.g., all of the company's employees). Attribute - a property of, a characteristic of, or a fact that we know about an entity. Some attributes have unique values within an entity set. Record - each row of a structure like above Fields - the columns, representing the facts File - the entire structure Record type - a structural description of each and every record in the file Record occurrence / Record instance - a specific record of the salesperson file
Essence of Data Modeling
Exploring the different ways that entities can relate to each other as they always do in the real world Devising a way of recording, of diagramming, the entities and the ways in which they interrelate in the business environment
Before Databases
Flat Files -Plain text or mixed text/binary file One record per line Single fields separated by delimiters -Examples: student file, faculty file, class file -Application systems owned files -Departments used applications Limitations/Disadvantages -Program-Data Dependence -Duplication of Data -Limited Data Sharing -Long Application Development Times -More Program Maintenance
Retrieving and Manipulating Data
Four fundamental operations can be performed on stored data: -Retrieve or Read - looking at a record's contents without changing it -Insert - adding a new record to the file, as when a new salesperson is hired -Delete - deleting a record from the file, as when a salesperson leaves the company -Update - changing one or more of a record's field values
Generalization and Specialization
Generalization: The process of defining a more general entity type from a set of more specialized entity types. BOTTOM-UP Specialization: The process of defining one or more subtypes of the supertype and forming supertype/subtype relationships. TOP-DOWN
Major DMBS Approaches
Hierarchical - navigational and Network - navigational Called navigational approaches because of the way that programs have to "navigate" through hierarchies and networks of data to find needed data. Developed in the 1960s and 1970s. Somewhat similar in structure. Suitable only for mainframe computers were an elegant solution to the redundancy/integration problem at the time Complex, difficult to work Now considered legacy systems ----------- Relational - became commercially viable in about 1980. Soon became the preferred DBMS approach and it has remained so ever since. -------------- Object-oriented - useful for a variety of niche applications. It is interesting to note that some of the key object-oriented database concepts have found their way into some of the mainstream relational DBMSs and some are described as taking a hybrid object/relational approach to database.
Candidate Key
If a relation has more than one attribute or minimum group of attributes that represents a way of uniquely identifying the entities, then they are each called a candidate key. When there is more than one candidate key, one of them must be chosen to be the primary key of the relation. Which candidate key to pick depends on the application using the database. Alternate key is a candidate key that was not chosen to be the primary key of the relation.
Database Operations
In addition to retrieving data we must be prepared to perform data maintenance operations, including: -inserting new records -deleting existing records -updating existing records
Punched Cards- Data Storage
Invented in 1805 by Joseph Marie Jacquard of France Stored fabric patterns, a form of graphic data, as holes in punched cards Of great importance for computing devices to follow
Magnetic Data Storage Media
Magnetic Tape - commercially available units in 1952. Direct Access Magnetic Devices - began to be developed at MIT in the late 1930s and early 1940s. Magnetic Drum - early 1950s; forerunners of magnetic disk technology. Magnetic Disk - commercially available in mid 1950s. Where is data stored using magnetism today?
Early Data Storage Media
Middle to late 1930s saw the beginning of the era of erasable magnetic storage media. By late 1940s, early work was done on the use of magnetic tape for recording data. By 1950, several companies were developing the magnetic tape concept for commercial use.
First Record Keeping
People live in cities- became specialized Recording of Data -How much a person has produced -What it can be sold for With time, different kinds of data were kept -Calendars, census data, etc.
Modified Product Relation
Product Numbers have been reduced to 2 digits for simplicity. Every individual unit item and every set of tools has its own row in the relation because every item and set is available for sale.
Modern Data Storage Media
Punched paper tape - The earliest form of modern data storage, introduced in the 1870s and 1880s. Punched cards were the only data storage medium used in the increasingly sophisticated electromechanical accounting machines of the 1920s, 1930s, and 1940s.
E-R Model Entity (and it's attributes)
Rectangular shape Salesperson = a type of entity Name of entity is in caps above the separator line. Entity type's attributes are shown below the separator line. PK and boldface denote the attribute(s) that constitute the entity type's unique identifier.
Relational Terminology
Relations - what we have been referring to as simple linear files. Also called tables. Row = record (files) = tuple (relation) Column = field (files) = attribute (relation)
Relationships and Subtypes
Relationships at the supertype level indicate that all subtypes will participate in the relationship The instances of a subtype may participate in a relationship unique to that subtype. In this situation, the relationship is shown at the subtype level
Cardinality
Represents the maximum number of entities that can be involved in a particular relationship. 1-1 1-M M-N
Three Delete Rules
Restrict: If an attempt is made to delete a record on the "one side" of the one-to-many relationship, the system will forbid the delete to take place if there are any matching foreign key values in the relation on the "many side." If an attempt is made to delete the record for salesperson 361 in the SALESPERSON relation, the system will not permit the deletion to take place because the CUSTOMER relation records for customers 1525 and 1700 include salesperson number 361 as a foreign key value. --- Cascade: If an attempt is made to delete a record on the "one side" of the relationship, not only will that record be deleted but all of the records on the "many side" of the relationship that have a matching foreign key value will also be deleted. The deletion will cascade from one relation to the other. If an attempt is made to delete the record for salesperson 361 in the SALESPERSON relation, that salesperson record will be deleted and so too, automatically, will the records for customers 1525 and 1700 in the CUSTOMER relation because they have 361 as a foreign key value. --- Set-to-Null If an attempt is made to delete a record on the "one side" of the one-to-many relationship, that record will be deleted and the matching foreign key values in the records on the "many side" of the relationship will be changed to null. If an attempt is made to delete the record for salesperson 361 in the SALESPERSON relation, that record will be deleted, and the Salesperson Number attribute values in the records for customers 1525 and 1700 in the CUSTOMER relation will have their Salesperson Number attribute values changed from 361 to null.
Relational Select Operator
Retrieves a horizontal slice of the relation. Select rows from the SALESPERSON relation in which Salesperson Number = 204. The result of a relational operation will always be a relation.
Relational Project Operator
Retrieves a vertical slice of the relation. Project the Salesperson Number and Salesperson Name over the SALESPERSON relation.
Referential Integrity
Revolves around the circumstance of trying to refer to data in one relation in the database, based on values in another relation. --- Deletion: A problem arises, e.g., because a deleted record, a salesperson record, is on the "one side" of a one-to-many relationship. --- Insertion: if a new record is inserted into the "one side" (SALESPERSON relation) of the one-to-many relationship, there is no problem. If a new customer record is inserted into the "many side" (CUSTOMER relation) of the one-to-many relationship and it happens to include a salesperson number that does not have a match in the SALESPERSON relation—that would cause the same kind of problem as the deletion example --- Update: Updating a foreign key value. For example, a salesperson number in the CUSTOMER relation with a new salesperson number that has no match in the SALESPERSON relation.
Data Retrieval Method
Sequential access - the retrieval of all or a portion of the records of a file one after another, in some sequence, starting from the beginning, until all of the required records have been retrieved. -Physical sequential access - records are retrieved, one after the other, just as they are stored on the disk device. -Logical sequential access - records are retrieved in an order based on the values of one or a combination of the fields. ----------------- Direct Access - the retrieval of a single record of a file or a subset of the records of a file based on one or more values of a field or a combination of fields in the file. -a crucial concept in information systems today -requires hardware storage device that will accommodate direct access -requires software that will take advantage of the hardware's capabilities and store and retrieve the data in such a way that it accomplishes direct access.
Binary Relationships
Simplest kind of relationship Relationship between two entity types A salesperson "sells" products or products are "sold" by salespersons
Database Management System
Sophisticated software that controls the database and the database environment A software system that is used to create, maintain, and provide controlled access to user databases Three essential parts: central repository, controlling software agent, storage of data in a standardized, convenient form DBMS manages data resources like an operating system manages hardware resources
SQL
Structured Query Language Incorporates both DDL and DML features. Very heavily used in practice today.
Modern Era of Info Processions
The 1880 U.S. Census took about seven years to compile by hand. Basing his work on Jacquard's, Herman Hollerith arranged to have the census data stored in punched cards and invented machinery to tabulate them. 1890 U.S. Census population count done in 1 month (2 years for all data to be tabulated) In 1896 Hollerith formed the Tabulating Machine Company to produce and commercially market his devices -- this later became IBM.
File/Relation: Differences
The columns of a relation can be arranged in any order without affecting the meaning of the data. This is not true of a file. The rows of a relation can be arranged in any order, which is not true of a file. Every row/column position (a cell) can have only a single value, which is not necessarily true in a file. No two rows of a relation are identical, which is not necessarily true in a file.
Data Retrieval from a Relational Database
The discussion thus far has concentrated on: -how a relational database is structured -loading a database with data Let's discuss the effort to retrieve the data in a way that is helpful and beneficial to the business organization that built the database. -------- Have the ability to accept high level data retrieval commands Process the commands against the database's relations and return the desired data.
Modality
The minimum number of entity occurrences that can be involved in a relationship. "inner" symbol on E-R diagram ("outer" symbol is cardinality)
Data Accuracy
The same data is stored several, sometimes many, times within a company's information system. When a new application is written, new data files are created to store its data. Data can be duplicated within a single file and across files.
Ternary Relationship
These new General Hardware Co. relations are all independent with no foreign keys in any of them. The SALES relation shows how this ternary relationship is represented in a relational database. The primary key of the additional relation (SALES) will be (at least) the combination of the primary keys of the entities involved in the relationship.
Domain of Values
Two attributes have the same domain of values if the attributes have values of the same type. e.g., Salesperson Number in SALESPERSON and in CUSTOMER - three digit whole numbers that are the identifiers for salespersons.
Anomalies
Typically occur in poorly structured files. Problems arise when two different kinds of data, like salesperson and customer data are merged into one file. Deletion Anomaly - e.g, if you delete a customer and that record was the only one for a salesperson, the salesperson's data is gone. Insertion Anomaly - e.g., General Hardware cannot add data about a new salesperson the company just hired until she is assigned at least one customer. Update Anomaly - redundant data in the database file must be updated each place it exists when it changes.
SQL SELECT
Used for data retrieval. You specify what data you are looking for rather than provide a logical sequence of steps that guide the system in how to find the data. Can be run in either a query or an embedded mode. Command will work with Oracle, MS Access, SQL Server, DB2, Informix, etc. SELECT <columns> FROM <table> WHERE <predicates identifying rows to be included>; The desired attributes are listed in the SELECT clause. The required table is listed in the FROM clause. The restriction (predicate) indicating which row(s) is involved is shown in the WHERE clause in the form of an equation. The "*" indicates that all attributes of the selected row are to be retrieved. For a Relational Algebra Project operation, there is no need for a WHERE clause to limit which rows of the table are included.
AND and OR
With the AND operator, both conditions have to be satisfied to be included in the result. The OR operator really means one or the other or both. AND is said to be "higher in precedence" than OR. So all ANDs are considered before any ORs are considered. If you really wanted the OR to be considered first, you could force it by writing the query with parenthesis around the OR
Using Data for Competitive Advantage
indispensable and fundamental corporate resource a crucial competitive advantage (but may be for a short time)