BIS 324 Test 1

Pataasin ang iyong marka sa homework at exams ngayon gamit ang Quizwiz!

Relational Database Model

In 1970, E. F. Codd published "A Relational Model of Data for Large Shared Data Banks" in CACM. In the early 1980s, commercially viable relational database management systems became available. While relational database was very tempting in concept in the 1970s, it was not easily applicable in a real-world environment for reasons related to performance. The earlier hierarchical and network database management systems were just coming onto the commercial scene and were the focus of intense marketing efforts by the software and hardware vendors.

Comparisons

In addition to equal (=), the standard comparison operators can be used in the WHERE clause. Greater than (>) Less than (<) Greater than or equal to (>=) Less than or equal to (<=) Not equal to (<>)

Ternary Relationships

Involves three different entity types.

ORDER BY

Orders the results of an SQL query by one or more specified attributes. The default order for ORDER BY is ascending. The clause can include the term ASC at the end to make ascending explicit. The clause can include DESC for descending order.

Data Integrity

Refers to the accuracy of the data. Inaccurate data leaves the whole information system of limited value.

Data

Representations of meaningful objects/events

Punched cards and 2000 Pres. Election

There were problems in florida Hanging Door Chad Swinging Door Chad Tri Chad Dimpled Chad Pregnant Chad Made cards hard to read ---- Butterfly ballot: people used wrong hole or two holes, messing up data

1-M Binary Relationship

Use "crow's foot" to represent the multiple association. "many" = the maximum number of occurrences that can be involved, means a number that can be 1, 2, 3, ... n.

Database

a highly organized collection of assembled data (don't have to be electronic)

LIKE

"%" character used as a "wildcard" to represent any string of characters. The single "_" character in the operator LIKE "_a%" specifies that there will be one character followed by "a."

M-M Binary Relationship

"many" can be either an exact number or have a known maximum Has its own relation in the database. Can have its own attributes. It is a kind of entity -- an Associative Entity

The Database Concept

1. Data as a Manageable Resource: The creation of a data-centric environment in which a company's data can truly be thought of as a significant corporate resource. A key feature of this environment is the ability to share data among those inside and outside of the company who require access to it. Information systems environment: -Hardware -Networks -Applications software -Systems software -People -Data Historically, data has not been the highest priority concern. As the operational, tactical, and strategic corporate levels became more dependent on information systems, data increasingly became recognized as an important corporate resource. The corporate community became increasingly convinced that a firm's data could provide a significant competitive advantage to the firm. It became clear that data would have to be managed in an organized way. Needed a software utility that could manage and protect data. Data could be a critical corporate resource. Out of this need was born the database management system. 2. Data Integration and Data Redundancy: Data integration - the ability to tie together pieces of related data within an information system. Data redundancy - the same fact about the business environment is stored more than once within an information system. There are some problems: -Redundant data takes up a great deal of extra disk space. -If the redundant data has to be updated, it takes additional time to do so. This can be a major performance issue. -There is the potential for data integrity problems. 3. Multiple Relationships 4. Data Control Issues 5. Data Independence: Data Dependence - if for any reason the storage characteristics of the data had to be changed, the application program itself had to be modified, often extensively. Data Independence - to have a data storage and programming environment in which as many types of changes in the data structure as possible would not require changes in the application programs that use them.

Entity-Relationship (ER) Model

A diagramming technique Diagrams entities (with attributes) and the relationship between the entities. There are many variations of E-R diagrams in use.

Primary Key

A relation always has a unique primary key. A primary key (also called "the key") is an attribute or a group of attributes whose values are unique throughout all of the rows of the relation. The number of attributes involved in the primary key is always the minimum number of attributes that provide the uniqueness quality. In the worst case, all of the relation's attributes combined could serve as the primary key.

1-1 Binary Relationship

A single occurrence of one entity type can be associated with a single occurrence of the other entity type and vice versa.

What is Data?

A single piece of data is a single fact about something that interests us A fact can be any characteristic of an object

Reasons to Draw Super and Subtypes

A subtype has unique attributes that do not apply to all the instances of the entity -Note that only one or two attributes may not be enough - evaluate the number of nulls resulting in the database -Evaluate whether the attribute will only be null sometimes (STREETADDR2) or won't exist at all A subtype has a unique relationship that does not apply to all instances of the entity

Blaise Pascal

Adding machine that was an early version of today's mechanical automobile odometers (1642)

BETWEEN

Allows you to specify a range of numeric values in a search.

Foreign Key

An attribute or group of attributes that serves as the primary key of one relation and also appears in another relation (foreign key in this relation). Crucial in relational database, because the foreign key is the mechanism that ties relations together to represent unary, binary, and ternary relationships. Foreign key attribute must have same domain of values as Primary key attribute in other relation.

Unary Relationships

Associate occurrences of an entity type with other occurrences of the same entity type

Unary Relationships

Associate occurrences of an entity type with other occurrences of the same entity type.

Relationships

Association between entities Different kinds: -Binary -Unary -Ternary

Current Data Storage media

Compact Disc (CD) - introduced as a data storage medium in 1985. -Plastic disk with reflective surface -Data encoded by pits in surface -Read by laser -Store more data -More durable (last longer) Solid-state technology - Flash drives (1999 and later) -Charge carriers (electrons) in a solid material (like an integrated circuit) -No moving parts -Smaller, faster -More capacity DVD (Digital Versatile Disc) -Smaller laser, smaller pits -More capacity

Constraints in Super/Subtype Relationships

Completeness Constraints: Whether an instance of a supertype must also be a member of at least one subtype -Total Specialization Rule: Yes (double line) -Partial Specialization Rule: No (single line) Disjointness Constraints: Whether an instance of a supertype may simultaneously be a member of two (or more) subtypes -Disjoint Rule: An instance of the supertype can be only ONE of the subtypes -Overlap Rule: An instance of the supertype could be more than one of the subtypes Subtype Discriminator: An attribute of the supertype whose values determine the target subtype(s) -Disjoint - a simple attribute with alternative values to indicate the possible subtypes -Overlapping - a composite attribute whose subparts pertain to different subtypes. Each subpart contains a Boolean value to indicate whether or not the instance belongs to the associated subtype

Data Management

Data Definition: Operationalized with a data definition language (DDL). Instructs the DBMS software on what tables will be in the database, what attributes will be in the tables, which attributes will be indexed, etc. --------- Data Manipulation: Refers to the four basic operations that can and must be performed on data stored in any DBMS. -Data retrieval -Data update -Insertion of new records -Deletion of existing records Requires data manipulation language (DML)

Data Control Issues

Data Security: A very broad topic Protecting the physical hardware environment Defending against hacker attacks Encrypting data transmitted over networks Educating employees on the importance of protecting the company's data ... and many more -------- Backup and Recovery Data can be lost or corrupted in any of a variety of ways: -a disaster such as a fire, a hurricane, or an earthquake -hackers -computer viruses -poorly written application programs unintentional error -------------- Concurrency Control Updates to a database can interfere with each other in such a way that the resulting data values will be incorrect. A database management system must be designed to protect its databases from such an eventuality.

Relational Database Concept

Data appears to be stored in what we have been referring to as simple, linear files. Relational databases are based on mathematics. A relational database is a collection of relations that, as a group, contain the data that describes a particular business environment.

Information

Data processed to increase a person's knowledge

Intersection Data

Describes the relationship between two entities. Used with many-to-many relationships. Represented on E-R diagram as an "associative entity" For example, we know not only that salesperson 137 sold some of product 24013 but also how many units of that product that salesperson sold.

Data as a Corporate Resource

Difficult corporate resource to manage Tremendous volume, billions, trillions, and more individual pieces of data Large diversity of data Need: -Software to manage data (DBMS) -Faster hardware with more capacity -Data mgmt specialists

Problems in Storing and Accessing Data

Difficult to store Difficult to provide efficient, accurate Volume is massive. Larger number of people want access to data: -Employees -Customers -Supply Chain partners Data Security:Involves a company protecting its data from theft, malicious destruction, deliberate attempts at making phony changes to the data. ex. someone trying to increase his own bank account balance. Data Privacy: Ensuring that even employees who normally have access to the company's data are given access only to the specific data that they need in their work. Backup and Recovery: The ability to reconstruct data if it is lost or corrupted. Wal-Mart estimates its data warehouse contains hundreds of terabytes (trillions of characters) of data.

Electronic Computers

ENIAC world's first electronic general purpose computer (1946) Commercial introduction of electronic computers in mid 1950s Witnessed a boom in economic development.

DMBS and Referential Integrity

Early relational DBMSs did not provide any control mechanisms for referential integrity. Modern relational DBMSs provide sophisticated control mechanisms for referential integrity: -Delete rules -Insert rules -Update rules

DISTINCT

Eliminate duplicate rows in a query result.

The Database Environment

Encourages data sharing Helps control data redundancy Has important improvements in data accuracy Permits storage of vast volumes of data with acceptable access. Allows database queries Provides tools to control: -data security -data privacy -backup and recovery

Super and Sub types

Enhanced ER model: extends original ER model with new modeling constructs Subtype: A subgrouping of the entities in an entity type that has attributes distinct from those in other subgroupings Supertype: A generic entity type that has a relationship with one or more subtypes Attribute Inheritance: -Subtype entities inherit values of all attributes of the supertype -An instance of a subtype is also an instance of the supertype

Associative Entity

Entities can have attributes; many-to-many relationships can have attributes. Many-to-many relationship may be treated similarly to entities in an E-R diagram. The unique identifier of the associative entity is usually the combination of the unique identifiers of the two entities in the many-to-many relationship.

Things to Note about Super and Subtypes

Entities generally have attributes (if you draw an entity with no attributes, double check if you really have a subtype) The relationship between the supertype and subtype is ONE to ONE with the connector circle inbetween The PK of the supertype goes into the subtype as a PK FK Don't forget the subtype discriminator in the supertype

Data Before Database MGMT--Records and Files

Entity - a "thing" or "object" in our environment that we want to keep track of. Entity set - A collection of entities of the same type (e.g., all of the company's employees). Attribute - a property of, a characteristic of, or a fact that we know about an entity. Some attributes have unique values within an entity set. Record - each row of a structure like above Fields - the columns, representing the facts File - the entire structure Record type - a structural description of each and every record in the file Record occurrence / Record instance - a specific record of the salesperson file

Essence of Data Modeling

Exploring the different ways that entities can relate to each other as they always do in the real world Devising a way of recording, of diagramming, the entities and the ways in which they interrelate in the business environment

Before Databases

Flat Files -Plain text or mixed text/binary file One record per line Single fields separated by delimiters -Examples: student file, faculty file, class file -Application systems owned files -Departments used applications Limitations/Disadvantages -Program-Data Dependence -Duplication of Data -Limited Data Sharing -Long Application Development Times -More Program Maintenance

Retrieving and Manipulating Data

Four fundamental operations can be performed on stored data: -Retrieve or Read - looking at a record's contents without changing it -Insert - adding a new record to the file, as when a new salesperson is hired -Delete - deleting a record from the file, as when a salesperson leaves the company -Update - changing one or more of a record's field values

Generalization and Specialization

Generalization: The process of defining a more general entity type from a set of more specialized entity types. BOTTOM-UP Specialization: The process of defining one or more subtypes of the supertype and forming supertype/subtype relationships. TOP-DOWN

Major DMBS Approaches

Hierarchical - navigational and Network - navigational Called navigational approaches because of the way that programs have to "navigate" through hierarchies and networks of data to find needed data. Developed in the 1960s and 1970s. Somewhat similar in structure. Suitable only for mainframe computers were an elegant solution to the redundancy/integration problem at the time Complex, difficult to work Now considered legacy systems ----------- Relational - became commercially viable in about 1980. Soon became the preferred DBMS approach and it has remained so ever since. -------------- Object-oriented - useful for a variety of niche applications. It is interesting to note that some of the key object-oriented database concepts have found their way into some of the mainstream relational DBMSs and some are described as taking a hybrid object/relational approach to database.

Candidate Key

If a relation has more than one attribute or minimum group of attributes that represents a way of uniquely identifying the entities, then they are each called a candidate key. When there is more than one candidate key, one of them must be chosen to be the primary key of the relation. Which candidate key to pick depends on the application using the database. Alternate key is a candidate key that was not chosen to be the primary key of the relation.

Database Operations

In addition to retrieving data we must be prepared to perform data maintenance operations, including: -inserting new records -deleting existing records -updating existing records

Punched Cards- Data Storage

Invented in 1805 by Joseph Marie Jacquard of France Stored fabric patterns, a form of graphic data, as holes in punched cards Of great importance for computing devices to follow

Magnetic Data Storage Media

Magnetic Tape - commercially available units in 1952. Direct Access Magnetic Devices - began to be developed at MIT in the late 1930s and early 1940s. Magnetic Drum - early 1950s; forerunners of magnetic disk technology. Magnetic Disk - commercially available in mid 1950s. Where is data stored using magnetism today?

Early Data Storage Media

Middle to late 1930s saw the beginning of the era of erasable magnetic storage media. By late 1940s, early work was done on the use of magnetic tape for recording data. By 1950, several companies were developing the magnetic tape concept for commercial use.

First Record Keeping

People live in cities- became specialized Recording of Data -How much a person has produced -What it can be sold for With time, different kinds of data were kept -Calendars, census data, etc.

Modified Product Relation

Product Numbers have been reduced to 2 digits for simplicity. Every individual unit item and every set of tools has its own row in the relation because every item and set is available for sale.

Modern Data Storage Media

Punched paper tape - The earliest form of modern data storage, introduced in the 1870s and 1880s. Punched cards were the only data storage medium used in the increasingly sophisticated electromechanical accounting machines of the 1920s, 1930s, and 1940s.

E-R Model Entity (and it's attributes)

Rectangular shape Salesperson = a type of entity Name of entity is in caps above the separator line. Entity type's attributes are shown below the separator line. PK and boldface denote the attribute(s) that constitute the entity type's unique identifier.

Relational Terminology

Relations - what we have been referring to as simple linear files. Also called tables. Row = record (files) = tuple (relation) Column = field (files) = attribute (relation)

Relationships and Subtypes

Relationships at the supertype level indicate that all subtypes will participate in the relationship The instances of a subtype may participate in a relationship unique to that subtype. In this situation, the relationship is shown at the subtype level

Cardinality

Represents the maximum number of entities that can be involved in a particular relationship. 1-1 1-M M-N

Three Delete Rules

Restrict: If an attempt is made to delete a record on the "one side" of the one-to-many relationship, the system will forbid the delete to take place if there are any matching foreign key values in the relation on the "many side." If an attempt is made to delete the record for salesperson 361 in the SALESPERSON relation, the system will not permit the deletion to take place because the CUSTOMER relation records for customers 1525 and 1700 include salesperson number 361 as a foreign key value. --- Cascade: If an attempt is made to delete a record on the "one side" of the relationship, not only will that record be deleted but all of the records on the "many side" of the relationship that have a matching foreign key value will also be deleted. The deletion will cascade from one relation to the other. If an attempt is made to delete the record for salesperson 361 in the SALESPERSON relation, that salesperson record will be deleted and so too, automatically, will the records for customers 1525 and 1700 in the CUSTOMER relation because they have 361 as a foreign key value. --- Set-to-Null If an attempt is made to delete a record on the "one side" of the one-to-many relationship, that record will be deleted and the matching foreign key values in the records on the "many side" of the relationship will be changed to null. If an attempt is made to delete the record for salesperson 361 in the SALESPERSON relation, that record will be deleted, and the Salesperson Number attribute values in the records for customers 1525 and 1700 in the CUSTOMER relation will have their Salesperson Number attribute values changed from 361 to null.

Relational Select Operator

Retrieves a horizontal slice of the relation. Select rows from the SALESPERSON relation in which Salesperson Number = 204. The result of a relational operation will always be a relation.

Relational Project Operator

Retrieves a vertical slice of the relation. Project the Salesperson Number and Salesperson Name over the SALESPERSON relation.

Referential Integrity

Revolves around the circumstance of trying to refer to data in one relation in the database, based on values in another relation. --- Deletion: A problem arises, e.g., because a deleted record, a salesperson record, is on the "one side" of a one-to-many relationship. --- Insertion: if a new record is inserted into the "one side" (SALESPERSON relation) of the one-to-many relationship, there is no problem. If a new customer record is inserted into the "many side" (CUSTOMER relation) of the one-to-many relationship and it happens to include a salesperson number that does not have a match in the SALESPERSON relation—that would cause the same kind of problem as the deletion example --- Update: Updating a foreign key value. For example, a salesperson number in the CUSTOMER relation with a new salesperson number that has no match in the SALESPERSON relation.

Data Retrieval Method

Sequential access - the retrieval of all or a portion of the records of a file one after another, in some sequence, starting from the beginning, until all of the required records have been retrieved. -Physical sequential access - records are retrieved, one after the other, just as they are stored on the disk device. -Logical sequential access - records are retrieved in an order based on the values of one or a combination of the fields. ----------------- Direct Access - the retrieval of a single record of a file or a subset of the records of a file based on one or more values of a field or a combination of fields in the file. -a crucial concept in information systems today -requires hardware storage device that will accommodate direct access -requires software that will take advantage of the hardware's capabilities and store and retrieve the data in such a way that it accomplishes direct access.

Binary Relationships

Simplest kind of relationship Relationship between two entity types A salesperson "sells" products or products are "sold" by salespersons

Database Management System

Sophisticated software that controls the database and the database environment A software system that is used to create, maintain, and provide controlled access to user databases Three essential parts: central repository, controlling software agent, storage of data in a standardized, convenient form DBMS manages data resources like an operating system manages hardware resources

SQL

Structured Query Language Incorporates both DDL and DML features. Very heavily used in practice today.

Modern Era of Info Processions

The 1880 U.S. Census took about seven years to compile by hand. Basing his work on Jacquard's, Herman Hollerith arranged to have the census data stored in punched cards and invented machinery to tabulate them. 1890 U.S. Census population count done in 1 month (2 years for all data to be tabulated) In 1896 Hollerith formed the Tabulating Machine Company to produce and commercially market his devices -- this later became IBM.

File/Relation: Differences

The columns of a relation can be arranged in any order without affecting the meaning of the data. This is not true of a file. The rows of a relation can be arranged in any order, which is not true of a file. Every row/column position (a cell) can have only a single value, which is not necessarily true in a file. No two rows of a relation are identical, which is not necessarily true in a file.

Data Retrieval from a Relational Database

The discussion thus far has concentrated on: -how a relational database is structured -loading a database with data Let's discuss the effort to retrieve the data in a way that is helpful and beneficial to the business organization that built the database. -------- Have the ability to accept high level data retrieval commands Process the commands against the database's relations and return the desired data.

Modality

The minimum number of entity occurrences that can be involved in a relationship. "inner" symbol on E-R diagram ("outer" symbol is cardinality)

Data Accuracy

The same data is stored several, sometimes many, times within a company's information system. When a new application is written, new data files are created to store its data. Data can be duplicated within a single file and across files.

Ternary Relationship

These new General Hardware Co. relations are all independent with no foreign keys in any of them. The SALES relation shows how this ternary relationship is represented in a relational database. The primary key of the additional relation (SALES) will be (at least) the combination of the primary keys of the entities involved in the relationship.

Domain of Values

Two attributes have the same domain of values if the attributes have values of the same type. e.g., Salesperson Number in SALESPERSON and in CUSTOMER - three digit whole numbers that are the identifiers for salespersons.

Anomalies

Typically occur in poorly structured files. Problems arise when two different kinds of data, like salesperson and customer data are merged into one file. Deletion Anomaly - e.g, if you delete a customer and that record was the only one for a salesperson, the salesperson's data is gone. Insertion Anomaly - e.g., General Hardware cannot add data about a new salesperson the company just hired until she is assigned at least one customer. Update Anomaly - redundant data in the database file must be updated each place it exists when it changes.

SQL SELECT

Used for data retrieval. You specify what data you are looking for rather than provide a logical sequence of steps that guide the system in how to find the data. Can be run in either a query or an embedded mode. Command will work with Oracle, MS Access, SQL Server, DB2, Informix, etc. SELECT <columns> FROM <table> WHERE <predicates identifying rows to be included>; The desired attributes are listed in the SELECT clause. The required table is listed in the FROM clause. The restriction (predicate) indicating which row(s) is involved is shown in the WHERE clause in the form of an equation. The "*" indicates that all attributes of the selected row are to be retrieved. For a Relational Algebra Project operation, there is no need for a WHERE clause to limit which rows of the table are included.

AND and OR

With the AND operator, both conditions have to be satisfied to be included in the result. The OR operator really means one or the other or both. AND is said to be "higher in precedence" than OR. So all ANDs are considered before any ORs are considered. If you really wanted the OR to be considered first, you could force it by writing the query with parenthesis around the OR

Using Data for Competitive Advantage

indispensable and fundamental corporate resource a crucial competitive advantage (but may be for a short time)


Kaugnay na mga set ng pag-aaral

Finance Exam 3 Chapter 16 Practice Problems

View Set

Culinary Arts Chap 10 Sandwiches

View Set

Psych Exam 2 Practice Questions Set 1, Psych Exam 2 Practice Questions Set 4, Psych Exam 2 Practice Questions Set 2, Psych Exam 2 Practice Questions Set 3

View Set

Bladder function and Dysfunction

View Set

Nclex Review: Cognitive impairments, Delirium, Demenita, Alzheimers

View Set

ATI Learning System 4.0 : Pharmacology Final

View Set

Chapter 1: Thinking like an Economist

View Set

AP II: Module 2.1 The Brain, parts 1 & 2

View Set

CFA 41: Portfolio Management Overview

View Set