CIS 330 FINAL
Comparison operators
-= Equal to -> Greater than ->= Greater than or equal to -< Less than -<= Less than or equal to -<> Not equal to != Not equal to
great candidates for indexes
1) Where clauses 2) Order by 3) Group by
Steps in Table Creation
1.Identify data types for attributes 2.Identify columns that can and cannot be null 3.Identify columns that must be unique (candidate keys) 4.Identify primary key-foreign key mates 5.Determine default values 6.Identify constraints on columns (domain specifications) Create the table and associated indexes
When to Use Indexes (1 of 2)
1.Use on larger tables 2.Index the primary key of each table 3.Index search fields (fields frequently in WHERE clause) 4.Fields in S Q L ORDER BY and GROUP BY commands 5.When there are >100 values but not when there are <30 values
When to Use Indexes (2 of 2)
6.Avoid use of indexes for fields with long values; perhaps compress values first 7.If key to index is used to determine location of record, use surrogate (like sequence number) to allow even spread in storage area 8.D B M S may have limit on number of indexes per table and number of bytes per indexed field(s) 9.Be careful of indexing attributes with null values; many D B M S s will not recognize null values in an index search 10.Use a query optimizer
Conditional Expressions Using Case Keyword
A CASE expression acts like an if-then statement. It allows you to choose what will appear in a column of the result set, depending on a condition.
Boolean Query a Without Use of Parentheses
By default, processing order of Boolean operators is NOT, then AND, then OR
Table structure
CREATE, ALTER, DROP
UNION — Combining Queries
Combine the output (union of multiple queries) together into a single result table With UNION queries, the quantity and data types of the attributes in the SELECT clauses of both queries must be identical.
multiple table join
Each pair of tables requires an equality-check condition in the WHERE clause, matching primary keys against foreign keys. Note that the full query results include columns from four different tables.
t/f: everything in a database should be normalized
FALSE More joins means slower processing. Need to find a balance between normalization and speed sacrificing integrity for speed
Table contents/values
INSERT, UPDATE, DELETE
Creating Tables with Identity Columns
Inserting into a table does not require explicit customer I D entry or field list.
MERGE statement
Makes it easier to update a table. It allows combination of Insert and Update in one statement. Useful for updating master tables with new data.
Is S Q L a standard?
No longer certified by N I S T.
Equi-Join
RECOMMENDED FOR QUIZ/TEST/PROJECT INNER JOIN clause is an alternative to WHERE clause, and is used to match primary and foreign keys. An INNER join will only return rows from each table that have matching rows in the other. This query produces the same results as the previous equi-join example.
Ensuring Data Integrity Through Updates
Relational integrity is enforced via the primary-key to foreign-key match
Non-Nullable Specifications
Some primary keys are composite- composed of multiple attributes
Cartesian Product
The combination of all rows in the first table and all rows in the second table unwanted effect; a bad query. will slow down runtime significantly
FIND ANSWER: SHOULD KNOW:
WHAT 6,2 means the six indicates... *FIND ANSWER*
Is CHAR where the string value is a set length and a VARCHAR is a string value that can have different value lengths?
YES CHAR = number in parentheses is the set length VARCHAR = number in parentheses is the maximum length
does the order of parent/child tables matter?
YES need to create parent table first. would drop the child first (DROP IN THE OPPOSITE DIRECTION)
anything with a where clause is what?
a candidate to be an index
a view can be considered what?
a virtual table
Outer-Join
every record will be returned whether they match or not LEFT OUTER JOIN clause causes rows from the first mentioned table (customer) to appear even if there is no corresponding order data. Unlike an INNER join, this will include customer rows with no matching order rows. For the tables in figure 6.1, this will return 16 rows. That's because there are 15 customers, and one of these customers has 2 orders. example use: check to see if people have paid you or not
Subqueries
one query calls another query. a select statement is inside of another select statement at one time •Subquery - placing an inner query (SELECT statement) inside an outer query •Options: -In a condition of the WHERE clause -As a "table" of the FROM clause -Returning a field for the SELECT clause -Within the HAVING clause •Subqueries can be: -Noncorrelated - executed once for the entire outer query -Correlated - executed once for each row returned by the outer query example: what are the name and address of the customer who placed order number 1008
SELECT Statement
retrieves data and displays it for us •Used for queries on single or multiple tables •Clauses of the SELECT statement: -SELECT: List the columns (and expressions) to be returned from the query -FROM: Indicate the table(s) or view(s) from which data will be obtained -WHERE: Indicate the conditions under which a row will be included in the result -GROUP BY: Indicate categorization of results -HAVING: Indicate the conditions under which a category (group) will be included ORDERBY: Sorts the result according to specified criteria
Self Join
way to write SQL for unary relationships
History of S Q L
•1970 - E. F. Codd develops relational database concept •1974-79 - System R with Sequel (later S Q L) created at I B M Research Lab •1979 - Oracle markets first relational D B with S Q L •1981 - S Q L/D S first available R D B M S system on D O S/V S E -Others followed: I N G R E S (1981), I D M (1982), D G/S G L (1984), Sybase (1986) •1986 - ANSI S Q L standard released -Major ANSI standard updates in 1989, 1992, 1999, 2003, 2006, 2008, 2011, 2016 •Today - S Q L is supported by most major database vendors
Second Normal Form
•1N F plus every non-key attribute is fully functionally dependent on the ENTIRE primary key -Every non-key attribute must be defined by the entire key, not by only part of the key -No partial functional dependencies everything depends on primary key. NO PARTIAL DEPENDENCIES. if they exist, it is FIRST NORMAL FORM (at most)
Third Normal Form
•2N F PLUS no transitive dependencies (functional dependencies on non-primary-key attributes) •Note: This is called transitive, because the primary key is a determinant for another attribute, which in turn is a determinant for a third •Solution: Non-key determinant with transitive dependencies go into a new table; non-key determinant becomes primary key in the new table and stays as foreign key in the old table NO TRANSITIVE DEPENDENCIES. If they exist, it is Second Normal Form (at most)
Changing Tables
•ALTER TABLE statement allows you to change column specifications: ALTER TABLE table_name alter_table action; Table Actions: ADD [COLUMN] column_definition ALTER[COLUMN] column_name SET DEFAULT default-value Example (adding a new column with a default value): ALTER TABLE CUSTOMER_T ADD COLUMN CustomerType VARCHAR2 (10) DEFAULT "Commercial";
SELECT Example - Boolean Operators
•AND, OR, and NOT Operators for customizing conditions in WHERE clause
Threats to Data Security
•Accidental losses attributable to: -Human error -Software failure -Hardware failure •Theft and fraud •Loss of privacy or confidentiality -Loss of privacy (personal data) -Loss of confidentiality (corporate data) •Loss of data integrity •Loss of availability (e.g., through sabotage)
INSERT Statement
•Adds one or more rows to a table • Inserting into a table: INSERT INTO Customer_T VALUES (001, 'Contemporary Casuals', '1355 S. Himes Blvd.', 'Gainesville', 'FL', 32601); •Inserting a record that has some null attributes requires identifying the fields that actually get data: INSERT INTO Product_T(ProductID, ProductDescription, ProductFinish, ProductStandardPrice) VALUES (1, 'End Table', 'Cherry', 175, 8); •Inserting from another table: INSERT INTO CaCustomer_T SELECT * FROM Customer_T WHERE CustomerState = 'CA'; to make something null: replace value with Null or don't add a value at all in that field
Advantages and Disadvantages of Stored Procedures
•Advantages -Performance improves for compiled S Q L statements -Reduced network traffic -Improved security -Improved data integrity -Thinner clients •Disadvantages -Programming takes more time -Proprietary, so algorithms are not portable
SELECT Example Using Alias
•Alias is an alternative column or table name
Deadlock
•An impasse that results when two or more transactions have locked common resources, and each waits for the other to unlock their resources •Figure 7-22 shows the problem of deadlock •John and Marsha will wait forever for each other to release their locked resources!
Transaction Integrity: A C I D Rules
•Atomic -Transaction cannot be subdivided •Consistent -Constraints don't change from before transaction to after transaction •Isolated -Database changes not revealed to users until after transaction has completed •Durable Database changes are permanent
Tips for Developing Queries
•Be familiar with the data model (entities and relationships) •Understand the desired results •Know the attributes desired in results •Identify the entities that contain desired attributes •Review E R D •Construct a WHERE equality for each link •Fine tune with GROUP BY and HAVING clauses if needed Consider the effect on unusual data
S Q L Environment
•Catalog -A set of schemas that constitute the description of a database •Schema -The structure that contains descriptions of objects created by a user (base tables, views, constraints) CREATE: •Data Definition Language (D D L) -Commands that define a database, including creating, altering, and dropping tables and establishing constraints QUERIES: •Data Manipulation Language (D M L) -Commands that maintain and query a database ADMINISTER PRIVILEGES: •Data Control Language (D C L) -Commands that control a database, including administering privileges and committing data
Critical Decisions for Physical Design
•Choosing the storage format (called data type) for each attribute from the logical data model •Giving the D B M S guidance regarding how to group attributes from the logical data model into physical records •Giving the D B M S guidance on how to arrange similarly structured records in secondary memory (file organization) •Selecting structures (including indexes and the overall database architecture) for storing and connecting files to make retrieving related data more efficient •Preparing strategies for handling queries against the database that will optimize performance (query optimization)
Schema Definition
•Control processing/storage efficiency: -Choice of indexes -File organizations for base tables -File organizations for indexes -Data clustering -Statistics maintenance •Creating indexes -Speed up random/sequential access to base table data -Example
Guidelines for Better Query Design (2 of 2)
•Create temporary tables for groups of queries •Combine update operations •Retrieve only the data you need •Don't have the D B M S sort without an index •Learn! •Consider the total query processing time for ad hoc queries
Common Data Types (2 of 2)
•DATE -Can represent from Jan 1 4712 B C to Dec 31 9999 A D -Stores century, year, month, day, hour, minute, second •TIMESTAMP -Like a date. Can include fractional seconds, and time zones. •BLOB -Binary large object, can store up to 4 gigabytes Used for photos, sound clips, etc.
Removing Tables
•DROP TABLE statement allows you to remove tables from your schema: DROP TABLE CUSTOMER_T
S Q L Database Definition
•Data Definition Language (D D L) •Major CREATE statements: -CREATE SCHEMA - defines a portion of the database owned by a particular user -CREATE TABLE - defines a new table and its columns -CREATE VIEW - defines a logical table from one or more tables or views •Other CREATE statements: CHARACTER SET, COLLATION, TRANSLATION, ASSERTION, DOMAIN
Data Security
•Database Security: Protection of the data against accidental or intentional loss, destruction, or misuse •Increased difficulty due to Internet access and client/server technologies
Managing Deadlock
•Deadlock prevention: -Lock all records required at the beginning of a transaction -Two-phase locking protocol ▪Growing phase ▪Shrinking phase -May be difficult to determine all needed resources in advance •Deadlock Resolution: -Allow deadlocks to occur -Mechanisms for detecting and breaking deadlock ▪Resource usage matrix ▪Back out one deadlock at a time Rerun transaction
Controlling Data Integrity
•Default value - assumed value if no explicit value •Range control - allowable value limitations (constraints or validation rules) •Null value control - allowing or prohibiting empty fields •Referential integrity - range control (and null value allowances) for foreign-key to primary-key match-ups
Denormalize with Caution
•Denormalization can -Increase chance of errors and inconsistencies -Reintroduce anomalies -Force reprogramming when business rules change •Perhaps other methods could be used to improve performance of joins -Organization of tables in the database (file organization and clustering) Proper query design and optimization
Using and Defining Views
•Dynamic View -A "virtual table" created dynamically upon request by a user -No data actually stored; instead data from base table made available to user -Based on S Q L SELECT statement on base tables or other views •Materialized View -Copy or replication of data, data actually stored -Must be refreshed periodically to match corresponding base tables
Designing Fields
•Field: smallest unit of application data recognized by system software •Field design -Choosing data type -Coding, compression, encryption -Controlling data integrity
Qualifying Results by Categories Using the HAVING Clause
•For use with GROUP BY •Like a WHERE clause, but it operates on groups (categories), not on individual rows. Here, only those groups with total numbers greater than 1 will be included in the final result.
Categorizing Results Using GROUP BY Clause
•For use with aggregate functions -Scalar aggregate: single value returned from S Q L query with aggregate function -Vector aggregate: multiple values returned from S Q L query with aggregate function (via GROUP BY) •You can use single-value fields with aggregate functions if they are included in the GROUP BY clause
Functional Dependencies and Keys
•Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute •Candidate Key: -A unique identifier. One of the candidate keys will become the primary key ▪E.g., perhaps there is both credit card number and S S# in a table...in this case both are candidate keys. -Each non-key field is functionally dependent on every candidate key.
Three Areas of S O X Audits
•I T change management -Processes by which changes to operational systems and databases are authorized •Logical access to data -Security procedures to prevent unauthorized access -Personnel controls and physical access controls •I T operations -Policies and procedures for day-to-day management of infrastructure, applications, and databases
Anomalies in This Relation (1 of 2)
•Insertion - can't enter a new employee without having the employee take a class (or at least empty fields of class information) •Deletion - if we remove employee 140, we lose information about the existence of a Tax Acc class •Modification - giving a salary increase to employee 100 forces us to update multiple records
Anomalies in This Relation (2 of 2)
•Insertion - if new product is ordered for order 1007 of existing customer, customer data must be re-entered, causing duplication •Deletion - if we delete the Dining Table from Order 1006, we lose information concerning this item's finish and price •Update - changing the price of product I D 4 requires update in multiple records
Query Efficiency Considerations
•Instead of SELECT *, identify the specific attributes in the SELECT clause; this helps reduce network traffic of result set •Limit the number of subqueries; try to make everything done in a single query if possible •If data is to be used many times, make a separate query and store it as a view
Processing Multiple Tables (1 of 2)
•Join -A relational operation that causes two or more tables with a common domain to be combined into a single table or view •Equi-join -A join in which the joining condition is based on equality between values in the common columns; common columns appear redundantly in the result table •Natural (inner) join -An equi-join in which one of the duplicate columns is eliminated in the result table
Locking Mechanisms
•Locking level: -Database - used during database updates -Table - used for bulk updates -Block or page - very commonly used -Record - only requested row; fairly commonly used -Field - requires significant overhead; impractical •Types of locks: -Shared lock - Read but no update permitted. Used when just reading to prevent another user from placing an exclusive lock on the record -Exclusive lock - No access permitted. Used when preparing to update
UPDATE Statement
•Modifies data in existing rows UPDATE Product_T SET ProductStandardPrice = 775 WHERE ProductID = 7;
First Normal Form
•No multivalued attributes •Every attribute value is atomic (no composite, no multivalue. if any exist, it is NOT first normal form) •Fig. 4-25 is not in 1st Normal Form (multivalued attributes) → it is not a relation. •Fig. 4-26 is in 1st Normal form. All relations are in 1st Normal Form.
Correlated versus. Noncorrelated Subqueries
•Noncorrelated subqueries: -Do not depend on data from the outer query -Execute once for the entire outer query •Correlated subqueries: -Make use of data from the outer query -Execute once for each row of the outer query Can use the EXISTS and ALL operators
Information Needed for Physical Design
•Normalized relations, including estimates for the range of the number of rows in each table •Definitions of each attribute, along with physical specifications such as maximum possible length •Descriptions of where and when data are used in various ways (entered, retrieved, deleted, and updated), including typical frequencies of these events •Expectations or requirements for response time and data security, backup, recovery, retention, and integrity •Descriptions of the technologies (database management systems) used for implementing the database
Versioning
•Optimistic approach to concurrency control •Instead of locking •Assumption is that simultaneous updates will be infrequent •Each transaction can attempt an update as it wishes •The system will create a new version of a record instead of replacing the old one •When a conflict occurs, accept one user's update and inform the other user that its update needs to be tried again. •Use of rollback and commit for this
Processing Multiple Tables (2 of 2)
•Outer join -A join in which rows that do not have matching values in common columns are nonetheless included in the result table (as opposed to inner join, in which rows must have matching values in order to appear in the result table) •Union join -Includes all data from each table that was joined
Designing Physical Database Files
•Physical File -A named portion of secondary memory allocated for the purpose of storing physical records -Tablespace - named logical storage unit in which data from multiple tables/views/objects can be stored •Tablespace components -Segment - a table, index, or partition -Extent - contiguous section of disk space Data block - smallest unit of storage
Data Normalization
•Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data •The process of decomposing relations with anomalies to produce smaller, well-structured relations
Controlling Concurrent Access
•Problem -In a multi-user environment, simultaneous access to data can result in interference and data loss (lost update problem) •Solution - Concurrency Control -Managing simultaneous operations against a database so that data integrity is maintained and the operations do not interfere with each other in a multi-user environment
More Complicated S Q L Queries
•Production databases contain hundreds or even thousands of tables, and tables could include hundreds of columns. •So, sometimes query requirements can be very complex. •Sometimes it's useful to combine queries, through the use of Views. •If you use a view (which is a query), you could have another query that uses the view as if it were a table.
S Q L Enhancements/Extensions (2 of 2)
•Programming extensions •Persistent Stored Modules (S Q L/P S M) •Capability to create and drop code modules •New statements: CASE, IF, LOOP, FOR, WHILE, etc. •Makes S Q L into a procedural language •Oracle has propriety version called P L/S Q L, and Microsoft S Q L Server has Transact/S Q L
Physical Database Design
•Purpose - translate the logical description of data into the technical specifications for storing and retrieving data •Goal - create a design for storing data that will provide adequate performance and ensure database integrity, security, and recoverability
Benefits of a Standardized Relational Language
•Reduced training costs •Productivity •Application portability •Application longevity •Reduced dependence on a single vendor •Cross-system communication •However, SQL varies between DBMS - Oracle verse SQL Server
Data Integrity Controls
•Referential integrity - constraint that ensures that foreign key values of a table must match primary key values of a related table in 1:N relationships •Restricting: -Deletes of primary records -Updates of primary records Inserts of dependent records
Well-Structured Relations
•Relations that contain minimal data redundancy and allow users to insert, delete, and update rows without causing data inconsistencies •Goal is to avoid anomalies -Insertion Anomaly - adding new rows forces user to create duplicate data -Deletion Anomaly - deleting rows may cause a loss of data that would be needed for other future rows -Modification Anomaly - changing data in a row forces changes to other rows because of duplication
DELETE Statement
•Removes rows from a table •Delete certain rows DELETE FROM CUSTOMER_T WHERE CUSTOMERSTATE = 'HI'; Delete all rows: DELETE FROM CUSTOMER_T;
Routines and Triggers
•Routines -Program modules that execute on demand •Functions -routines that return values and take input parameters •Procedures -routines that do not return values and can take input or output parameters •Triggers -routines that execute in response to a database event (INSERT, UPDATE, or DELETE)
Physical Design for Regulatory Compliance
•Sarbanes- Oxley Act (S O X) - protect investors by improving accuracy and reliability •Committee of Sponsoring Organizations (C O S O) of the Treadway Commission •I T Infrastructure Library (I T I L) •Control Objectives for Information and Related Technology (C O B I T)
Concurrency Control Techniques
•Serializability -Finish one transaction before starting another •Locking Mechanisms -The most common way of achieving serialization -Data that is retrieved for the purpose of updating is locked for the updater -No other user can perform update until unlocked
Advantages of Dynamic Views (1 of 2)
•Simplify query commands •Assist with data security •Enhance programming productivity •Contain most current base table data •Use little storage space •Provide customized view for user Establish physical data independence
Sorting Results with ORDER BY Clause
•Sort the results first by STATE, and within a state by the CUSTOMER NAME •Note: The IN operator in this example allows you to include rows whose CustomerState value is either F L, T X, C A, or H I. It is more efficient than separate OR conditions.
Original Purpose of S Q L Standard
•Specify syntax/semantics for data definition and manipulation •Define data structures and basic operations •Enable portability of database definition and application modules •Specify minimal (level 1) and complete (level 2) standards •Allow for later growth/enhancement to standard (referential integrity, transaction management, user-defined functions, extended join operations, national character sets)
Client-Server Application Security
•Static H T M L files are easy to secure -Standard database access controls -Place Web files in protected directories on server •Dynamic pages are harder -User authentication -Session security -S S L for encryption -Restrict number of users and open ports Remove unnecessary programs
Indexed File Organizations
•Storage of records sequentially or nonsequentially with an index that allows software to locate individual records •Index: a table or other data structure used to determine in a file the location of records that satisfy some condition •Primary keys are automatically indexed •Other fields or combinations of fields can also be indexed; these are called secondary keys (or nonunique keys)
SQL Data Types
•Strings -CHARACTER (n), VARYING CHARACTER (n) •Binary -Binary Large Object (B L O B) •Number -Numeric (precision, scale), Decimal (p, s), Integer •Temporal -Timestamp, Timestamp with local time zone •Boolean True or False values
S Q L Overview
•Structured Query Language - often pronounced "Sequel" •The standard for Relational Database Management Systems (R D B M S) •R D B M S: A database management system that manages data as a collection of tables in which all relationships are represented by common values in related tables
Handling Missing Data
•Substitute an estimate of the missing value (e.g., using a formula) •Construct a report listing missing values •In programs, ignore missing data unless the value is significant (sensitivity testing)
Data Dictionary Facilities
•System tables that store metadata •Users usually can view some of these tables •Users are restricted from updating them •Examples in Oracle 12c -D B A_TABLES - descriptions of tables -D B A_USERS - information about the users of the system •Examples in Microsoft S Q L Server 2016 -sys.columns - table and column definitions sys.indexes - table index information
Denormalization
•Transforming normalized relations into non-normalized physical record specifications •Benefits: -Can improve performance (speed) by reducing number of table lookups (i.e. reduce number of necessary join queries) •Costs (due to data duplication) -Wasted storage space Data integrity/consistency threats ie putting Student Table and Application Table into one table
Guidelines for Better Query Design (1 of 2)
•Understand how indexes are used in query processing •Keep optimizer statistics up to date •Use compatible data types for fields and literals •Write simple queries •Break complex queries into multiple simple parts •Don't nest one query inside another query •Don't combine a query with itself (if possible avoid self-joins)
Advantages of Dynamic Views (2 of 2)
•Use processing time each time view is referenced •May or may not be directly updateable •As with all S Q L constructs, you should use views with discretion
S Q L Enhancements/Extensions (1 of 2)
•User-defined data types (U D T) -Subclasses of standard types or an object type •Analytical functions (for O L A P) -CEILING, FLOOR, S Q R T, RANK, DENSE_RANK, ROLLUP, CUBE, SAMPLE, -WINDOW - improved numerical analysis capabilities •New Data Types -BIG INT, MULTISET (collection), X M L •CREATE TABLE LIKE -create a new table similar to an existing one •MERGE
SELECT Example Using a Function
•Using the COUNT aggregate function to find totals
Common Data Types (1 of 2)
•VARCHAR2(length) max 400 characters -Variable-length character data. A string that is shorter than the maximum length will consume only the required space. N VARCHAR2 is Unicode. •CHAR(length) max 200 characters -Fixed length character data. N CHAR is Unicode. •C L O B -Character large object, capable of storing up to 4 gigabytes of one variable length character data field •NUMBER -Positive or negative number. NUMBER(5) means a 5 digit integer. NUMBER(5,2): 5 digits, two to the right of the decimal point.
Merging Relations
•View Integration - Combining entities from multiple E-R models into common relations •Issues to watch out for when merging entities from different E-R models: -Synonyms - two or more attributes with different names but same meaning -Homonyms - attributes with same name but different meanings -Transitive dependencies - even if relations are in 3N F prior to merging, they may not be after merging -Supertype/subtype relationships - may be hidden prior to merging
Data Privacy
•W3C Web Privacy Standard -Platform for Privacy Protection (P3P) •Addresses the following: -Who collects data -What data is collected and for what purpose -Who is data shared with -Can users control access to their data -How are disputes resolved -Policies for retaining data Where are policies kept and how can they be accessed
SELECT Example-Boolean Operators
•With parentheses...these override the normal precedence of Boolean operators With parentheses, you can override normal precedence rules. In this case parentheses make the OR take place before the AND.