ITP 249

¡Supera tus tareas y exámenes ahora con Quizwiz!

AGGREGATION FUNCTIONS and the 7 types

Aggregation Means 'grouping together data around a constraint' Once data is grouped, SQL has special commands to analyze the groups Aggregating functions and grouping are usually used together 1. SUM 2. AVG 3. COUNT 4. MIN 5. MAX 6. FIRST 7. LAST

Entity

Anything about which data can be collected and stored

NORMALIZATION PROCESS

Apply a series of tests on a relation to determine whether it satisfies or violates the requirements of a given normal form. If the tests pass, move on to check the requirements of the next level normal form. If the tests fail, the relation is decomposed into simpler relations that individually meet the normalization tests. All these normal forms are based on the functional dependencies among the attributes of a relation.

SURROGATE KEY

Artificial field added to a table to serve as a Primary Key Has values that are meaningless to users Normally hidden from the user interface Example: Employee ID (auto generated) in the previous slide

AVG

Average of all values for a column

Data vs. Information

DATA IS A COLLECTION OF FACTS These facts are independent of overall meaning INFORMATION IS MEANING DERIVED FROM DATA Information is extracted by processing data - organizing, analyzing, summarizing

DATA DEFINITION LANGUAGE (DDL)

Data Definition Language (DDL) is used to define database schema. It consists of SQL commands that create and modify database structures like tables and relationships. SQL commands CREATE, DROP and ALTER are examples of DDL.

DATA MANIPULATION LANGUAGE (DML)

Data Manipulation Language (DML) is used to query and manipulate existing data. SQL commands SELECT, INSERT, UPDATE and DELETE are examples of DML.

FUNCTIONAL DEPENDENCY - FULL vs PARTIAL

FULL Functional Dependency Given a composite key A Attribute B is functionally dependent on A B is NOT functionally dependent on any part of A It's dependent on both! PARTIAL Functional Dependency Given a composite key A Attribute B is functionally dependent on A B is functionally dependent on one part of A (not only B depends on the whole A, but B also depends on a part of A)

BUSINESS RULES

First iteration of the models usually comes from the Business Rules Business Rules are descriptions of policies, principles, procedures, restrictions for an organization They describe characteristics of data as viewed by different people

FIRST NORMAL FORM

Requirements: No repeating groups of columns Each column contains atomic values PK for each table Steps: Identify Primary Keys for all tables Identify all dependencies

(Start of Lecture 7) SELECT CLAUSE VS WHERE

Select is for columns Where is for rows

SUM

Sum of all values for a column

Outer Join

The OUTER JOIN returns all rows from both the participating tables which satisfy the join condition along with rows which do not satisfy the join condition LEFT JOIN: The table before the JOIN command is the main table Lists all the rows from the left table, even if there's no match in the right table RIGHT JOIN: The table after the JOIN command is to the main table Lists all the rows from the right table, even if there's no match in the left table Still need column list and a matching condition

Set Operators (Purpose, 3 types, and compatibility)

To combine separate queries into one result. 3 types of set operations: 1. UNION 2. INTERSECT 3. EXCEPT Set compatible: Number of columns in each SELECT statement must match Data types on aligned columns must match Each SELECT statement must have the same expressions and aggregate functions

Foreign Key

is a Primary Key from another table (related/referenced table) Placed in a relating table to form a relationship to the related table Valid values are NULL and the Primary Key values of the reference table So a Foreign Key value must either be the Primary Key of the referenced table or be NULL

Data Model

simple representation of complex real-world structures

database

specialized structure that allow computer-based systems to store, manage and retrieve data very quickly

RELATIONSHIP TYPES

1 : 1 (one to one) 1 : M (one to many) M : N (many to many)

SUBQUERY in HAVING CLAUSE

HAVING clause restricts the output of the GROUP BY clause. Applies conditions to the grouped rows. The SubQuery appears in the HAVING clause is similar to how it appears in the WHERE clause.

FUNCTIONAL DEPENDENCY RULES

If A determines B and C, then A determines B and A determines C. If A and B determine C, then neither A nor B by itself determines C.

Self Join

In a self join, a table is joined with itself (unary relationship)

RMDBS

"Relational Database Management System." An RDBMS is a DBMS designed specifically for relational databases. Therefore, RDBMSes are a subset of DBMSes. A relational database refers to a database that stores data in a structured format, using rows and columns

3. INNER JOIN - NATURAL JOIN

A NATURAL JOIN is an EQUIJOIN operation that creates an implicit ON clause for you based on the common columns in the two tables being joined. Common columns are columns that have the same name in both tables. Takes advantage of DBMS's table structure to figure out common column (same name, same data type) Natural Join: Guidelines - The associated tables have one or more pairs of identically named columns. - The columns must be the same data type. - Don't use ON clause in a natural join.

SUBQUERY USAGE

A Subquery occurs when the result from an Inner Query is used as part of an Outer Query. In other words, the Inner Query's results are used to inform the Outer Query. Subquery is a query inside a query, normally inside parentheses: First query is the Outer Query. Inside query is the Inner Query. Parentheses are used to distinguish the queries and indicate order of execution: Inner Query is executed first. Output of Inner Query is used as input for Outer Query. Normal queries allow us to ask a question of the database. Subqueries allow us to ask multiple questions of the database. Who sells saw parts? (normal query that will be used as a subquery) Now tell me about those companies? (outer query that uses the subquery's result as input) Subqueries can nest several levels. They're not limited to 1 inner and 1 outer. Building queries using Subqueries: Approach it incrementally, one part at a time. There are many ways to get the same answer. Solutions that work on one DBMS may not work on others.

FUNCTION DEPENDENCY

A functional dependency (FD) is a relationship between two attributes, typically between the PK and other non-key attributes within a table. For any relation R, attribute Y is functionally dependent on attribute X (usually the PK), if for every valid instance of X, that value of X uniquely determines the value of Y. This relationship is indicated by the representation below :

STRONG relationship

A relationship is a Strong relationship if Primary Key of the relating entity CONTAINS Foreign Key (Primary Key of related entity) i.e. FK is part of PK Noted with solid line in Crow's Foot notation

WEAK relationship

A relationship is a Weak relationship if Primary Key of the relating entity does NOT contain Foreign Key (Primary Key of related entity) i.e. FK is not part of PK Noted with dashed line in Crow's Foot notation

Constraint

A restriction placed on the data

INSERTION ANOMALY

An Insert Anomaly occurs when certain attributes cannot be inserted into the database without the presence of other attributes.

UPDATE ANOMALY

An Update Anomaly exists when one or more instances of duplicated data is updated, but not all.

ALIAS and the 2 clauses it can be used in

An alias is an alternative table name for the current query Use alias Column name in the SELECT clause (both are equivalent) Use alias Table name in the FROM clause (both are equivalent)

INDEX

Analogy: Index in a printed book A listing of the arrangement of a table with respect to a particular table's field One index to one field in one table Points to the record's location in the database Used to improve performance Index key: Index's reference point. Points to data location identified by the key Unique Index: An index in which the index key can have only one pointer value (row) associated with it

CONVERT A MANY-TO-MANY RELATIONSHIP

Break the Many-to-Many relationship into two 1:M relationships by introducing a new entity.

BY LOCALITY (2 types)

CENTRALIZED Usually owned and maintained by the organization using the database Example - SIS (Student Information System) at USC DISTRIBUTED Database is decentralized and data is distributed (sometimes redundantly) Example - Domain name servers Blockchain? Bitcoin?

CANDIDATE KEY

Candidate Key (or Key) functionally determines the rest of (all other) fields in a table. Is a potential Primary Key Must be unique, and cannot be NULL Should be secure, never change, and never be recycled

Sources of Business Rules

Company managers Policy makers Department managers Written documents - procedures, standards, manuals End users Any other stakeholders

CONCEPTUAL - LOGICAL - PHYSICAL MODELS

Conceptual: a conceptual data model identifies the highest-level relationships between the different entities (also called entity-based or object-based data models.) Logical: a logical data model describes the data in as much detail as possible, without regard to how they will be physical implemented in the database. Physical: physical data model represents how the model will be built in the database. A physical database model shows all table structures, including column name, column data type, column constraints, primary key, foreign key, and relationships between tables.

Sub-Queries (Definition, Objective, 4 Clauses it can appear in)

Definition: A Subquery is a SELECT statement (query) within another statement. I.E. a query inside a query Subquery = Inner query = Nested query Objective: To write queries in SQL to retrieve data from a table, but restricting the data based upon data in another table (subquery) i.e. to get results from table A, conditioned on results from table B Subquery can occur as part of: SELECT clause FROM clause WHERE clause HAVING clause

Relationship

Describes an association among entities

DESIGN ERD - GENERAL STEPS

Determine all possible Entities, their main Attributes, and Primary Keys. Collapse into one larger view Consider Relationships in pairs Determine Relationship Type, and Cardinality (Max/Min) Assign Foreign Keys. Determine Relationship Strength. Remove all Many-to-Many Relationships. Replace with one to many relationships Fill in remaining Attributes.

3. Except

EXCEPT returns any distinct values from the left query that are NOT found on the right query. Similar to the UNION command, EXCEPT also operates on two SQL statements. The difference is that, while UNION essentially acts as an OR operator (value is selected if it appears in either the first or the second statement), the EXCEPT command acts as an AND operator (value is selected only if it does NOT appear in BOTH statements).

RULES for WELL-FORMED TABLES (3 things)

Each table represents a single subject COURSE table only contains data about courses 2. No data is redundantly stored Allows updates to occur in only one place 3. All non-key attributes dependent on the Primary Key Yields simplest tables and data that is uniquely identifiable by the PK

ENTITY vs ENTITY INSTANCE

Entity refers to ALL items of a kind E.g. ALL Students, ALL Vehicles, ALL Transactions Corresponds to a whole table Synonyms: a Table, an Object Entity Instance refers to a SINGLE item E.g. ONE Student, ONE Vehicle, ONE Transaction Corresponds to a single row in the table Synonyms: a Record, an Object Instance, a Thing

1. Inner Join -Equijoin

Equijoin - matches based on equality Condition is where Primary Key and Foreign Key match INNER JOIN can simply be written as JOIN (by default JOIN is INNER JOIN) FROM A JOIN B ON <condition> is equivalent to FROM A, B WHERE <condition>

HIERARCHICAL MODEL

Every parent has one or many children of a certain type Example - 'Client' can have one or more 'Appointments' PROS Fast access Ensures referential integrity CONS Whole hierarchy must be satisfied to enter data Can only retrieve data from root Data redundancy

RELATIONAL MODEL

Every relation can be thought of as a table Every table has a column that uniquely identifies each row in the table By sharing the identifying field, data can be related PROS Implementation independence Easy data retrieval CONS Computationally expensive

Database Transaction Properties (ACID)

In order to guarantee validity of database transactions, the following ACID properties are required: qAtomicity: Each transaction succeeds or does not. There is no midway point (i.e. All or Nothing) q qConsistency: Ensures database correctness. All integrity constraints must be maintained. q qIsolation: Transactions happening concurrently without leading to inconsistency. Changes are not visible until complete. q qDurability: Updates and modifications are stored so that they persist even if a failure occurs ✘

1. LEFT OUTER JOIN or LEFT JOIN

LEFT JOIN: The table before the JOIN command is the main table Lists all the rows from the left table, even if there's no match in the right table

2. Intersect

List titles that appear in BOTH tables INTERSECT returns any distinct values that are returned by both the query on the left and right sides of the INTERSECT operand.

ESTABLISH - ONE-TO-ONE RELATIONSHIP (where to place foreign/primary keys)

Mandatory - Optional Place the Primary Key of the mandatory side as the Foreign Key on the optional side Optional - Optional Place the Foreign Key on whichever side to have the fewest nulls Mandatory - Mandatory Most likely should be just one entity

Maximum Cardinality

Maximum number of entity instances that can participate in a relationship Possible values are 1 (one) and M (many)

Cardinality

Means "count," and is expressed as a number 0, 1, or M/N Refers to the count of one entity to the other entity in a relationship How to determine Cardinality: Business Rules Common sense

Minimum Cardinality

Minimum number of entity instances that must participate in a relationship Possible values are 0 (optional) and 1 (required/mandatory)

MIN, MAX

Minimum, Maximum attribute value

2. Inner Join - Non-Equijoin

Non-Equijoin - matches based on in-equality

NORMALIZATION

Normalization the process of evaluating and correcting table structures to reduce redundancies and to better separate subjects. Normalization usually occurs after the 1st ERD but before the final ERD.

NO-SQL

Not Only SQL: qNot based on relational databases qThey may support SQL like querying qSchemaless qACID transactions may be compromised to increase performance, availability, speed. Eventually consistent. NoSQL Data Stores: qKey-Value: amazon dynamo qNo schema; data values are identified and accessed via a key qColumn: cassandra qGraph DB: neo4j qDocument: mongodb

BUSINESS RULES TO DATA MODELS

Nouns translate to Entities Usually nouns translate to tables Even so... some Nouns will translate to Attributes And some extra Tables may exist that aren't in the business rules (think M:N) Verbs translate to Relationships Relationships are bidirectional and must identify relationship type Adverbs (can, may, must) translate to Cardinality of a Relationship Cardinality refers to the count of one entity to the other entity in a relationship

COUNT

Number of rows with non-null values

BY USAGE (2 types)

OPERATIONAL DATABASES Maintains data relevant to moment-to-moment operations Optimized for fast data manipulation (inserting and editing data) Example - single Walmart store database ANALYTICAL DATABASES Contains historical data to observe trends and make business decisions Optimized for fast data processing (access and computation) Example - Walmart historical sales database

OBJECT ORIENTED MODEL

Overlays object oriented functionality on traditional relational databases Uses object relational mapping (ORM) middleware between consumer programs and database structures PROS Allows programmers to develop database programs without knowing SQL CONS There is a high cost - financial expense, computation time, external dependencies, maintenance overhead

ANOMALIES (3 types)

Poorly designed DBs can create problems, issues. Anomalies are inconvenient and lead to error-prone situations when we process data in tables. There are three types of anomalies: Update Anomaly Delete Anomaly Insert Anomaly

2. RIGHT OUTER JOIN or RIGHT JOIN

RIGHT JOIN: The table after the JOIN command is to the main table Lists all the rows from the right table, even if there's no match in the left table Still need column list and a matching condition

PLACES WHERE SUBQUERY CAN OCCUR

Recall that Subquery can occur as part of: SELECT clause FROM clause WHERE clause HAVING clause

SECOND NORMAL FORM

Requirements: It is in 1NF It has NO Partial Dependencies Steps: Start at 1NF Identify Partial Dependencies Make New Tables to eliminate Partial Dependencies Assign corresponding dependent attributes

SUBQUERY in SELECT CLAUSE

SELECT clause uses <columnlist> to indicate columns of the result Columns can be attributes of tables Result of an aggregate function <columnlist> can also include Subquery expression: This is called Inline Subquery Subquery must return one single value If creating an Alias for the Subquery's result, cannot use that alias in the <columnlist>

Data Analytics

The goal of this course is to learn how to use database systems not just as systems of record but a system of information. i.e. a repository of data that can be analyzed to derive information, knowledge and perhaps wisdom!

Difference between equijoin and natural join

The key difference between Inner Join and Natural Join is that Inner Join provides the result based on the matched data according to the equality condition specified in the query while Natural Join provides the result based on the column with the same name and same data type present in tables to be joined

SUBQUERY ADVANTAGES

They allow queries that are structured so that it is possible to isolate each part of a statement. They provide alternative ways to perform operations that would otherwise require complex joins and unions. Many people find subqueries more readable than complex joins or unions. Subqueries are not the complete replacement of Joins. Most of the time, it's being used as a complement/alternative approach to complicated Joins.

UNION

UNION: list each job title ONCE even if the title appears in both tables UNION ALL: to show overlap - if CEO appears in each table, it will appear twice in the combined results Think of UNION similar to the OR operator: if value appears in one statement result, it will appear in the final result.

Primary Key

Uniquely identifies each row A Primary Key may: consist of a single column or multiple columns combined Each table must have a Primary Key column Primary Key values must be unique, and cannot be NULL

KEY-VALUE STORES

Use case: qQuick lookups with no 'relational' component (no joins) qQuick and high scalability qOften (mostly) in memory Application: qUser session data between shared applications ie amazon dynamo

COLUMN STORES

Use case: qSuper scalable qMapReduce support Application: qLarge scale real time data logging (Finance, Web Analytics) Cassandra Database uses Cassandra Query Language (CQL) CQL offers a model close to SQL in the sense that data is put in tables containing rows of columns. ie cassandra

SUBQUERY in WHERE-IN CLAUSE

Used when comparing a single attribute to a list of values (result of a Subquery) The Subquery appears in: in the WHERE-IN Clause on the right side of the comparison Subquery result: Requires a Subquery that returns a list of values Values returned by the Subquery must be of comparable data type for the comparison Can be used in combination with Joins

SUBQUERY in WHERE CLAUSE

Used when comparing a single attribute to a single value (result of a Subquery). Most common type: the Subquery appears in the WHERE Clause on the right side of the comparison Subquery result: Requires a Subquery that returns only one single value Value returned by the Subquery must be of comparable data type for the comparison Can be used in combination with Joins.

SUBQUERY in WHERE-ALL&ANY CLAUSE

Used when comparing a single value to a list of values (result of a Subquery) using inequality comparison Meaning: > ALL : (reads "greater than ALL") is equivalent to "greater than the highest in list" < ALL : (reads "less than ALL") is equivalent to "less than lowest" = ANY : (reads "equal to ANY") is equivalent to using the IN operator

NETWORK MODEL

Uses inverted tree structure to represent owner/member relationships through structures with multiple links allowed PROS Complex (real-world) relationships are better modeled Any data can be accessed through structures CONS Users must be familiar with network layout to retrieve data Changing structures is difficult and involves reworking whole network

View

Virtual table made of fields of other tables Database stores base tables Database temporarily assembles views to display query results Allows the viewing of interrelated data from multiple tables Data protection by only displaying results (no edits allowed)

DELETION ANOMALY

A Delete Anomaly exists when certain attributes are lost because of the deletion of other attributes. What other data is removed when we delete a row of Student's Activity data?

JOIN

A JOIN occurs when data is retrieved from multiple tables. Foreign Key and Primary Key of related tables are compared and matched. This implies a relationship exists(weak or strong) A JOIN occurs by listing tables in the FROM clause, e.g. FROM Student, Course The comma between the table names creates a Cartesian Product Table which is filtered by the JOIN matching condition in the WHERE clause The JOIN matching condition is WHERE TABLE1.PrimaryKey = TABLE2.ForeignKey

DOT NOTATION

The dot "." notation allows us to refer to a column within a table Examples: -STUDENT.SID means we're referring to the SID column in the table STUDENT -INVOICE.CUS_CODE means we're referring to the CUS_CODE column in the table INVOICE

Attribute

A characteristic of an entity, Table column rules apply to Attributes

The 2 types of joins and their 3 and 2 respective subcategories

1. Inner Join a. Equijoin b. Non-Equijoin c. Natural Join 2. Outer Join a. left b. right

analytics

1. the analysis of data, typically large sets of business data, by the use of mathematics, statistics, and computer software. 2. the patterns and other meaningful information gathered from the analysis of data.

TABLE FEATURES

2D structure composed of rows and columns Each row represents one occurrence within the entity set(instance) Each column represents an attribute and has a distinct name Each intersection of a row and column represents a single data value All values in a column must conform to the same data format The order of rows and columns does not matter Each table has an attribute that uniquely identifies each row

FIRST, LAST

First, Last value of the results

GROUP BY

GROUP BY by default collects all identical items together Can restrict groups with the HAVING Clause by adding criteria/conditions to restrict the results The <conditionlist> in HAVING Clause is similar to that in the WHERE Clause, but it applies to GROUP BY only

POST RELATIONAL MODEL

Handles volume, velocity and variety of data (3V's) Uses key-value storage of data instead of structured tables, allowing for sparse unstructured data Not always an appropriate replacement for relational database PROS Enabled the 'Big Data' revolution CONS Can involve a large overhead

IMBS

IN-MEMORY DATABASES (IMDB) Using RAM instead of hard disk for the database All relevant data are in memory all the time Speeds up queries to provide real time or near real time analytics capabilities Innovations: qData are stored in RAM qUse of columnar storage for the relational database. qIndexing (is free with columnar storage) qData compression qParallel data processing qPartitioning data

Graph DB

In computing, a graph database (GDB[1]) is a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data. A key concept of the system is the graph(or edge or relationship), which directly relates data items in the store. The relationships allow data in the store to be linked together directly, and in many cases retrieved with one operation. Graph databases, by design, allow simple and fast retrieval[citation needed] of complex hierarchical structures that are difficult to model[according to whom?] in relational systems. Graph databases are similar to 1970s network model databases in that both represent general graphs, but network-model databases operate at a lower level of abstraction[2] and lack easy traversal over a chain of edges.[3] Use Case: qDense network of strongly connected entities Application: qFacebook graph search qGoogle knowledge graph qFraud detection qNetwork infrastructure Graph DB vs SQL: qThe size and degree of connectedness => exponential increase of query times for SQL. qConstant time for Graph DBs on size and linear for connectedness. neo4j

SUBQUERIES VS JOINS

SUBQUERIES and JOINS share many similarities and differences. Joins are typically used: to return rows (of a merged/joined table) Subqueries are typically used: to return a single value (an aggregation) or rows (of a table, or a merged/joined table). to perform tests for set membership or make set comparisons (in vs not in).

FILE SYSTEM MODEL

Sometimes called 'flat model', referring to a 'flat file' that contains data Does not indicate relationships - just maintains records Example - an address book Today -> popular for data transmission (as a delimited file or XML or JSON)

MULTI-USER

Sometimes called 'workgroup' or 'enterprise' depending on scale Example - USC course schedule

SUBQUERY in FROM CLAUSE

Subquery in the FROM clause forms a Temporary Table Can create Alias for the Temporary Table The Alias can be used in places where a table is expected

SINGLE USER

Supports one 'user' (could be a program) Example - desktop database

Where vs Having

The WHERE clause applies the condition to individual rows before the rows are summarized into groups by the GROUP BY clause. However, the HAVING clause applies the condition to the groups after the rows are grouped into groups. Therefore, it is important to note that the HAVING clause is applied after whereas the WHERE clause is applied before the GROUP BY clause.

COMPOSITE DETERMINANT

a determinant composed of multiple fields State, LicenseNumber) => (FirstName, LastName, Address) (StudentID, CourseNumber) => Grade (Manufacturer, SerialNumber) => (Model, WarrantyInfo)


Conjuntos de estudio relacionados

A & P - Unit 2 Exam // multiple choice

View Set

Psychology 1115 Chapter 4: Nature, Nurture, and Human Diversity

View Set

Chemistry (CHM1020) Chapter 6 Study Guide

View Set