Database Foundations Soper

अब Quizwiz के साथ अपने होमवर्क और परीक्षाओं को एस करें!

WHERE clause IN shortcut

the IN condition provides a shortcut as the example shows you can include multiple values in parenthesis rather than using several OR or AND statements

WHERE clause compound conditions

these give us flexibility and allow us to create compound conditions where more than one criteria come into play using keywords: AND OR

WHERE clause NOT operator

will return all information except for the information matching the specified criteria. The example will return employee names for all employees except for those in departments 4, 8 or 9

HAS-A relationships

The term is used because each entity instance has a relationship to a second entity instance: -An employee has a locker -A locker has an employee

resource locking

To avoid concurrency issues, resource locking will disallow transactions from reading, modifying and/or writing to a data set that is in use

functions of a DBMS

create databases create tables create supporting structures read database data modify database data (insert, update, delete) maintain database structures enforce rules control concurrency provide security perform data backup and recovery

entity columns and rows

columns = attributes of the entity rows = instances of the entity (tuple)

data warehouse

commonly contains data from many different sources: organizations operational database other internal data external data; maybe from publicly available sources, or may purchase data. i.e. what competitors/market is doing or future trends all this data is run through an ETL (extract, transform, load) system to clean and prepare it for inclusion in the data warehouse

consistent transaction

commonly referred to as an ACID transaction Atomic Consistent Isolated Durable

composite key

composed of two or more columns (attributes) that are combined together to create uniqueness, like a flight number and a flight date where by itself either of these attributes are not unique

database system

comprised of users: use a database, like accessing bank info database applications: talks to the DBMS database management system (DBMS): the gatekeeper to the database, manages and controls database activities and database

3 most important database administration functions

concurrency control security backup and recovery

table synonymns

file and relation

determinant

in the girl scout cookie example the price of one cookie determines the price of a box of 12. The ind. cookie price is the determinant the value of the attribute can be used to determine the values of other attributes within the table

deletion anomaly

in the example, if we remove Leonard from the table, we would also lose all data associated with Leonard in that row

FULL OUTER JOIN

instructs the database to include all of the records from both the left table and the right table regardless of whether a match exists

SQL characteristics

not a programming language, rather a data sub-language SQL is comprised of: Data Definition Language (DDL) Data Manipulation Language (DML) Data Control Language (DCL)

SUM, AVG, STDEV

numeric functions in SQL cont: SUM - calculates the sum (total) for a specific column for those rows matching the criteria AVG - calculates the numerical average (mean) of a specific column for those rows matching the criteria STDEV - calculates the standard deviation of the values in a numeric column whose rows match the criteria

COUNT, MIN, MAX

numeric functions in SQL: COUNT - counts the number of rows that match the specified criteria MIN - finds the minimum value for a specific column for those rows matching the criteria MAX - finds the maximum value for a specific column for those rows matching the criteria

purpose of a database

store data provide organizational structure for data provide a mechanism for querying, creating, modifying, and deleting data

relational database

stores information in tables each informational theme is stored in its own table, breaking up a lists into several parts or tables

row synonymns

record and tuple

column synonymns

field, attribute

implementation stage

implementing our database designs

transition a data model to a database

-need to define the primary key of all tables -specify column properties -data type: integer, float, varchar, data/time etc. -NULL status: allowed or no? -default values if any -data constraints if any -relation is then analyzed using the normalization rules

bitmap index

An index that uses a bit array (0s and 1s) to represent the existence of a value or condition.

decision tree analysis

Classifies entities into groups based on past history

CREATE TABLE statement

Employee is the name of the table as it comes after CREATE TABLE Everything part of the employee table must fall within an open and closed parenthesis and should always end with a semi colon -This table contains two columns empId and empName -Integer represents the data type of the column -NOT NULL means that every row added into the Employee table must have a value for the empId -the comma means we have finished defining this attribute and there are others to define

3 things we can do when changing data in a database

INSERT INTO - adds a new row into a table UPDATE - updates the rows in a table which match the specified criteria DELETE FROM - deletes the rows in a table which match the specified criteria

insertion anomaly

If we add Sheldon, but he doesn't drive or have a car, all the other cells are left blank and becomes wasted space, if later we then add vehicle information for Sheldon and we misspell Volkswagen. This insertion anomaly would not have occurred in a normalized database

update anomaly

If we were to update the carId in a database that had not been normalized we would then have a mismatch in carId for other instances of that vehicle

OUTER JOIN

In an inner join records can only be included in the result set if there are matched pairs of values in the two tables that are involved in the join In an outer join the matched pairs of values will be included in the results, but so too, might be additional records that have no matching value in the other table involved in the join Outer joins can be used to obtain data that exists in one table without matching data in the other table

DELETE FROM

In this example we are saying to delete the row from the Employee table where the empId attribute matches 29

UPDATE example 2

In this example we are setting the deptId to 4 for every employee whose name begins with 'Da' This example shows that updates can be made to more than one row at a time

Indexes and Columns

Indexes can be created on most columns, it depends on the columns data type. Columns with LOB data types cannot be indexed. (exception is when using a hashing algorithm) LOB data types include: text ntext image varchar(max) nvarchar(max) varbinary(max)

JOIN ON

JOIN ON can be used to move the JOIN syntax into the FROM clause, which declutters the WHERE clause so it can be used for other filtering etc. this example says look in the Employee and Department tables and link those tables together by their department IDs and then constrain the results only to those departments that do not being with the word Account. Result will be list of employees and the department they work in if the department does not begin with 'Account'

N:M SQL example

SQL example of the Student and Class relationship this example links all three tables together

1:N SQL example

The join is being done in the WHERE clause, but could also be done in the FROM clause using the JOIN keyword

non ID-Dependent weak entities

The relationship between a strong and weak entity is termed a non-identifying relationship if the weak entity is non-ID-dependent Represented by a dashed line. A dashed line is also used to model relationships between strong entities If the relationship between a strong entity and a weak entity is a non-ID dependent relationship then the primary key of the strong entity will appear as a foreign key within the weak entity but it will not be a part of that weak entities primary key

ID-Dependent weak entities

These will have as part of its primary key, the primary key of its parent entity. In this way the database enforces the constraint that instances of the ID-dependent weak entity cannot exist without being associated with an instance of its parent entity Represented by a solid line and these are also referred to as an identifying relationship, they are called this because a part of the identifier of the weak entity is derived from the parent

deadlock

a locking strategy is useful but it can introduce new problems into the database processing environment deadlock occurs when two transactions have each locked resources in the database that the other transaction needs in order to complete its tasks. The two transactions will wait on each other indefinitely

primary key

a candidate key chosen to be the main key for the relation if you know the value of the primary key, you will be able to uniquely identify a single row within the table

candidate/primary key and functional dependency

a primary/candidate key of a relation will functionally determine all other non-key attributes in the row. i.e. if we know an employee ID (primary/candidate key) we should be able to use that to find the employees name and phone number or the other attributes in the row associated with that employee

data normalization

a process of analyzing a relation to ensure that it is 'well formed' normalization removes anomalies to be considered a 'well formed' relation every determinant must be a candidate key and any relation that is not 'well formed' should be broken into two or more well formed relations a good rule of thumb for normalization is that a well-formed relation will not encompass more than one business concept

SELECT query

a query pulls info from one or more relations and creates (temporarily) a new relation. this allows a query to create a new relation feed information into another query (as a subquery) the resultset may not be in 3NF, especially if performing a join, when you are pulling data from more than one table

functional dependency

a relationship between attributes in which one attribute or group of attributes determines the value of another attribute in the same table. i.e. the price of one girl scout cookie can determine the price of a box of 12 cookies If the value of one attribute can be determined by the value of another, then that first attribute is said to be functionally dependent upon the second attribute

database transaction

a series of SQL commands or instructions that involves several operations (operations are atomic), every step of these operations must be successful or the entire transaction will fail. A rollback is then performed and no changes are saved to the database operations are sometimes called a logical unit of work ( LUW)

database application

a set of one or more computer programs or websites that serves as the intermediary between the user (i.e. customers accessing bank account data) and the DBMS

attribute

a specific piece of information we'd like to keep track of for each instance of an entity. Like an employee's name, age, phone number etc. would each be attributes each attribute has datatype, i.e. string, integer etc. and other properties like are NULL values allowed? two kinds of attributes: Identifiers (keys) non-key attributes

data mart

a subset of the organizations data warehouse constructed to support a specific need within the organization; marketing dept, specific group of employees etc. the concept of a data mart is derived from the idea that not all managers need access to all of the organizations data

1NF

a table is in first normal form when it doesn't contain any multivalued attributes, meaning every attribute value is atomic (we are not storing more than one value in a cell), a table does not contain a relation if a cell has more than one value all relations by default are in first normal form example of 1NF

Durable

a transaction in which all committed changes are permanent

Consistent

a transaction is consistent if no other transactions are permitted on the records until the current transaction finishes its task this ensures the transaction integrity has statement level consistency among all records

table to qualify as a relation (not all tables are relations)

a two dimensional table that has certain characteristics, for a table to qualify as a relation it must have certain characteristics -rows contain data about instances of the entity -columns contain data about attributes of the entity -cells of the table hold a single value -all values in a column are of the same data type, string, integer etc. -each column has a unique name -order of columns and rows is unimportant -no two rows can be identical, (when looking at the entire row)

associative table/entity

a type of intersection table that contains additional non-key attributes, called an association relationship In the student class example we have added a Grade attribute. Here we are saying that for each intersection between a student and a class we want to record a grade

SQL view

a virtual table created by a DBMS-stored SELECT statement which can combine access to data in multiple tables and even in other views you can run a query against a view just as you can run a query against a table

data anomalies

abnormalities when all changes in redundant data are not made correctly most modification problems are solved by breaking an existing table into two or more tables through a process known as normalization

LIKE

allows for searches on partial data values. for example returning a list of employee last names that start with letter 'D' SQL provides us a with a few different wildcard characters that can be used with these kinds of partial searches: % means to match values to any number of characters that follow as example shows _ means to match values against a fixed number of unknown characters as example shows

UPDATE

allows us to alter or modify data values in an existing row of a table in this example we are saying we want to change a value in the Employee table using the keyword UPDATE SET then defines the name of the attribute we want to change followed by the new value, in this case the phone number WHERE specifies which row or rows we want to update, in this case the row that matches empId 29

cardinality

allows us to be more precise in defining relationships between entities two types: maximum cardinality - defines the max number of instances of one entity that are allowed to participate in a relationship. Values are typically either one, many, or some other positive fixed number, can never be zero minimum cardinality - defines the min number of instances of one entity that must participate in a relationship. These values typically are either zero (optional) or one (mandatory)

GROUP BY clause

allows us to combine results in a categorized output this example will give us the department ID and the number of employees that work in each department and grouped together by deptId The resultset includes an alias, we're telling the database we want the result of the count function to be called numberOfEmployees

WHERE clause BETWEEN shortcut

allows you to specify a maximum and minimum value on one line. The max and min values specified in example, 10 and 45 are included

OLAP report

also known as an OLAP cube, an OLAP reports input are called dimensions and outputs are called measures

one to one binary relationship

also referred to as 1:1 a single entity instance in one entity class is related to a single entity instance in another entity class represented by a straight line in crow's foot symbol

one to many binary relationship

also referred to as 1:N a single entity instance in one entity class is related to many entity instances in another entity class

many to many binary relationship

also referred to as N:M many entity instances in one entity class are related to many entity instances in another entity class example: a supplier may supply several items and a particular item may be supplied by several suppliers this type of relationship requires a 3rd table to sit between them, a lookup or join table

join

another way (subquery being one) of combining data from multiple tables is by using a join two different kinds of join: inner joins outer joins (left, right and full)

indexes

arguably the single most critical tool for improving database performance it is a data structure that contains a copy of some of the data from one or more existing database tables can be thought of like an index at the back of a textbook

ID-Dependent vs Non-ID-Dependent weak entities example

as you can see in example on the left part of the primary key of the weak entity comes from the primary key of the strong entity, so it is an ID-dependent weak entity in example on right the primary key of strong entity does not appear in the primary key of the weak entity. Instead the primary key of the strong entity is showing as a foreign key in the weak entity, so it is a non-ID-dependent weak entity And ID-dependent has the solid line where non-ID-dependent has the dashed line

candidate key

called the candidate key because it has the potential to become the primary key

database roles

can be either flexible or fixed flexible roles are custom roles that are defined by the database administrator fixed roles are predefined roles to which database users can be assigned

CHECK constraint

can be used to create restrictions on the values that are allowed to appear in a column In this example we are altering the project table to include a CHECK constraint named projectCheckDates and instructing the database to check whether the start date is less than the end date. If this statement is true everything is acceptable if not the row will not be added to the Project table

ORDER BY

can be used to determine how the rows in a resultset will appear, either ascending or descending ascending is used by default, so the example shown will return a list of all employee names from the Employee table in ascending order to sort by descending you would need to put DESC after empName

nonintegrated data

data from two or more sources that need to be combined so they can be added to the data warehouse

metadata

data that describes the structure of the data in the database. i.e. an employee ID being labeled as an integer is metadata

unique key

data value is unique for each row so the key will uniquely identify a row unique keys: candidate key composite key primary key surrogate key

unary relationships

degree 1 is a unary relationship, in which an entity is related to itself, aka recursive relationship example: a person being married to another person, 'person' is just one entity

binary relationships

degree 2 is a binary relationship, where one entity is related to another entity, these are by far the most common of the three example: an employee that has a parking space, 'employee' and 'parking space' are separate entities but are related

ternary relationships

degree 3 is a ternary relationship, where three entities are involved in the relationship example: a doctor, a patient and a drug. All three are different entities and the intersection between them is a prescription for that drug, prescribed by the doctor and used by the patient

DBMS products

desktop DBMS: microsoft access organizational DBMS: SQL server Oracle MySQL DB2

market basket analysis

determines patterns of associated buying behavior or co-occurrence, also known as affinity analysis. Results of affinity analysis can then be used for a recommendation engine, used to suggest movies/books you might like etc.

dirty data & inconsistent data

dirty data - a data value which is obviously incorrect, i.e. 'V' stored as gender code instead of 'M' or 'F' inconsistent data - data that have changed but is not reflected across all our systems

three concurrency issues

dirty reads - the transaction reads a modified record that has not yet been committed or permanently changed to the database inconsistent reads - the transaction re-reads a data set and finds that the data have been changed phantom read - the transaction re-reads a data set and finds that a new record has been added

DROP

drops an object from a database DROP TABLE Employee; this would remove the entire Employee table from the database this example removes the empFk from the Employee table ADD CONSTRAINT is used to add this foreign key and DROP CONSTRAINT is used to remove it

database security

ensures that only authenticated users are allowed to perform only authorized activities

referential integrity constraint

ensures that values of a column in one table are valid based on the values in another table i.e. the value of a foreign key must match a value of an existing primary key

relationships

entities can be connected to one another through relationships. The degree of the relationship refers to the number of entities that are participating in the relationship degree 1 is a unary relationship degree 2 is a binary relationship degree 3 is a ternary relationship

entity class vs entity instance

entity class - this is a description of the structure and format of the occurrences of the entity. Similar to a recipe or architectural blueprints. entity instance - a specific occurrence of an entity class entity class is like a recipe and entity instance is like an actual cake

too much data

example - an excessive number of columns concept of parsimony where simplicity is often better than complexity

incorrect format data

example - time data stored as hours when needed in minutes

ETL

extract transform load

COUNT, MIN, MAX, AVG example

first select statement is just saying how many employees in the Employees table? In second example the database will first scan for all projects whose projId value is greater than 7 and then will look within that set of rows at values in the hours column and will use that to determine the minimum, maximum and average number of hours worked for those projects

defining a foreign key using SQL

foreign key like a primary key is also considered a constraint in table you can see we have empFk and skillfK constraints FOREIGN KEY (empId) empId represents the name of the attribute within this table that we want to be a part of the primary key/foreign key relationship REFERENCES Employee (empId) we are saying we want this foreign key to point to the empId attribute within the Employee table

requirements analysis stage

gathering information so that we can understand the data problem for which we are attempting to design a solution sources of info can include: interviews forms reports queries use cases business rules observation JAD sessions - joint application development sessions where you assemble all the stakeholders in a project together in a single room

common causes of database failure

hardware failures programming bugs human errors/mistakes malicious actions regular backups is the remedy for all of these issues

implicit vs explicit resource locking

implicit locks are issued automatically by the DBMS based on an activity explicit locks are issued by users who are requesting exclusive rights to specified data. Can be used for different levels of granularity; table, row, column or cells. example of explicit locking using item 100 from the lost update problem example, but by imposing this strategy we no longer have the lost update problem

cluster analysis

identifies groups of entities that have similar characteristics

1:1 relationships and foreign keys

if both sides of the relationship are optional, meaning if the minimum cardinality on both sides of the relationship is a 0, it doesn't matter which table receives the foreign key in a 1:1 binary relationship If only one side of the relationship is optional, meaning if one side of the relationship has a minimum cardinality of 0 while the other side has a min. cardinality of 1, the optional side generally receives the foreign key In the example whichever 1:1 relationship option makes more sense to you or an organization should be what you go with

exclusive & inclusive supertype subtype relationship

if exclusive this means that each instance of the supertype can be related to, at most, one of the subtypes. In the vehicle example a vehicle can be car or truck or motorcycle but it cannot simultaneously be a car and motorcycle etc. if inclusive this means the supertype can be related to one or more subtypes if the subtype is a specific case of the supertype then we can call that an IS-A relationship. i.e. a truck IS-A vehicle a subtype will inherit all of the properties/attributes of its supertype, and then in addition has it's own unique attributes in the example the cirlce with X represents exclusive and no X represents inclusive

eventual consistency

if no new updates are made to a specific data item for a period of time, eventually all of the requests for that data item will return the most up to date value, regardless of which node is servicing the request

DISTINCT

if the result of your query contains many duplicate rows you can use this to return just unique instances by placing it in front of column name as example shows

lost update problem

if two or more users are attempting to update the same datum at the same time it is possible for one update to overwrite the other update In the example the result should be 2 items left for inventory for item 100 but instead user B's update overwrote user A's update and changed the value to 7 should have had an intelligent strategy in place for handling this concurrent access, this error would not occur

BCNF

if we reach a point in normalization where every determinant within the table is also a candidate key then the table will be in Boyce-Codd Normal Form a relation is considered 'normalized' when every determinant is a candidate key, it has reached BCNF, aka 3.5NF and is slightly more stringent than 3NF

linear search vs binary search of indexes

if you have alphabetized list of names vs a random list of names the alphabetized list can be searched much quicker than the random list. Can use a binary search strategy when list is alphabetized. begin search in the middle and then determine if the data we're looking for is above or below our beginning search, once determined you then begin search in the middle of remaining rows compared to linear search, binary search is much quicker

clustered vs nonclustered indexes

in a clustered index the actual data rows that comprise the table are stored at the leaf level of the index only one clustered index per table PK columns are good candidates for clustered indexes in a nonclustered index the leaf nodes contain the values from the indexed columns, along with a row locator which points to the location of the actual data row nonclustered are slower than clustered indexes because the DBMS must follow a pointer to retrieve the actual data row a table can have more than one nonclustered index the leaf nodes of a nonclustered index are allowed to contain values from non-indexed columns

cursors

in a result set in SQL a cursor is simply a pointer into that set of rows forward only cursors scrollable cursors

supertype subtype relationship

in a supertype subtype relationship the subtype is a specific version of a supertype example: a vehicle might be a supertype and then different types of vehicles would be the subtypes where there are unique attributes about each type of vehicle that we want to track in addition to a set of attributes that we want to track for every vehicle

surrogate keys and associative entities example

in first example composite key is used for the rating table; customerId and productId and allows a customer to rate a product only one time in second example a surrogate key is used (ratingId) with customerId and productId as foreign keys. This example would allow customers to rate the same product multiple times

recursive relationship example

in first example each person in the table can be sponsored by 0 to 1 other people, or at most 1 other person in second example each customer might refer many other customers but each customer is referred by a maximum of 1 other customer in third example a doctor might treat many other doctors, while each doctor might be treated by many other doctors

1:1 relationship SQL example

in first example the LockerId is a foreign key in the employee table and so we need to do the join on equality between the locker IDs. And in 2nd example we're placing the employee ID as a foreign key in the locker table so we need to do the join by looking for matching values of employee IDs between the two tables

WHERE clause inner join

in this example we are asking for the employee name and department name from the Employee and Department table where the department ID in the Employee table matches the department ID in the Department table. This will then extract the employee name from the Employee table and department name from Department table and include those values as a new row in the result set

LEFT OUTER JOIN

in this example we are asking the database for a list of employee names and department names from the Employee table (referred to as e) and the Department table (referred to as d) and we're joining the tables together using a LEFT OUTER JOIN operation, where the deptId of the Employee table matches the deptId of the Department table the resultset (table on right) will be list of employee names along with their department names in this example Employee table is listed on the left side of join statement, so it is left and Department is right table and by using LEFT OUTER JOIN we are saying give me a list of all employee regardless of whether they have a matching department but if they have a matching department include it as well. This is why Raj is listed but he does not have a department listed

RIGHT OUTER JOIN

includes all of the records in the right table regardless of whether there is a matching record in the left table so in this example we are including results from the deparment table even though there isn't an employee assigned to the marketing department

join/linking table

joins two tables together using matched pairs of data values

defining a primary key using SQL

keys in SQL are considered to be constraints it is used inside the parenthesis because it is part of the CREATE TABLE statement, the constraint is given a name, in this case empPk, every constraint must have a unique name PRIMARY KEY (empId) tells the database we want to use the empId attribute as the primary key to create a composite primary key you would just need to separate the two attributes by a comma

Isolated

least restrictive isolation level is read uncommitted most restrictive (or isolated) is serializable. Serializable ensures likelihood of concurrency control problems is very low, the cost of that is accessibility

null values

means that no data exists, an empty cell in a table, this is different from a zero, space character, empty string etc. problem with null values: maybe the column value hasn't been decided or it is just unknown, or not appropriate for a specific row, so null values can create ambiguity

OLAP

mentioned earlier there are 2 broad categories of business intelligence applications; reporting applications and data mining applications online analytical processing - it is a technique that supports 'reporting' applications and allows us to dynamically examine database data in real time and apply simple transformations like sorting, filtering, grouping etc. data mining by contrast supports the data mining of BI applications. A mathematically sophisticated technique for analyzing database data

Objects vs Entities

methods is what differentiates Objects from Entity classes in entity relationship model

advantages of relational database

minimizes data redundancy preserves complex relationships among topics allows for partial data (null values)

concurrency control

more than one user, may be a person or software application etc. may be trying to access the same data in the database at the same time the need to provide for concurrent access leads to a problem known as interdependency, where modifications to the database made by one user may impact other users who are simultaneously trying to use the same information concurrency control ensures that one users actions do not adversely impact another users actions. This involves a spectrum, on one end when one user accesses or touches a particular set of data within the database we block all other users from being able to use those data until the first user has finished whatever tasks they were performing On the other end we don't impose any constraints and users are allowed to read and work with data regardless of if other users are also accessing or working with usually a balance between these two ends of concurrency control is put in place

3NF

must meet criteria of 2NF and have no transitive dependencies (a functional dependency on an attribute that is not the primary key). They are called transitive because the primary key is a determinant for a non-key attribute, which in turn is a determinant for another non-key attribute solution is the same as other forms of normalization, need to break up dependencies by creating additional smaller tables in example we put capacity in its own table since we can determine the capacity based on the room # alone, now no non-key field is dependent on another non-key field

N:M relationship example

need to have an intersection table that sits between the two parent tables and typically has a composite key comprised of the primary keys from each of the tables that it's connected to

surrogate key

often used when there is not a column in a table that could serve as a unique identifier, so it serves as the primary key, often a numeric value

Identifiers (keys)

one of two attribute types used to identify an instance of an entity class, common examples are SSN, student ID, email address etc. something like a department ID can be used as an identifier even though it is a non-unique key keys can either be unique or non-unique

dynamic cursor

opposite of a static cursor all changes made to the rows in a result set are visible when scrolling through the cursor

optimistic locking vs pessimistic locking

optimistic locking - the assumption is made that no conflict will occur during the execution of the transaction, data not locked before updating, useful if mostly using database for reads pessimistic locking - assumes that a concurrency control problem will arise during execution of the transaction, so the data is locked before updating. Useful if lots of updates are done to database optimistic is useful when the majority of our transactions involve just reading data, not making many updates to the data but is frequently updating pessimistic locking is the way to go

BI data mining applications

perform sophisticated analyses on data; usually involve complex statistical and mathematical processing rely on archived historical data rather than real time data

regression

produces mathematical equations that can be used to predict future events based on past observations

Data Manipulation Language (DML)

provides us with a series of statements that we can use for creating new data, reading existing data, updating existing data, and deleting data or CRUD

dimensional databases

purpose is to intentionally implement redundant data in a non-normalized design, such that we can vastly improve query speed

CRUD

purpose of a database CREATE READ UPDATE DELETE

recovery via Rollback/Rollforward

recovery strategy used by most enterprise relational database systems

lists

redundancy and multiple themes in lists create problems with anomalies like deletion, insertion and update problems to the information. i.e. if you want to update info for someone in a list you'll end up having to update several more fields than if you were using a relational database

three stages of database development

requirements analysis stage component design stage implementation stage

entity

should represent a single theme, topic or business concept

1:N relationship example

similar to 1:1 but difference is it is easy to tell which table should receive the foreign key, it always belongs at the 'many' side of the relationship. The one side of the relationship typically referred to as parent table and many side as child table

DELETE FROM example 2

similar to update example 2 where this will delete rows from the Employee table where the empName attribute begins with 'Da'

BI reporting systems

sort, filter, group and make elementary calculations on operational data, like SUM, AVG or COUNT rely on real time data

WHERE clause

specifies the matching or filtering criteria for the records (rows) that are to be displayed WHERE clause comparisons in screenshot

star schema

the most common data model for a dimensional database a 'fact table' is at the center of our other 3 tables. Called a fact table because a fact is quite literally the intersection of information

B-Tree index

the most common type of index structure used stands for balanced tree structure use pointers and layers of nodes in order to quickly locate desired data comprised of root nodes - less granular intermediate nodes leaf nodes - more granular

denormalization

the opposite of the normalization process we have identified a business need or justification for why two or more tables within our database design should be combined into a single table common reasons for this is more simple design and faster querying, the tradeoff though is that we are introducing the possibility for anomalies, an increase in redundant data since (for example a table with customers and their city, state will require each instance/row to include Los Angeles for all customers in that area, also each customer may enter the city differently; LA vs Los Angeles or CA vs California

reprocessing

the process of recovering a database that has failed i.e. if a database failed and the latest backup was from 2 days ago we'd ask all our salesforce to go back through their records for the past three days and manually add each sale they made into the database this process can be costly (lots of time spent manually updating) and risky (high chance for human error when inputting the data manually) this strategy is best in very small database implementations

noncorrelated subquery

the results from one query being used as input for another query in this subquery the inner query (contained in parenthesis) only needs to run once in order for the database engine to answer the question

correlated subquery

the results from one query being used as input for another query the inner query needs to be run repeatedly in order for the database engine to solve the problem. The reason is the inner query is going to need a value from the outer query in order to do its job 'e' in the example is an alias for the Employee table, means you can use just 'e' in the inner query to refer to the table in outer query

crows foot cardinality symbols

the symbol that is closest to the entity is always the maximum cardinality, while the other is the minimum cardinality the zero symbol is only ever used to represent the minimum cardinality

1:1 recursive relationship SQL example

the table needs to be joined to itself in order to extract info we want. In example we are referencing the same table twice, each time giving the table a different alias and this allows us to join the table to itself

nonunique key

the value may be the same among several rows, so the key will identify a set of rows used for grouping nonunique keys: foreign key

facts represented as 3 dimensional database

this graph represents how dimensional databases got their name

cardinality example

this is an entity relationship diagram entities are represented by rectangles and relationships are represented by lines and the cardinality represented by crows foot symbols

HAVING clause

this statement allows us to filter the categorized results of the GROUP BY clause this example will constrain the set of results to those salespeople who have a total amount of sales that is greater than or equal to 10,000 in whatever unit the sales amount column is measured HAVING clause serves the same purpose as a WHERE clause but for a GROUP BY statement

example of table not in 3NF

this table has a quantity and unit price, which are non-key attributes and they are determining the total price

2NF

to be in second normal form must meet criteria of 1NF and every non-key attribute in the table is fully functionally dependent on the entire primary key. Meaning every non-key attribute must be defined by the entire key, not by only part of the key, there can be no partial dependencies. There cannot be any attributes dependent on the primary key within the table, if there are the table must be broken into additional smaller tables and remove the partial dependencies if not using composite keys can move straight to 3NF from 1NF example shows us creating a second table for course, use courseID as foreign key

business intelligence systems

two kinds: reporting systems data mining applications

cascading deletes and cascading updates

under empFk constraint ON DELETE CASCADE means we want to establish a cascading delete relationship between this table, EmployeeSkill, and the Employee table This is telling the database if we delete an employee from the Employee table then to also delete any references (using empId attribute) to that employee on the EmployeeSkill table ON UPDATE CASCADE works similar but is used for updating tables In this example if the skillId of the Skill table is updated it will 'cascade' and update the EmployeeSkill table as well

neural networks

use training data to learn how to create accurate predictions/estimates

Data Control Language (DCL)

used for creating user accounts, managing permissions

ALTER

used for the changing the structure of a database object for example if we forgot to define a primary key in our table we could use ALTER to define this after the fact In this example we are altering the Employee table we are adding a primary key constraint named empPk and we want the empId attribute to serve as the primary key

Data Definition Language (DDL)

used to define and manage database structures provides us with a series of commands we can use to create database objects and constraints statements include: CREATE - to create database objects, like a table ALTER - to modify the structure and/or characteristics of existing database objects DROP - to delete existing database objects

foreign key

used to establish relationships between tables it is a primary key from one table that is placed into another table. The key is called a foreign key in the table that receives it a nonunique key

SELECT

used to extract info from the database using the framework SELECT FROM WHERE In the example we are requesting the empName attribute from Employee table where empId attribute equals 33, this should return to us the employees name to show values for 2 or more specific columns, you can use a comma separated list of column names, and to show values for all columns in a table use *

database contents

user data metadata indexes and other overhead data (data that tracks the status of database, improve performance etc.) application metadata

N:M recursive relationship SQL example

uses an intersection table/lookup table

weak vs strong entities

weak entity is an entity whose instances cannot exist in the database without the existence of an instance of another entity strong entities can exist in the database independently in the example the customer entity (strong entity) can exist in the database independently but the order entities (weak entity) cannot exist without an existing customer in the database

component design stage

what we learn from the requirements analysis then becomes our data model or entity relationship (E-R) diagram where we'll create data models which are graphical representations of the database solution E-R models consist of 1) entities 2) attributes - Identifiers (keys) and non-key attributes 3) relationships between entities

keyset cursor

when a cursor is opened, rather than fetching the entire result set, the database will fetch all of the primary key values for the rows in the result set and store those key values in a temporary table updates that are made after the cursor was opened are visible when using a keyset cursor, except for a new row that has been added since that rows primary key will not have been added to the cursor

INSERT INTO

when adding a row to an existing table non-numeric data must be in single quotes ' the name of the table follows the INSERT INTO statement then follows a list of comma separated attributes we want to insert into the table VALUES then represents the actual values in the same order that will show in the database

recursive relationships

when an entity has a unary relationship to itself. In the example the managerId is also an employeeId recursive relationships can be used to create hierarchies as shown in example. This example shows that we can implement a hierarchy of any arbitrary level of complexity using this data modeling approach (recursive relationships)

static cursor

when the cursor is requested a snapshot of the results is taken and the complete result set is fetched from the database and stored in a temporary table disadvantage: updates that are made after the cursor was opened are not visible when using a static cursor advantage: simple and consume few resources on the database server


संबंधित स्टडी सेट्स

Neuro Nclex (Multiple Sclerosis, PD, MG , ALS)

View Set

Environmental Pollution Final Exam

View Set

Geology - Sedimentary Rocks and Processes

View Set

Nursing 4530 - Mental Health - Exam 1

View Set

Psych EXAM 2 (9,10, Readings #2,3,4)

View Set