Database Theory

Pataasin ang iyong marka sa homework at exams ngayon gamit ang Quizwiz!

* Six Clauses that form an SQL statement

1) SELECT 2) FROM 3) WHERE 4) GROUP BY 5) HAVING 6) ORDER BY

* DCL

(Data Control Language) Used to create controls (security definitions, access privileges, etc) for access to the database system. GRANT, REVOKE, DENY

* DDL

(Data Definition Language) Contains all commands for creating, updating, and deleting objects. CREATE, DROP, ALTER

* DML

(Data Manipulation Language) Contains all commands for selecting/ querying, merging, joining, and appending, union data together. INSERT, UPDATE, DELETE, SELECT

ETL

(Extracting, Transforming, Loading) The process of extracting data from a transaction database, transforming it, and loading it into an analytical database.

MOLAP

(Multidimensional OLAP (Online Analytical Processing) A source database that is dedicated to the compilation of answers or events.

* OLAP

(Online Analytical Processing) Relational databases that are used for data analytics. Data is not real-time but historic. Created by copying data from an OLTP (Online Transnational Processing) database. The focus of this database type is data retrieval speed.

* OLTP

(Online Transactional Processing) Relational databases focus on real-time data entry. Accuracy of the data and the speed of inserting, updating, and deleting records is the primary concern.

RDBMS

(Relational Database Management System)

ROLAP

(Relational OLAP (Online Analytical Procesing) - A database that stores extra

SSMS

(SQL Server Management Studio) A Microsoft GUI to help manage SQL databases.

SQL

(Structured Query Language) A standardized language for use with ANSI and ISO (International Standards Office) relational databases. Stable and predictable. Will work everywhere.

TPA

(Transaction Path Analysis) A planning method to determine the entity/ attribute assignments.

GROUP BY

<group by list> Optional; Creates groups for output rows according to unique combination of values specified in the GROUP BY clause. Calculates a summary value for aggregate functions in subsequent phases . Detail rows are only available to aggregate functions after the GROUP BY clause is processed. HAVING, SELECT, and ORDER BY must return a single value per group (since they're executed downstream of the GROUP BY clause). Creates groups. Order matters (think of a nested sort in Excel).

ORDER BY

<order by list> Optional; Indicates the sort order of the columns that are going to be returned.

HAVING

<search condition> Optional; Filters the groups created in the GROUP BY clause. Filtering that operates on groups of rows. Searches against the groups that were created by the GROUP BY clause. Must use an aggregate function in the HAVING clause because we're only working with groups.

WHERE

<search condition> Optional; Filters the records based on what you want to return; Filtering that operates on rows.

SELECT

<select list> Required; A list of columns/attributes you want to return; Indicates columns to preserve

FROM

<table source> Required; A list of the entities/tables you want to pull the data from; Gathers rows (cartesian product)

Junction table/ Intermediary table

A 2-column table that houses the correlation of one table as the primary key and the

Statement

A collection of clauses that execute a unit of work.

Clause

A component of an SQL statement.

* Primary Key

A designation assigned to a unique attribute in the table. Uniquely identifies every row in a table. Must be unique, must be non-nullable, only get one per table. Used to identify a column that uniquely identifies each record

Trigger

A pre-defined set of instructions that execute on a certain event.

Entity Relationship Modeling

A process for showing the entities and relationships required for an application via a diagram

Self-Referencing relationship

A relationship that can exist within a table wherein a foreign key references a primary key in the same table. Example 1: An employee has a manager and that manager is an employee. Example 2: A customer refers another customer on the Customer table.

Composite Keys

A table that has more than 1 primary key. Cannot be created inline (outline definition).

Indexes

A way of finding data within a table. Vital when querying.

Optional relationship

Allows NULLs in foreign key columns.

Fourth normal form

Boyce-Codd Normal Form - Each attribute must represent a fact about the primary key (expansion of 2nd normal form). It's a composite (concatenated) primary key on a unique list of possibilities. Allows for parallel processing.

Clustered index

CREATE CLUSTERED INDEX ClusteredIndexName ON schema.TableName(col1, col2, col3); An organization of data within the table (physical sort of records in ascending order). Defines the logical record order in the table. Row-by-row indexing. It is the table itself in ascending order. One clustered index per table. A clustered index on a column causes a table to be stored with rows logically organized by that column's values (commonly defaults to primary key, but can be any unique candidate key). Can only look up on 1 column.

Non-Clustered indexes

CREATE INDEX NonClusteredIndexName ON schema.TableName(col1, col2, col3); Stores pointers to the clustered indexes (typically, non unique). This is a separate structure from the table that is comprised of a few columns in the table that points back to a clustered index. Much like a VLookup (can have more than 1 column to lookup). If no clustered index exists, then it points to where in the heap to search for the target data.

* Surrogate keys / Synthetic identifiers

Candidate keys that can be created when there are no other suitable candidate keys (eg VIN, SSN, ISBN, CUSIP).

XML Schema

DECLARE @MyXMLVariable XML(schema.TableName) A way of describing the structure of an XML document. XML is self-describing.

Business Phase

Data modeling phase 1. Determine the purpose and requirements of the database (business focus).

Users Phase

Data modeling phase 2. Investigate performance requirements and usability issues (user focus).

Modeling Phase

Data modeling phase 3. Document the logical and physical database implementation plan as one.

* Relationship

Defines the common values between two tables (primary key and foreign key pair). A way of identifying the commonality between two attributes in 2 tables by selecting a single column in each table. There is no physical component to a relationship. A verb (attend, has, meets, owns)

XML

Describes data structure and values. xml data type - Stores XML in variables and columns XQuery - Node-tree query language for XML FOR XML clause - Generates XML from relational data. Allows you to get relational data out of a SQL database and into XML form (for use with an app that interacts our database info in XML document format) Extensible Markup Language - Language to define what a document is. Putting content online and describing that content. Describes the hierarchical relationship and categorization of the content. Document storage (NoSQL databases) is primarily related to XML storage.

Xquery

Designed to query XML. Supported by SQL Server 2016

Required relationship

Does not allow NULLs in foreign key columns.

Constraints

Enforce rules about what the data can do (eg. unique constraint, primary key constraint, foreign key constraint)

* Referential Integrity

Ensures that data remains meaningful. Foreign key constraints and triggers can enforce referential integrity in SQL Server 2016. Table relationships must always be consistent. In other words, any foreign key field must agree with the primary key that is referenced by the foreign key. The restriction is on the foreign key, not the primary key (foreign key cannot be NULL or absent). Ensures that key values are consistent across tables. Foreign Key and Primary Key constraints

ERA

Entity Row Attribute

* One-to-one

For a column with a lot of NULL values, remove the sparse column and make it it's own table to cut down on the number of NULLs in the database. For every one record in Customers, there is only one record in Cust Details. These are extremely rare in transactional databases, but they are non-existent in analytical databases. If the relationship is optional, you should create the foreign key in the side of the relationship that allows NULL values.

Fifth normal form

Further decompose tables so that every record is unique and non-dependent. For big data.

Super-type relationship

Generalization; A way to aggregate records based on a specific attribute of that record. Good for creating AND logic to traverse up a tree. Only applicable if the data already exists in the database. Not the same as querying. Improves performance and is akin of de-normalizing. An analytical process for bringing the sub-types back together for reporting.

Dependencies

How the attributes within the entity are related to each other. One goal of normalization is to ensure non-key values are dependent on the primary key.

--

How to comment out in code in SQL Server 2016

* Logical

How we describe what's inside the container.

HOLAP

Hybrid OLAP - A balance between the MOLAP and ROLAP

Third normal form

Identify fields that don't directly describe the entity (remove transitive dependencies and put them in a separate table and join them with an intermediary table). Also, no calculations.

Second normal form

Identify fields that should be grouped into separate tables (a person, place, thing, or event). All non-key columns must relate to the entire primary key (remove composite keys and put them in a separate table and join them with an intermediary table).

* Foreign Key

Indicates that a column that contains a value that is defined in another table (go look it up). It has to be a unique value, but it doesn't have to be the primary key from the other table. Does not have to be unique in the foreign table.

Processed Data

Information. Processing is how data becomes information. Has business content and meaning applied.

I/O Ops

Input-Output operations; how many times the information is written to or retrieved from the physical disk.

* Physical Column/ Field

Logical Attribute - One aspect of the entity. Customer name, product category, order date, or salary amount. Does not imply ordinal positioning.

* Physical Value

Logical Data - The entry for a specific attribute of a single record (where the column and row meet). A single customer name, a single product, a single order, a single employee.

* Physical Table

Logical Entity - What the data within the table describes. A person, place, thing, or event.

* Physical Row

Logical Record - A single instance of the entity (e.g. a single customer, product, event).

MDX, DAX, DMX

MDX (Multidimesional Language Expressions), DAX (Data Analysis Expression), DMX (Data Mining Expressions) Languages used to write functionality within BI (Business Intelligence) platforms.

Synthetic ID

Made up numbers that generally serve as a primary key (must unique).

MPP

Massively Parallell Processing

Transact-SQL/ TSQL

Microsoft's flavor of SQL

Hadoop

Not a database. A big data technology that offers massive processing power for processing very large data loads.

OLAP / SSAS Cube

OLAP (Online Analytical Processing) / SSAS (SQL Server Analysis Services) Multi-dimensional cube. Optimized for data analysis. Alternative way of storing data for analytics. Kind of like a pivot table (stores aggregates and raw at the same time).

Data Warehouse

OLAP (Online Analytical Processing) database. Historical facing. Populated by the ETL (Extracting, Transforming, Loading) process. Read-only databases (users cannot update, insert, or delete).

* No action

One of the 5 ways to deal with primary key value changes. Action will fail and result in a referential integrity violation error.

* ON UPDATE Cascade Updates

One of the 5 ways to deal with primary key value changes. Applies the new value in the primary key and uses it to update all foreign key references in other tables. The only time the primary table is aware of the foreign tables. This doesn't work for triggers because it only works when there's an explicit relationship.

* Cascade Deletes

One of the 5 ways to deal with primary key value changes. Deletes the primary key value in the primary key and deletes all referenced rows in the foreign key referenced tables. All rows associated with the deletion would be gone from the database. This doesn't work for triggers because it only works when there's an explicit relationship. DANGER!

* ON DELETE Set NULL

One of the 5 ways to deal with primary key value changes. Updates all rows associated with the primary key in the foreign key referenced tables to NULL values. Slightly better than cascade deleting.

* ON DELETE Set Default

One of the 5 ways to deal with primary key value changes. Updates all rows associated with the primary key in the foreign key referenced tables to whatever default value the user selects.

* One-to-many

One side is the primary key, and the other side is the foreign key. The constraint to maintain the referential integrity is the responsibility of the foreign key. Many records in Customers, relate to many records in Products. Requires the Orders table to define.

PL/SQL

Oracle's flavor of SQL

* Logical Attribute

Physical Column/ Field - One aspect of the entity. Customer name, product category, order date, or salary amount. Does not imply ordinal positioning.

* Logical Record

Physical Row - A single instance of the entity (e.g. a single customer, product, event)

* Logical Entity

Physical Table - What the data within the table describes. A person, place, thing, or event.

* Logical Data

Physical Value - The entry for a specific attribute of a single record (where the column and row meet). A single customer name, a single product, a single order, a single employee.

R

Popular statistical programming language

4 FOR XML clauses

RAW AUTO EXPLICIT PATH

Data

Raw and unprocessed. Words, numbers, dates with no context. Lacks business content and meaning.

First normal form

Remove redundant fields or comma separated values (no repeating attributes, no cellular lists, no repeating columns eg phone1, phone2, phone3... phoneN)

SELECT DISTINCT <column list>

Returns a set of values that are unique (no dupes) to the identified column(s). This can help identify a candidate key (when SELECT DISTINCT <column list>.SUM == SELECT *.SUM)

RAW mode

SELECT col1, col2 FROM schema.TableName WHERE col1 = 122 OR col1 = 119 FOR XML RAW; Produces: <row col1="122" col2 = "value1"> <row col1="119" col2 = "value2"> SELECT col1, col2 FROM schema.TableName WHERE col1 = 122 OR col1 = 119 FOR XML RAW, ELEMENTS; Produces: <row> <col1> 122 </col1> <col2> "value1" </col12> </row> <row> <col1> 119 </col1> <col2> "value2" </col12> </row> Generates a single <row> element per row in the rowset that is returned by the select statement.

AUTO mode

SELECT col1, col2 FROM schema.TableName WHERE col1 = 122 OR col1 = 119 ORDER BY col2 FOR XML AUTO; Produces <Table 1> <Table 2> <Table 3 result> <Table 3 result> <Table 3 result> SELECT col1, col2 FROM schema.TableName WHERE col1 = 122 OR col1 = 119 ORDER BY col2 FOR XML AUTO, ELEMENTS; Produces Every output as an attribute. For joined tables, the first table referenced becomes the outer group. Tables on subsequent joins become nested inner groups until you get to the innermost table being joined which becomes the innermost child element Generates query results as nested XML elements. This does not provide much control over the shape of the XML generated from a query result.

PolyBase

SQL Server 2016 includes PolyBase, which integrates SQL Server with Hadoop.

Data models

The entities are no longer store houses of events vs. objects (de-normalized). May not have a primary key.

Sub-type relationship

Specialization; A way to split records based on a specific attribute of that record. Good for creating OR logic (as opposed to the avoidance of NULL values) to traverse down a tree. Only applicable if the data already exists in the database. Not the same as querying. Improves performance and is in the area of 5th or 6th normal form. A transactional process for getting rid of NULL values.

Sixth normal form

Sub-typing. Break out categories and sub-categories (creating new tables for items that are commonly queried together). For big data.

* Physical

The container.

Normal Form

The normal form is in the state a table is in once the rule defined by the form is applied. Sequential rules apply (in order to have 2nd normal form, you need 1st normal form).

* Candidate key

The potential primary keys for a table (they COULD be a primary key). Use DISTINCT to find candidate keys

De-normalization

The process of breaking the rules of normal form. A common practice in data warehouse design. From a normalized database, selectively de-normalize some of the tables in it.

* Normalization

The process of reducing redundant data in a database. tHE PROCESS OF USING STANDARDIZED rules to build a database design that will remove redundancies, improve stability, eliminate anomolies, and improve data integrity.

EXPLICIT mode

Ugly and horrible because you have to EXPLICITLY define what the elements and attributes are. Query will look obnoxious.

Column-Oriented Databases

Used to optimize the storage and query response for data. Great when columns store a high number of duplicate values (for better aggregation). aka Column-store indexes. It points you to the index of an attribute instead of a record.

M

What PowerBI and PowerQuery use to interface with SQL queries (Microsoft unique language)

PATH mode

When combined with the nested FOR XML query capability provides the flexibility of the EXPLICIT mode in a simpler manner. Column name or column aliases are treated as XPath expression. These expressions indicate how the values are being mapped to XML.

Intelligence

You analyze information to obtain intelligence. Organizations can use intelligence to understand past performance to predict future trends.

* Many-to-many

You'll be jumping though an intermediary table for this one. For every one record in Customers, there can be many records in Orders


Kaugnay na mga set ng pag-aaral

Interpersonal Relationships chapter 10

View Set

Fundamentals of Nursing Values, Ethics, and Advocacy

View Set

DRAFTING THE DECLARATION OF INDEPENDENCE

View Set

SBM Chapter 13 - Planning for the Harvest

View Set

Chapters 14 & 16 Adaptive Quizzing

View Set

Exam 2: Mastering Biology Questions

View Set