week 2 SQL
What is a database?
A database is an organized collection of data, stored in an organized format. It allows us to input, manage, organize, and retrieve data quickly. Traditionally, databases are organized into tables with records as rows, fields as columns, and data as the intersection between rows and columns.
SELF JOIN
A join in which a table is joined with itself. It's a query from a single table that duplicates the table and joins on a certain criteria.
What is database normalization?
Database normalization is a design pattern for databases that aims to reduce both duplicate data and redundancy to ensure referential integrity. There are 4 levels of normalization
Select query with aggregate functions over groups
GROUP BY groups rows that have the same value in the column specified SELECT AGG_FUNC(column_or_expression) AS aggregate_description, ... FROM mytable WHERE constraint_expression GROUP BY column;
Select query with HAVING constraint
HAVING applies constraints to grouped rows SELECT group_by_column, AGG_FUNC(column_expression) AS aggregate_result_alias, ... FROM mytable WHERE condition GROUP BY column HAVING group_condition;
Select query with limited rows
LIMIT reduces the number of rows to return OFFSET specifies where to begin counting the number of rows from SELECT column, another_column, ... FROM mytable WHERE condition(s) ORDER BY column ASC/DESC LIMIT num_limit OFFSET num_offset;
Is SQL a programming language?
No, SQL itself is a query language, not a programming language. However, SQL standard allows for procedural extensions such as PL/SQL, which is a procedural programming language.
What are the differences between ORDER BY and GROUP BY?
ORDER BY orders your table based on a single column in ascending or descending order GROUP BY determines how groups will be grouped together (and has to be used with HAVING)
What is PL/SQL?
Procedural Language SQL PL/SQL includes procedural language elements such as conditions and loops. It allows declaration of constants and variables, procedures and functions, types and variables of those types, and triggers. It provides some tools that we can use with SQL like: - functions - stored procedures - sequences - triggers
CROSS JOIN
Returns a result set which is the number of rows in the first table multiplied by the number of rows in the second table
FULL JOIN
Returns all records from both tables.
LEFT JOIN
Returns all records from the left table, and the matched records from the right table
RIGHT JOIN
Returns all records from the right table, and the matched records from the left table
INNER JOIN
Returns records that have matching values in both tables
Complete SELECT query
SELECT DISTINCT column, AGG_FUNC(column_or_expression), ... FROM mytable JOIN another_table ON mytable.column = another_table.column WHERE constraint_expression GROUP BY column HAVING constraint_expression ORDER BY column ASC/DESC LIMIT count OFFSET COUNT;
What is SQL?
SQL stands for Structured Query Language and is the standard language for relational database systems. It's a language we use to add, update, or delete information in the database and to query the database for various information quickly. There are several SQL dialects developed by different vendors. Examples include Oracle SQL (which is the dialect we learned), Postgres, MySQL, Microsoft SQL Server. There are some syntax difference between dialects, but the dialects follow an ANSI standard.
TCL
Transaction Control Language Statements that manage the changes made on the database, such as finalizing or reverting changes Commands: COMMIT - persist data, ROLLBACK - revert data, SAVEPOINT - identifies a point in the transaction where you can rollback to, SET_TRANSACTION - set isolation level
Read uncommitted isolation level
Transaction may read uncommitted changes made by other transactions allows dirty reads, nonrepeatable reads, and phantom reads to occur
Updating rows
UPDATE mytable SET column = value_or_expr, other_column = another_value_or_expr, ... WHERE condition;
Read committed isolation level
default isolation level. any data read is committed at the moment it is read. The transaction holds a read or write lock on the current row, thus preventing other transactions from reading, updating or deleting it prevents dirty reads but allows non-repeatable reads and phantom reads
Dropping tables
remove an entire table including all of its data and metadata DROP TABLE mytable;
Repeatable reads isolation level
transaction holds read locks on all rows it references and writes locks on all rows it inserts, updates, or deletes prevents dirty reads and non-repeatable reads but allows phantom reads
What is multiplicity?
Another name for cardinality, it describes the numerical relationship between two tables
DCL
Data Control Language Statements that control security and regulates access to data through user privileges and permissions Commands: GRANT, REVOKE
DDL
Data Definition Language Statements that create or modify database objects, such as tables, indexes, and users Commands: CREATE, ALTER, DROP (removes table structure), TRUNCATE (removes data in table)
DML
Data Manipulation Language Statements that manipulate data within a database. Manipulations include CRUD operations (create, read, update, delete) Commands: INSERT, SELECT, UPDATE, DELETE
Serializable isolation level
Serializable execution is an execution of operations where no other transaction can modify data that has been read by the current transaction until the current transaction completes. Does not allow any of the three reads. Nothing is happening concurrently. maximum security
DQL
Data Query Language Statement that retrieves data from a database Command: SELECT
What is a DAO?
(Data Access Objects) is an object that provides an abstract interface to some type of database or other persistence mechanism. By mapping application calls to the persistence layer, the DAO provides some specific data operations without exposing details of the database.
Query order of execution
1. FROM and JOIN 2. WHERE 3. GROUP BY 4. HAVING 5. SELECT 6. DISTINCT 7. ORDER BY 8. LIMIT and OFFSET
Referential Integrity
A relational database concept that states that table relationships must be consistent. In other words, any foreign key field must agree with the primary key that is referenced by the foreign key. If we violate referential integrity (e.g. deleting records from the primary key table), we end up with a orphan record (a foreign key pointing to nothing)
What is RDBMS?
A relational database management system is a database management system that upholds specified relationships between tables. It includes functions that maintain security, accuracy, integrity, and consistency of the data.
What is the BETWEEN keyword?
BETWEEN selects values within a given range inclusively
Joins vs Set Operations
Both are used to combine data from two or more tables, but they differ in how the data is combined. JOIN combines data horizontally, by adding a new set of rows adjacent to the existing set. The horizontal new set of rows have same number of rows but can have different number of columns. UNION combines data vertically, by adding a new set of rows just below to the existing set. The vertical new set of rows have same number of columns but can have different number of rows.
What is a candidate key?
Columns that are potential candidates for primary keys, meaning that are not null and unique
What are constraints in SQL?
Constraints are the rules enforced on the data columns of a table. The 6 kinds are: 1. Primary Key: data in this column is unique, not null, and uniquely identifies all records in the table. A primary key of more than 1 column is called a composite key. 2. Foreign Key: any column in a table referencing a primary key in another table 3. Unique: no duplicate values (except NULL, which you can have multiple of) 4. Not Null: makes sure a column does not contain null values 5. Default: will default the column to some specified value of your choosing 6. Check: adds an extra condition on inserted data
Deleting rows
DELETE FROM mytable WHERE condition;
Select query with unique results
DISTINCT blindly removes duplicate rows SELECT DISTINCT column, another_column, ... FROM mytable WHERE condition(s);
What is the difference between IN and EXISTS?
IN is used to check if a column exists in a collection of items. That collection can either be explicitly listed or it can be a nested query, which is executed first and returns a list that is used by the outer query. Used when the outer query is larger than the inner query. EXISTS is used to check whether the result of a nested query is nested. It returns true if a subquery returns at least one row. Used when the outer query is smaller than the inner query.
Inserting new data
INSERT INTO mytable VALUES (value_or_expr, another_value_or_expr, ...), (value_or_expr_2, another_value_or_expr_2, ...), ...; INSERT INTO mytable (column, another_column, ...) VALUES (value_or_expr, another_value_or_expr, ...), (value_or_expr_2, another_value_or_expr_2, ...), ...;
Implicit cursors vs explicit cursors
Implicit Cursor - cursor created "automatically" for you by Oracle when you execute a query. It is simpler to code. Explicit Cursor - Cursor you create yourself. It takes more code, but gives more control - for example, you can just open-fetch-close if you only want the first record and don't care if there are others.
Non-repeatable read
The user execute a query twice and finds changes due to committed transactions
What is a stored procedure?
a named transaction that can be invoked when called like a function
What is a cursor
a pointer to a ResultSet. used to iterate through queries or tables. When we want to pass entire queries, we need to use cursors. a normal cursor must be created and die within the scope of the procedure that it was created in. SYS_REFCURSOR is allowed to be passed outside of the scope of the procedure that it is in
Transactions in SQL
an individual operation consisting of multiple statements performed against a database Properties: 1. Atomicity: "all or nothing" - the operations either all occur successfully or not at all. Otherwise transactions will be aborted and changes will be rolled back. ex. transferring money 2. Consistency: every transaction should uphold referential integrity 3. Isolation: transactions should be independent of each other. No transaction should affect the existence of another transaction. 4. Durability: No persistent data should be lost in the event of a crash. Transaction should commit successful actions so that they are immediately persisted in case of system failure
What is a sequence?
an object that will maintain a counter for you
Case sensitive exact string inequality comparison
col_name != "abcd" col_name <> "abcd"
Case sensitive exact string comparison
col_name = "abc"
String exists in a list
col_name IN ("A", "B", "C")
Used anywhere in a string to match a sequence of zero or more characters (only with LIKE or NOT LIKE)
col_name LIKE "%AT%" (matches "AT", "ATTIC", "CAT" or even "BATS")
Case insensitive exact string comparison
col_name LIKE "ABC"
Used anywhere in a string to match a single character (only with LIKE or NOT LIKE)
col_name LIKE "AN_" (matches "AND", but not "AN")
String does not exist in a list
col_name NOT IN ("D", "E", "F")
Case insensitive exact string inequality comparison
col_name NOT LIKE "ABCD"
0 Normal Form
no normalization
What is a trigger?
on object that we can create that waits for actions to occur on a specific table that the trigger was created for. can be coded to react to most CRUD operations
What are the 6 components/interfaces of JDBC?
1) Driver - Interface for providing the driver used for interacting with a database. 2) Connection - The connection to the database. 3) Statement - The execution on the database 4) PreparedStatement - A sub interface of the Statement interface. PreparedStatements are pre-compiled and hence their execution is much faster than that of Statements. 5) Result set - The results from a query 6) SQL Exception - Exception representation of SQL errors.
Set Operations: Unions, Intersections & Minus
1) Union - compares 2 query results together and joins them together as one result. It omits duplicates. 2) Union All - Does the same as union, but allows duplicates .3) Intersect - Shows only records found in both table results. 4) Minus - Subtracts records from left query, that are in the right query. SELECT column, another_column FROM mytable UNION / UNION ALL / INTERSECT / EXCEPT SELECT other_column, yet_another_column FROM another_table ORDER BY column DESC LIMIT n;
1st Normal Form
1) all data must be atomic (broken down into the smallest possible unit) 2) should have a unique identifier (primary key)
2nd Normal Form
1) all data must be atomic (broken down into the smallest possible unit) 2) should have a unique identifier (primary key) 3) no partial dependencies (all values must be identified by a single column - no composite keys)
3rd Normal Form
1) all data must be atomic (broken down into the smallest possible unit) 2) should have a unique identifier (primary key) 3) no partial dependencies (all values must be identified by a single column - no composite keys) 4) no transitive dependencies (no column is dependent on a column that is not the primary key -- another column that is unique and not null -- no candidate keys)
What is a composite key?
A primary key that consists of two columns
What are scalar functions?
A scalar function allows you to perform different calculations on a single row of data and returns a single result. Strings: UPPER LOWER LENGTH Numeric: ROUND Conversions: TO_CHAR TO_NUMBER
What is a View?
A searchable object in a database that is defined by a query. Though a view doesn't store data, some refer to a view as "virtual tables," you can query a view like you can a table (Changing something inside a view WILL affect the the actual tables!)
What is a nested query?
A subquery, also known as a nested query or subselect, is a SELECT query embedded within the WHERE or HAVING clause of another SQL query. The data returned by the subquery is used in the main query as a condition to further restrict the data to be retrieved.
Altering table
ALTER TABLE mytable ADD column DataType OptionalTableConstraint DEFAULT default_value; ALTER TABLE mytable DROP column_to_be_deleted; ALTER TABLE mytable RENAME TO new_table_name;
What is CASCADE CONSTRAINTS?
Added to a DROP statement to drop any constraints in other tables that reference to the currently dropped table
What is the difference between an Aggregate Function and a Scalar Function?
Aggregate functions operate on a single column of data and return a single result Scalar functions operate on each individual record in a query and returns a single result
What are some conventions in SQL?
Although SQL is case insensitive, as a standard, we capitalize keywords such as SELECT, DELETE, and UPDATE to differentiate between keywords and table or field names.
What are aggregate functions?
An aggregate function performs a calculation on a set of values and returns a single value. SELECT AGG_FUNC(column_or_expression) AS aggregate_description, ... FROM mytable WHERE constraint_expression; COUNT(*): counts the number of rows COUNT(column): count the number of rows in the group with non-NULL values MIN(column) MAX(column) AVG(column) SUM(column) FIRST LAST
What is cardinality in SQL?
Cardinality describes the numerical relationship between two tables. There are 3 kinds of cardinality: one-to-one 1 : 1 - individual records are associated with only 1 other record. Primary key is linked to unique foreign key one-to-many or many-to-one 1 : n or n : 1 - individual records in one table are associated with many records in the other (one-war). Primary key is linked to non-unique foreign key many-to-many n : n - records in both tables are associate with many records in the other (two-way). Best represented using a join table
What is a Schema?
In SQL, the database schema is what describes the structure of each table, and the datatypes that each column of the table can contain.
What are indexes?
Indexes are used to speed up the performance of queries. It does this by reducing the number of database data pages that have to be visited/scanned. Pros: In larger databases, lookup times start to falter. Bring in joins, or nested selects and you really start to see the lag. So using an index will immensely speed it up. Cons: By using indexes, it now takes longer to delete or add new records, since it has to reorder the index table every time.
What is the difference between inner join and intersect?
Inner Join joins both the tables together based on a matching column of the user's choice. It will not omit duplicates, but will omit any data that is unique to either table individually. Intersect takes two queries and combine them, showing only records, in their entirety, that were present in both queries, while still omitting duplicate data. In other words, with joins we are comparing column data to one another, with set operations, we are comparing entire records with each other.
Dependency Injection Design Pattern
It allows the creation of dependent objects outside of a class and provides those objects to a class through different ways We can design an interface where the implementations can be subbed out without affecting the behavior The specific implementation is injected into the program at runtime
What is JDBC?
Java Database Connectivity API provides data access to other data sources, such as relational databases. The java.sql package contains classes and interfaces for JDBC API. JDBC interfaces access the connected database through a JDBC driver.
JavaBean
JavaBeans are classes that encapsulate many objects into a single object (the bean). It is a java class that should follow following conventions: 1. Must implement Serializable. 2. It should have a public no-arg constructor. 3. must be private with public getters and setter methods.
Select query with expression aliases
SELECT col_expression AS expr_description, ... FROM mytable;
Select query with LEFT/RIGHT/FULL JOINs on multiple tables
SELECT column, another_column, ... FROM mytable INNER/LEFT/RIGHT/FULL JOIN another_table ON mytable.id = another_table.matching_id WHERE condition(s) ORDER BY column, ... ASC/DESC LIMIT num_limit OFFSET num_offset;
Select queries with constraints
SELECT column, another_column, ... FROM mytable WHERE condition AND/OR another_condition AND/OR ... ;
Select query with ordered results
SELECT column, another_column, ... FROM mytable WHERE condition(s) ORDER BY column ASC/DESC;
What is SQL injection?
SQL Injection is a common security vulnerability that occurs when you ask a user for input and, instead of the requested input, the user enters an SQL statement that you will unknowingly run on your database.
What are the different statements of JDBC?
Statement - Used for general-purpose access to your database. Useful when you are using STATIC SQL statements at runtime. The Statement interface CANNOT accept parameters. PreparedStatement - Used when you plan to use the SQL statements many times. The PreparedStatement interface accepts input parameters at runtime. CallableStatement - Used when you want to access the database stored procedures. The CallableStatement interface can also accept runtime input parameters.
Stored procedure vs user defined functions
Stored Procedure - Does NOT have to return anything. - Can have as many IN/OUT parameters as it wants. - Can alter the database using most DML statements. - Can NOT be call mid query (inline in a query) - Can call other stored procedures within it. -A stored procedure can call functions within it. User Defined Functions - MUST return ONE and only ONE resource. - CAN use OUT parameters, but this is highly frowned upon. - Can NOT perform DDL/DML - CAN be called inline in a query. - CAN call other functions - can NOT call stored procedures.
What does the LIKE keyword do?
The LIKE keyword is used in a WHERE clause to search for a specified pattern in a column. There are two wildcards used in conjunction with the LIKE operator: % - The percent sign represents zero, one, or multiple characters _ - The underscore represents a single character
dirty read
The user reads a changed record that has not been committed to the database.
What are the differences between WHERE and HAVING?
WHERE is used on non-aggregate data. It checks each individual record to see if it matches the WHERE condition HAVING is used on aggregate data (or a grouping of data) to check a condition
Phantom Read
When one user executes a query twice and it gets a different number of rows in the result set from a committed transaction