SQL

Ace your homework & exams now with Quizwiz!

SQL

Structured Query Language, used to communicate with databases, founded on the relational database model

outer query

the main query in a subquery

ON

this clause can be used when the common columns in 2 tables use the same column name (instead of USING)

key

this is something that uniquely identifies each individual record (can be one or more attributes, must work in all cases to avoid database anomalies)

*

called a splat; achieves same results as listing all column names

ALL

can be used after set operators to keep duplicates

SELECT

can be used on its own as print

HAVING

clause can use with GROUP BY to filter grouped results.

PostGres

compiles a query by processing anything before a semi-colon

modern data science team

data engineer, data analyst, data scientist

data analysis

1. Identify what you will be looking for in the database What is the desired output of your analysis? What is the base set of data/tables required? Will you need to filter out some records? Will you need to aggregate or summarize data? Use that information to build your database queries. • SELECT ... FROM ... WHERE ... GROUP BY ... Be curious!

entity-relationship model (ER)

A top-down approach to database designBegins big picture (entities and relationships)Then details are added (attributes and constraints)

PostGres keywords syntax

ALL CAPITAL LETTERS

data summarization/visualization

Be sure to choose the correct delivery method for your results. Allow the key takeaway from your analysis to dictate the type of visualization that is chosen. Relationship: Show correlations or connections between factors Comparison: Show how variables perform against the same metric over a period of time Distribution: Show how a series of data points are spread out Composition: Show how individual parts make up a whole Visualization isn't necessary if the answer is a simple number or a list of items

Postgre SQL

Besides aggregate functions this also supports Date/Time functions, String functions, Math functions

SELECT title, release_year

FROM films LIMIT 3; Query the title and release year for the first 3 rows of the films table

SELECT id, title

FROM films OFFSET 55; Query for all the ids and titles from the 56th row to the end.

SELECT *

FROM films OFFSET 665 LIMIT 1; Query for all information in the 666th row

SELECT *

FROM films ORDER BY RANDOM() LIMIT 5; Query 5 rows randomly.

SELECT country, release_year, title

FROM films ORDER BY country, release_year desc, title; Query for country, release year and title from films table, sort by country a -> z, release year largest -> smallest and title a -> z.

SELECT title, duration

FROM films ORDER BY duration Query for title and duration, sort duration from the shortest to the longest.

SELECT id.title

FROM films ORDER BY title ; Query the film table for titles and ids sorted by titles a -> z.

SELECT title, certification

FROM films WHERE certification != 'R'; Query for titles and certifications of films which are not R rated movies

SELECT title, country, release_year

FROM films WHERE country='USA' (release_year>2000 OR release_year<1980); Query for the title, country and year of movies made in the US or the UK, which were released after 2000 or before 1980.

SELECT title, duration

FROM films WHERE duration<60; Query for titles and duration of films shorter than 1 hour.

SELECT title, country

FROM films WHERE language IN ('German', 'Spanish', 'French'); Query for titles and countries of films in German, Spanish and French.

SELECT title, release_year

FROM films WHERE release_year BETWEEN 1960 AND 1969; Query for titles and release year of films in 1960s.

SELECT title

FROM films WHERE title LIKE '%Alexander%Horrible%Bad Day%; I cannot remember the exact title of a movie, which contains the words 'Alexander...Horrible..... Bad Day', write a query to find this movie.

SELECT DISTINCT language

FROM films; Query for all the different languages from the films table

SELECT *

FROM films; Query for everything from the films table

SELECT title, duration

FROM films; Query the title and duration columns from the films table

types of joins

INNER JOIN, OUTER JOIN (LEFT JOIN, RIGHT JOIN, FULL JOIN), CROSS JOIN

1:M relationship

Ideal for relational modeling (what we are trying to do) Should be the norm in any relational database design

OFFSET

If a statement contains both OFFSET and LIMIT, which will be conducted first regardless of their order

data cleaning

If the data set is new or not yet standardized, you'll have to clean it Data cleaning is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database and refers to identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data. Data cleaning may be performed interactively with data wrangling tools, or as batch processing through scripting.

SELECT DISTINCT certification FROM films;

Query for all the unique certifications from the films table

PostGres' version of print() statement

SELECT 'sing quotes only'

FULL JOIN syntax

SELECT columns FROM table1 FULL JOIN table2 ON table1.column = table2.column;

LEFT JOIN syntax

SELECT columns FROM table1 LEFT JOIN table2 ON table1.column = table2.column;

RIGHT JOIN syntax

SELECT columns FROM table1 RIGHT JOIN table2 ON table1.column = table2.column;

BETWEEN...AND.../NOT BETWEEN...AND...

To return all rows, at the specified SELECT columns, where a value in a column falls with in a specific range of values

=, !=, ,>=,<=

To return all rows, at the specified SELECT columns, where a value in a column is =, !=, ,>= or <= a specified value

IN/NOT IN (a1, a2...)

To return all rows, at the specified SELECT columns, where a value in a column is included/excluded from a set of values

IS NULL/NOT IS NULL

To return all rows, at the specified SELECT columns, where a value in a column is or is not NULL

subquery

a query inside the main query; also known as an inner query

Database related syntax

all lowercase letters

M:N relationship

cannot be implemented as such in the relational model; M:N relationships can be changed to 1:M relationships

AS

column name can be modified using

foreign key

created when a primary key from one table is carried over to another table

ON table1.id_x

is the same as USING(id_x)

joining tables

is used to gather related data from distinct tables in one place (a flat table)

LIKE

is used to search for a specified pattern in a column. And the operator uses wildcard characters to define text patterns.

(ASC)

or descending (DESC) order , separated by ' , '"

set caluse syntax

query1 as q1 set operator (ALL) query2 as q2

entities

represent objects or things; Physical things: like students, lecturers, employees, products; Abstract things:like modules, orders, courses, projects they have: Instances: samples or itemsAttributes: characteristics or descriptions

COUNT

returns number of values in specified column

SELECT * from table_x LIMIT m OFFSET n= SELECT * from table_x OFFSET n LIMIT m r

returns rows from the (n+1)th row to the (n+m)th row

MIN

returns smallest value in specified column

STDDEV

returns standard deviation of values in specified

SUM

returns sum of values in specified column

VARIANCE

returns variance of values in specified column

1:1 relationship

should be rare in any relational database design

LIMIT n

similar to the usage of data.head(n) in Python pandas package

GROUP BY

statement is often used with aggregate functions to group the result-set by distinct values in one or more columns.

SELECT...FROM...

statements to retrieve and display data from one or more database tables

general join syntax

table1 join_type table2 [join_condition]

primary key

uniquely defines a single table

Different databases

use different SQL dialects

DISTINCT

use this in front of column names to check all different values, missing value will show as null

OFFSET n

use this when you want n (where n must be a non-negative integer) at the end of the query to ignore the first n rows and return the rest.

JOINS

used to retrieve data from multiple tables

LIMIT n

using this n (where n must be a non-negative integer) at the end of the query to return the first n rows of all results

WHERE or HAVING clause FROM clause SELECT clause

where a subquery can reside

ON or USING

which clauses specify the join condition

ORDER BY col_name

ASC is default ordering option for ORDER BY so ORDER BY col_name ASC is the same as

set clauses

are used to combine or reorganize the results from multiple SELECT statements into a single result

ORDER BY

"is to used to sort results by ASC can be followed by a list of columns with ascending

[^charlist] or[!charlist]

"which wildcard Defines sets and ranges of characters NOT to match"

[charlist]

"which wildcard Defines sets and ranges of characters to match"

_

"which wildcard represents a single character"

%

"which wildcard represents zero, one, or multiple characters"

data analysis task framework

1. requirements gathering 2. data collection 3. data cleaning 4. data analysis 5. data summarization or visualization

data collection

After identifying necessary data points to answer the question, you must check to see if they are readily available. If the data are not available, you may need to have engineers log then extract, transform, and load (ETL) the data into your data warehouse Example Data Sources Transactional Data Log Files 3rd-Party Websites or Services

SELECT FROM

SELECT [what] FROM [where]

CROSS JOIN syntax

SELECT columns FROM table1 CROSS JOIN table2

LIKE/NOT LIKE

To return all rows, at the specified SELECT columns, where a string value in a column matches a specified pattern

set operators

UNION(ALL), INTERSECT(ALL), EXCEPT(ALL)

requirements gathering

Understanding the "why" behind the problem Understand the underlying business process Define the decisions or actions that are being taken Understand the business problem that you are trying to solve Define the specific information that is required to make the decisions Don't just answer the immediate question being asked to you. Be sure to get the full context behind any incoming data requests

Throw an error

if something is after the semi-colon what will happen

candidate key

is an attribute or set of attributes that could be used to uniquely identify each observation in a table

Semi-colon

is not required in Postgres, but is in other dialects, thus it is good practice to use it

INNER JOIN

most common type of join and is the default type of join; key word inner can be used optionally

Postgres dialect

not case sensitive and therefore as long as logic is correct, a query will compile

relationship types

one to many (1:M); many to many (M:N or M:M), one to one (1:1)

attribute

property of an entity

AVG

returns average of values in specified column

MAX

returns largest value in specified column


Related study sets

MGMT 3860 TRAINING AND ONBOARDING

View Set

PrepU Review Questions, Chapter 14

View Set

Scientific Foundations of Professional Nursing Practice Exam 1

View Set

Chapter 63: Concepts of Care for Patients with Acute Kidney Injury and Chronic Kidney Disease

View Set

Maternity Chap. 4: Common Gynecologic Issues

View Set