SQL Review

अब Quizwiz के साथ अपने होमवर्क और परीक्षाओं को एस करें!

" VS '

"group" appears in quotations above because GROUP is actually the name of a function in SQL. The double quotes (as opposed to single: ') are a way of indicating that you are referring to the column name "group", not the SQL function. In general, putting double quotes around a word or phrase will indicate that you are referring to that column name.

Subqueries

(also known as inner queries or nested queries) are a tool for performing operations in multiple steps. Subqueries can be used in several places within a query, but it's easiest to start with the FROM statement. Here's an example of a basic subquery: SELECT sub.* FROM ( SELECT * FROM tutorial.sf_crime_incidents_2014_01 WHERE day_of_week = 'Friday' ) sub WHERE sub.resolution = 'NONE'. If you were to run this on its own, it would produce a result set like any other query. It might sound like a no-brainer, but it's important: your inner query must actually run on its own, as the database will treat it as an independent query. Subqueries are required to have names, which are added after parentheses the same way you would add an alias to a normal table. In this case, we've used the name "sub."

AVG Nuances

1) Numerical columns only 2) Does not account for null values 3) Must change nulls to zeroes

Not equal to

<> OR !=

Field

A column in a database table.

Table

A database view of information arranged in a grid of rows and columns

Foreign Key

A primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables

Record

A row in a database table.

window function

A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.

Aggregate vs. simple arithmetic

Aggregate- Vertical only Simple Arithmetic- Horizontal only

IS NULL

Allows you to select rows that contain no data in a given column. Some tables contain null values—cells with no data in them at all. SELECT * FROM tutorial.billboard_top_100_year_end WHERE artist IS NULL

NOT

Allows you to select rows that do not match a certain condition. IS NOT NULL. Commonly used with Like.

GROUP BY

Allows you to separate data into groups, which can be aggregated independently of one another. You can group by multiple columns, but you have to separate column names with commas—just as with ORDER BY. Limit can change result.

(||)

Alternatively, you can use two pipe characters (||) to perform the same concatenation: SELECT incidnt_num, day_of_week, LEFT(date, 10) AS cleaned_date, day_of_week || ', ' || LEFT(date, 10) AS day_and_date FROM tutorial.sf_crime_incidents_2014_01

Rules for appending data

Both tables must have the same number of columns The columns must have the same data types in the same order as the first table

SUM Nuances

Can only use on numerical values. Sum up values in column. No need to worry about nulls. Sum treats nulls as 0.

Count

Can use on non-numerical columns & counts non-null rows. Recommend renaming- Select count(column) as XYZ. When using spaces, use a double quote.

MIN/MAX

Can use on non-numerical values Select MIN(column) AS min_column

5 aggregate functions

Count, sum, min, max, avg across the column

WHERE

Filters row that apply to the column filter. When using SQL, entire rows of data are preserved together. If you write a WHERE clause that filters based on values in one column, you'll limit the results in all columns to rows that satisfy the condition. The idea is that each row is one data point or observation, and all the information contained in that row belongs together.

How to simplify ORDER BY?

Finally, you can make your life a little easier by substituting numbers for column names in the ORDER BY clause. The numbers will correspond to the order in which you list columns in the SELECT clause. For example, the following query is exactly equivalent to the previous query: SELECT * FROM tutorial.billboard_top_100_year_end WHERE year_rank <= 3 ORDER BY 2, 1 DESC

Schema

Houses multiple tables

Why dates are formatted year-first?

If you live in the United States, you're probably used to seeing dates formatted as MM-DD-YYYY or a similar, month-first format. It's an odd convention compared to the rest of the world's standards, but it's not necessarily any worse than DD-MM-YYYY. The problem with both of these formats is that when they are stored as strings, they don't sort in chronological order. You might think that converting these values from string to date might solve the problem, but it's actually not quite so simple. Mode (and most relational databases) format dates as YYYY-MM-DD, a format that makes a lot of sense because it will sort in the same order whether it's stored as a date or as a string.

PARTITION BY

If you'd like to narrow the window from the entire dataset to individual groups within the dataset, you can use PARTITION BY to do so: SELECT start_terminal, duration_seconds, SUM(duration_seconds) OVER (PARTITION BY start_terminal ORDER BY start_time) AS running_total FROM tutorial.dc_bikeshare_q1_2012 WHERE start_time < '2012-01-08'. Using the PARTITION BY clause will allow you to begin counting 1 again in each partition. The following query starts the count over again for each terminal.

Date - SQL data types

Imported as DATE/TIME and stored as TIMESTAMP. Stores year, month, day, hour, minute and second values as YYYY-MM-DD hh:mm:ss.

VARCHAR

Imported as string. Stored as VARCHAR. Character data type with variable length.

What does it mean to "wrangle" data?

In other words, data wrangling (or munging) is the process of programmatically transforming data into a format that makes it easier to work with.

Reason's for changing a column's data type?

It's certainly best for data to be stored in its optimal format from the beginning, but if it isn't, you can always change it in your query. It's particularly common for dates or numbers, for example, to be stored as strings. This becomes problematic when you want to sum a column and you get an error because SQL is reading numbers as strings. When this happens, you can use CAST or CONVERT to change the data type to a numeric one that will allow you to perform the sum.

SUBSTR

LEFT and RIGHT both create substrings of a specified length, but they only do so starting from the sides of an existing string. If you want to start in the middle of a string, you can use SUBSTR. The syntax is SUBSTR(*string*, *starting character position*, *# of characters*)

Outer Joins (3x)

Left Join, Right Join, Outer Join

inner join

Most common type of join; includes rows in the query only when the joined field matches records in both tables. In mathematical terms, an inner join is the intersection of the two tables. rows from either table that are unmatched in the other table are not returned. Must rename same named columns.

Does count include nulls?

No count does not include nulls BUT count * includes nulls

Filtering in the ON clause

Normally, filtering is processed in the WHERE clause once the two tables have already been joined. It's possible, though that you might want to filter one or both of the tables before joining them. What's happening above is that the conditional statement AND... is evaluated before the join occurs. You can think of it as a WHERE clause that only applies to one of the tables. If you move the same filter to the WHERE clause, you will notice that the filter happens after the tables are joined.

COALESCE

Occasionally, you will end up with a dataset that has some nulls that you'd prefer to contain actual values. This happens frequently in numerical data (displaying nulls as 0 is often preferable), and when performing outer joins that result in some unmatched rows. In cases like this, you can use COALESCE to replace the null values: SELECT incidnt_num, descript, COALESCE(descript, 'No Description') FROM tutorial.sf_crime_incidents_cleandate ORDER BY descript DESC

How to give a table an alias?

Put a space after table name and then name

RANK() and DENSE_RANK()

RANK() would give the identical rows a rank of 2, then skip ranks 3 and 4, so the next result would be 5 DENSE_RANK() would still give all the identical rows a rank of 2, but the following row would be 3—no ranks would be skipped.

ROW_NUMBER()

ROW_NUMBER() does just what it sounds like—displays the number of a given row. It starts are 1 and numbers the rows according to the ORDER BY part of the window statement. ROW_NUMBER() does not require you to specify a variable within the parentheses: SELECT start_terminal, start_time, duration_seconds, ROW_NUMBER() OVER (ORDER BY start_time) AS row_number FROM tutorial.dc_bikeshare_q1_2012 WHERE start_time < '2012-01-08'

RIGHT JOIN

Right joins are similar to left joins except they return all rows from the table in the RIGHT JOIN clause and only matching rows from the table in the FROM clause.is rarely used because you can achieve the results of a RIGHT JOIN by simply switching the two joined table names in a LEFT JOIN. For example, in this query of the Crunchbase dataset, the LEFT JOIN section

Query clause order

SELECT FROM WHERE GROUP BY HAVING ORDER BY

BETWEEN

SELECT * FROM tutorial.billboard_top_100_year_end WHERE year_rank BETWEEN 5 AND 10 <-- Includes 5 & 10

How to group by case?

SELECT CASE WHEN year = 'FR' THEN 'FR' ELSE 'Not FR' END AS year_group, COUNT(1) AS count FROM benn.college_football_players GROUP BY CASE WHEN year = 'FR' THEN 'FR' ELSE 'Not FR' END

Pivoting Data

SELECT COUNT(CASE WHEN year = 'FR' THEN 1 ELSE NULL END) AS fr_count, COUNT(CASE WHEN year = 'SO' THEN 1 ELSE NULL END) AS so_count, COUNT(CASE WHEN year = 'JR' THEN 1 ELSE NULL END) AS jr_count, COUNT(CASE WHEN year = 'SR' THEN 1 ELSE NULL END) AS sr_count FROM benn.college_football_players vs SELECT CASE WHEN year = 'FR' THEN 'FR' WHEN year = 'SO' THEN 'SO' WHEN year = 'JR' THEN 'JR' WHEN year = 'SR' THEN 'SR' ELSE 'No Year Data' END AS year_group, COUNT(1) AS count FROM benn.college_football_players GROUP BY 1

STRPOS

SELECT incidnt_num, descript, STRPOS(descript, 'A') AS a_position FROM tutorial.sf_crime_incidents_2014_01. Importantly, both the POSITION and STRPOS functions are case-sensitive. If you want to look for a character regardless of its case, you can make your entire string a single by using the UPPER or LOWER functions described below.

Using GROUP BY with ORDER BY

SELECT year, month, COUNT(*) AS count FROM tutorial.aapl_historical_stock_price GROUP BY year, month ORDER BY month, year

LAG and LEAD

SQL Window Functions Starting here? This lesson is part of a full-length tutorial in using SQL for Data Analysis. Check out the beginning. In this lesson we'll cover: Intro to window functions Basic windowing syntax The usual suspects: SUM, COUNT, and AVG ROW_NUMBER() RANK() and DENSE_RANK() NTILE LAG and LEAD Defining a window alias Advanced windowing techniques This lesson uses data from Washington DC's Capital Bikeshare Program, which publishes detailed trip-level historical data on their website. The data was downloaded in February, 2014, but is limited to data collected during the first quarter of 2012. Each row represents one ride. Most fields are self-explanatory, except rider_type: "Registered" indicates a monthly membership to the rideshare program, "Casual" incidates that the rider bought a 3-day pass. The start_time and end_time fields were cleaned up from their original forms to suit SQL date formatting—they are stored in this table as timestamps. Intro to window functions PostgreSQL's documentation does an excellent job of introducing the concept of Window Functions: A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result. The most practical example of this is a running total: SELECT duration_seconds, SUM(duration_seconds) OVER (ORDER BY start_time) AS running_total FROM tutorial.dc_bikeshare_q1_2012 You can see that the above query creates an aggregation (running_total) without using GROUP BY. Let's break down the syntax and see how it works. Basic windowing syntax The first part of the above aggregation, SUM(duration_seconds), looks a lot like any other aggregation. Adding OVER designates it as a window function. You could read the above aggregation as "take the sum of duration_seconds over the entire result set, in order by start_time." If you'd like to narrow the window from the entire dataset to individual groups within the dataset, you can use PARTITION BY to do so: SELECT start_terminal, duration_seconds, SUM(duration_seconds) OVER (PARTITION BY start_terminal ORDER BY start_time) AS running_total FROM tutorial.dc_bikeshare_q1_2012 WHERE start_time < '2012-01-08' The above query groups and orders the query by start_terminal. Within each value of start_terminal, it is ordered by start_time, and the running total sums across the current row and all previous rows of duration_seconds. Scroll down until the start_terminal value changes and you will notice that running_total starts over. That's what happens when you group using PARTITION BY. In case you're still stumped by ORDER BY, it simply orders by the designated column(s) the same way the ORDER BY clause would, except that it treats every partition as separate. It also creates the running total—without ORDER BY, each value will simply be a sum of all the duration_seconds values in its respective start_terminal. Try running the above query without ORDER BY to get an idea: SELECT start_terminal, duration_seconds, SUM(duration_seconds) OVER (PARTITION BY start_terminal) AS start_terminal_total FROM tutorial.dc_bikeshare_q1_2012 WHERE start_time < '2012-01-08' The ORDER and PARTITION define what is referred to as the "window"—the ordered subset of data over which calculations are made. Note: You can't use window functions and standard aggregations in the same query. More specifically, you can't include window functions in a GROUP BY clause. Practice Problem Write a query modification of the above example query that shows the duration of each ride as a percentage of the total time accrued by riders from each start_terminal Try it out See the answer The usual suspects: SUM, COUNT, and AVG When using window functions, you can apply the same aggregates that you would under normal circumstances—SUM, COUNT, and AVG. The easiest way to understand these is to re-run the previous example with some additional functions. Make SELECT start_terminal, duration_seconds, SUM(duration_seconds) OVER (PARTITION BY start_terminal) AS running_total, COUNT(duration_seconds) OVER (PARTITION BY start_terminal) AS running_count, AVG(duration_seconds) OVER (PARTITION BY start_terminal) AS running_avg FROM tutorial.dc_bikeshare_q1_2012 WHERE start_time < '2012-01-08' Alternatively, the same functions with ORDER BY: SELECT start_terminal, duration_seconds, SUM(duration_seconds) OVER (PARTITION BY start_terminal ORDER BY start_time) AS running_total, COUNT(duration_seconds) OVER (PARTITION BY start_terminal ORDER BY start_time) AS running_count, AVG(duration_seconds) OVER (PARTITION BY start_terminal ORDER BY start_time) AS running_avg FROM tutorial.dc_bikeshare_q1_2012 WHERE start_time < '2012-01-08' Make sure you plug those previous two queries into Mode and run them. This next practice problem is very similar to the examples, so try modifying the above code rather than starting from scratch. Practice Problem Write a query that shows a running total of the duration of bike rides (similar to the last example), but grouped by end_terminal, and with ride duration sorted in descending order. Try it out See the answer ROW_NUMBER() ROW_NUMBER() does just what it sounds like—displays the number of a given row. It starts are 1 and numbers the rows according to the ORDER BY part of the window statement. ROW_NUMBER() does not require you to specify a variable within the parentheses: SELECT start_terminal, start_time, duration_seconds, ROW_NUMBER() OVER (ORDER BY start_time) AS row_number FROM tutorial.dc_bikeshare_q1_2012 WHERE start_time < '2012-01-08' Using the PARTITION BY clause will allow you to begin counting 1 again in each partition. The following query starts the count over again for each terminal: SELECT start_terminal, start_time, duration_seconds, ROW_NUMBER() OVER (PARTITION BY start_terminal ORDER BY start_time) AS row_number FROM tutorial.dc_bikeshare_q1_2012 WHERE start_time < '2012-01-08' RANK() and DENSE_RANK() RANK() is slightly different from ROW_NUMBER(). If you order by start_time, for example, it might be the case that some terminals have rides with two identical start times. In this case, they are given the same rank, whereas ROW_NUMBER() gives them different numbers. In the following query, you notice the 4th and 5th observations for start_terminal 31000—they are both given a rank of 4, and the following result receives a rank of 6: SELECT start_terminal, duration_seconds, RANK() OVER (PARTITION BY start_terminal ORDER BY start_time) AS rank FROM tutorial.dc_bikeshare_q1_2012 WHERE start_time < '2012-01-08' You can also use DENSE_RANK() instead of RANK() depending on your application. Imagine a situation in which three entries have the same value. Using either command, they will all get the same rank. For the sake of this example, let's say it's "2." Here's how the two commands would evaluate the next results differently: RANK() would give the identical rows a rank of 2, then skip ranks 3 and 4, so the next result would be 5 DENSE_RANK() would still give all the identical rows a rank of 2, but the following row would be 3—no ranks would be skipped. Practice Problem Write a query that shows the 5 longest rides from each starting terminal, ordered by terminal, and longest to shortest rides within each terminal. Limit to rides that occurred before Jan. 8, 2012. Try it out See the answer NTILE You can use window functions to identify what percentile (or quartile, or any other subdivision) a given row falls into. The syntax is NTILE(*# of buckets*). In this case, ORDER BY determines which column to use to determine the quartiles (or whatever number of 'tiles you specify). For example: SELECT start_terminal, duration_seconds, NTILE(4) OVER (PARTITION BY start_terminal ORDER BY duration_seconds) AS quartile, NTILE(5) OVER (PARTITION BY start_terminal ORDER BY duration_seconds) AS quintile, NTILE(100) OVER (PARTITION BY start_terminal ORDER BY duration_seconds) AS percentile FROM tutorial.dc_bikeshare_q1_2012 WHERE start_time < '2012-01-08' ORDER BY start_terminal, duration_seconds Looking at the results from the query above, you can see that the percentile column doesn't calculate exactly as you might expect. If you only had two records and you were measuring percentiles, you'd expect one record to define the 1st percentile, and the other record to define the 100th percentile. Using the NTILE function, what you'd actually see is one record in the 1st percentile, and one in the 2nd percentile. You can see this in the results for start_terminal 31000—the percentile column just looks like a numerical ranking. If you scroll down to start_terminal 31007, you can see that it properly calculates percentiles because there are more than 100 records for that start_terminal. If you're working with very small windows, keep this in mind and consider using quartiles or similarly small bands. Practice Problem Write a query that shows only the duration of the trip and the percentile into which that duration falls (across the entire dataset—not partitioned by terminal). Try it out See the answer LAG and LEAD It can often be useful to compare rows to preceding or following rows, especially if you've got the data in an order that makes sense. You can use LAG or LEAD to create columns that pull values from other rows—all you need to do is enter which column to pull from and how many rows away you'd like to do the pull. LAG pulls from previous rows and LEAD pulls from following rows: SELECT start_terminal, duration_seconds, LAG(duration_seconds, 1) OVER (PARTITION BY start_terminal ORDER BY duration_seconds) AS lag, LEAD(duration_seconds, 1) OVER (PARTITION BY start_terminal ORDER BY duration_seconds) AS lead FROM tutorial.dc_bikeshare_q1_2012 WHERE start_time < '2012-01-08' ORDER BY start_terminal, duration_seconds

SQL UNION

SQL joins allow you to combine two datasets side-by-side, but UNION allows you to stack one dataset on top of the other. Put differently, UNION allows you to write two separate SELECT statements, and to have the results of one statement display in the same table as the results from the other statement. Note that UNION only appends distinct values. More specifically, when you use UNION, the dataset is appended, and any rows in the appended table that are exactly identical to rows in the first table are dropped.

*

Select all. Can use SELECT *, case when column

LOWER/UPPER

Sometimes, you just don't want your data to look like it's screaming at you. You can use LOWER to force every character in a string to become lower-case. Similarly, you can use UPPER to make all the letters appear in upper-case: SELECT incidnt_num, address, UPPER(address) AS address_upper, LOWER(address) AS address_lower FROM tutorial.sf_crime_incidents_2014_01

Things that affect query runtimes

Table size, joins, aggregations, other users running the queries, & software optimization

DATE_TRUNC

The DATE_TRUNC function rounds a date to whatever precision you specify. The value displayed is the first value in that period. So when you DATE_TRUNC by year, any value in that year will be listed as January 1st of that year: SELECT cleaned_date, DATE_TRUNC('year' , cleaned_date) AS year, DATE_TRUNC('month' , cleaned_date) AS month, DATE_TRUNC('week' , cleaned_date) AS week, DATE_TRUNC('day' , cleaned_date) AS day, DATE_TRUNC('hour' , cleaned_date) AS hour, DATE_TRUNC('minute' , cleaned_date) AS minute, DATE_TRUNC('second' , cleaned_date) AS second, DATE_TRUNC('decade' , cleaned_date) AS decade FROM tutorial.sf_crime_incidents_cleandate

LENGTH

The LENGTH function returns the length of a string. So LENGTH(date) will always return 28 in this dataset. Since we know that the first 10 characters will be the date, and they will be followed by a space (total 11 characters) SELECT incidnt_num, date, LEFT(date, 10) AS cleaned_date, RIGHT(date, LENGTH(date) - 11) AS cleaned_time FROM tutorial.sf_crime_incidents_2014_01

ORDER BY

The ORDER BY clause allows you to reorder your results based on the data in one or more columns. SELECT * FROM tutorial.billboard_top_100_year_end ORDER BY artist DESC, group; Can order by case columns just not aggregate.

TRIM

The TRIM function is used to remove characters from the beginning and end of a string. Here's an example: SELECT location, TRIM(both '()' FROM location) FROM tutorial.sf_crime_incidents_2014_01. The TRIM function takes 3 arguments. First, you have to specify whether you want to remove characters from the beginning ('leading'), the end ('trailing'), or both ('both', as used above). Next you must specify all characters to be trimmed. Any characters included in the single quotes will be removed from both beginning, end, or both sides of the string. Finally, you must specify the text you want to trim using FROM

HAVING

The WHERE clause won't work for this because it doesn't allow you to filter on aggregate columns—that's where the HAVING clause comes in: SELECT year, month, MAX(high) AS month_high FROM tutorial.aapl_historical_stock_price GROUP BY year, month HAVING MAX(high) > 400 ORDER BY year, month

Creating an alias for a window clause

The WINDOW clause, if included, should always come after the WHERE clause.

Basic windowing syntax

The first part of the above aggregation, SUM(duration_seconds), looks a lot like any other aggregation. Adding OVER designates it as a window function. You could read the above aggregation as "take the sum of duration_seconds over the entire result set, in order by start_time."

Joining tables with identical column names

The results can only support one column with a given name—when you include 2 columns of the same name, the results will simply show the exact same result set for both columns even if the two columns should contain different data. You can avoid this by naming the columns individually. It happens that these two columns will actually contain the same data because they are used for the join key, but the following query technically allows these columns to be independent:

Why join on multiple keys?

There are couple reasons you might want to join tables on multiple foreign keys. The first has to do with accuracy. The second reason has to do with performance. SQL uses "indexes" (essentially pre-defined joins) to speed up queries.

JOIN ON

Though the ON statement comes after JOIN, it's a bit easier to explain it first. ON indicates how the two tables (the one after the FROM and the one after the JOIN) relate to each other.

ILIKE

To ignore case when you're matching values, you can use the ILIKE command. SELECT * FROM tutorial.billboard_top_100_year_end WHERE "group" ILIKE 'snoop%'

EXTRACT

Used for dates. SELECT cleaned_date, EXTRACT('year' FROM cleaned_date) AS year, EXTRACT('month' FROM cleaned_date) AS month, EXTRACT('day' FROM cleaned_date) AS day, EXTRACT('hour' FROM cleaned_date) AS hour, EXTRACT('minute' FROM cleaned_date) AS minute, EXTRACT('second' FROM cleaned_date) AS second, EXTRACT('decade' FROM cleaned_date) AS decade, EXTRACT('dow' FROM cleaned_date) AS day_of_week FROM tutorial.sf_crime_incidents_cleandate

GROUP BY

Used to aggregate only part of a table. Can group by multiple columns just must separate column names with commas. Can also substitute with numbers. Aggregate numbers are not allowed in group by.

CAST or CONVERT

Used to change data type. For example, CAST(column_name AS integer) and column_name::integer produce the same result.

Aliases in SQL

When performing joins, it's easiest to give your table names aliases. benn.college_football_players is pretty long and annoying to type—players is much easier. You can give a table an alias by adding a space after the table name and typing the intended name of the alias. As with column names, best practice here is to use all lowercase letters and underscores instead of spaces. Once you've given a table an alias, you can refer to columns in that table in the SELECT clause using the alias name.SELECT teams.conference AS conference, AVG(players.weight) AS average_weight FROM benn.college_football_players players JOIN benn.college_football_teams teams ON teams.school_name = players.school_name GROUP BY teams.conference ORDER BY AVG(players.weight) DESC

Can you asc in multiple columns?

Yes- Works well for year and month

EXPLAIN

You can add EXPLAIN at the beginning of any (working) query to get a sense of how long it will take. It's not perfectly accurate, but it's a useful tool. You can put before SELECT.

NOW()

You can add the current time (at the time you run the query) into your code using the NOW()function

_

You can also use _ (a single underscore) to substitute for an individual character:

CONCAT

You can combine strings from several columns together (and with hard-coded values) using CONCAT. Simply order the values you want to concatenate and separate them with commas. If you want to hard-code values, enclose them in single quotes. Here's an example:

Arithmetic in SQL

You can perform arithmetic in SQL using the same operators you would in Excel: +, -, *, /. However, in SQL you can only perform arithmetic across columns on values in a given row. To clarify, you can only add values in multiple columns from the same row together using +—if you want to add values across multiple rows, you'll need to use aggregate functions, which are covered in the Intermediate SQL section of this tutorial. The example below illustrates the use of +: SELECT year, month, west, south, west + south AS south_plus_west FROM tutorial.us_housing_units

LEFT / RIGHT

You can use LEFT/RIGHT to pull a certain number of characters from the left side of a string and present them as a separate string. The syntax is LEFT(string, number of characters) OR The syntax is RIGHT(string, number of characters)

Where can you use subqueries?

You can use subqueries in conditional logic (in conjunction with WHERE, JOIN/ON, or CASE). However, IN is the only type of conditional logic that will work when the inner query contains multiple results. Note that you should not include an alias when you write a subquery in a conditional statement. This is because the subquery is treated as an individual value (or set of values in the IN case) rather than as a table.

NTILE

You can use window functions to identify what percentile (or quartile, or any other subdivision) a given row falls into. The syntax is NTILE(*# of buckets*). In this case, ORDER BY determines which column to use to determine the quartiles (or whatever number of 'tiles you specify).

Window Functions limitations

You can't use window functions and standard aggregations in the same query. More specifically, you can't include window functions in a GROUP BY clause.

FULL JOIN

You're not likely to use FULL JOIN (which can also be written as FULL OUTER JOIN) too often, but it's worth covering anyway. LEFT JOIN and RIGHT JOIN each return unmatched rows from one of the tables—FULL JOIN returns unmatched rows from both tables. It is commonly used in conjunction with aggregations to understand the amount of overlap between two tables.

DESC

Z-> A, New to old, low to high

interval

a series of integers that represent a period of time. The following query uses date subtraction to determine how long it took companies to be acquired (unacquired companies and those without dates entered were filtered out). You can introduce intervals using the INTERVAL function as well

IN

allows you to specify a list of values you'd like to include. Is a logical operator in SQL that allows you to specify a list of values that you'd like to include in the results. As with comparison operators, you can use non-numerical values, but they need to go inside single quotes. Regardless of the data type, the values in the list must be separated by commas. SELECT * FROM tutorial.billboard_top_100_year_end WHERE year_rank IN (1, 2, 3) OR SELECT * FROM tutorial.billboard_top_100_year_end WHERE artist IN ('Taylor Swift', 'Usher', 'Ludacris')

POSITION

allows you to specify a substring, then returns a numerical value equal to the character number (counting from left) where that substring first appears in the target string. SELECT incidnt_num, descript, POSITION('A' IN descript) AS a_position FROM tutorial.sf_crime_incidents_2014_01. Importantly, both the POSITION and STRPOS functions are case-sensitive. If you want to look for a character regardless of its case, you can make your entire string a single by using the UPPER or LOWER functions described below.

LEFT JOIN

command tells the database to return all rows in the table in the FROM clause, regardless of whether or not they have matches in the table in the LEFT JOIN clause.

UNION ALL

f you'd like to append all the values from the second table, use UNION ALL. You'll likely use UNION ALL far more often than UNION.

integer vs. float

integer is a number w/out a decimal; float is a number with a decimal

LIKE

is a logical operator in SQL that allows you to match on similar values rather than exact ones. Is case-sensitive, meaning that the above query will only capture matches that start with a capital "S" and lower-case "noop." SELECT * FROM tutorial.billboard_top_100_year_end WHERE "group" LIKE 'Snoop%'

relational database

refers to the fact that the tables within it "relate" to one another—they contain common identifiers that allow information from multiple tables to be combined easily.

SELECT DISTINCT

specifies that the statement is going to be a query that returns unique values in the specified column(s). If you include two (or more) columns in a SELECT DISTINCT clause, your results will contain all of the unique pairs of those two columns: Note: You only need to include DISTINCT once in your SELECT clause—you do not need to add it for each column name. SELECT COUNT(DISTINCT month) AS unique_months FROM tutorial.aapl_historical_stock_price

CASE

statement is SQL's way of handling if/then logic. The CASE statement is followed by at least one pair of WHEN and THEN statements—SQL's equivalent of IF/THEN in Excel. Because of this pairing, you might be tempted to call this SQL CASE WHEN, but CASE is the accepted term. Every CASE statement must end with the END statement. The ELSE statement is optional, and provides a way to capture values not specified in the WHEN/THEN statements. The CASE statement always goes in the SELECT clause CASE must include the following components: WHEN, THEN, and END. ELSE is an optional component. You can make any conditional statement using any conditional operator (like WHERE ) between WHEN and THEN. This includes stringing together multiple conditional statements using AND and OR. You can include multiple WHEN statements, as well as an ELSE statement to deal with any unaddressed conditions. Completed in order. Best practice is to not overlap case statements.

%

used above represents any character or set of characters. In this case, % is referred to as a "wildcard." SELECT * FROM tutorial.billboard_top_100_year_end WHERE "group" ILIKE 'snoop%'

Using comparison operators with joins

you can enter any type of conditional statement into the ON clause. This technique is especially useful for creating date ranges as shown above.


संबंधित स्टडी सेट्स

informed consent/ restraints questions

View Set

Dental Assisting— Radiology Ch. 39, 40, 41, 42 FINAL EXAM

View Set

for the real music nerds😝🤠

View Set

Ch 18: Eating/Feeding Disorders QUESTIONS (Varcarolis Psych-Mental Health Nursing)

View Set

LearnSmart Chapter 5 Conceptual Questions

View Set

Anatomy and Physiology Midterm SG

View Set

Ch. 7: Bone Structure and Function (Learnsmart Quiz)

View Set