My SQL
Combining aggregate functions with WHERE
Aggregate functions can be combined with the WHERE clause to gain further insights from your data. For example, to get the total budget of movies made in the year 2010 or later: SELECT SUM(budget) FROM films WHERE release_year >= 2010;
How do you think ORDER BY sorts a column of text values by default?
Alphabetically (A-Z)
The LIMIT Operator
If you only want to return a certain number of results, you can use the LIMIT keyword to limit the number of rows returned: SELECT * FROM people LIMIT 10;
How to get the records with missing values?
null
A word of warning: SQL will return an error if you try to SELECT a field that is not in your GROUP BY clause without using it to calculate some kind of value about the entire group.
Note that you can combine GROUP BY with ORDER BY to group your results, calculate something about them, and then order your results. For example, SELECT sex, count(*) FROM employees GROUP BY sex ORDER BY count DESC; Note also that ORDER BY always goes after GROUP BY
GROUP BY
Now you know how to sort results! Often you'll need to aggregate results. For example, you might want to count the number of male and female employees in your company. Here, what you want is to group all the males together and count them, and group all the females together and count them. In SQL, GROUP BY allows you to group a result by one or more columns, like so: SELECT sex, count(*) FROM employees GROUP BY sex; This might give, for example: sex count -- male 15 -- female 19 Commonly, GROUP BY is used with aggregate functions like COUNT() or MAX(). Note that GROUP BY always goes after the FROM clause!
COUNT
COUNT(*) tells you how many rows are in a table. SELECT COUNT(*) FROM people; However, if you want to count the number of non-missing values in a particular column, you can call COUNT on just that column. For example, to count the number of birth dates present in the people table: SELECT COUNT(birthdate) FROM people;
Compound Join Conditions
icon tool - yellow key - composite primary key
Database
- a collection of organized data that allows access, retrieval, and use of data - is a collection of data stored in a format that can easily be accessed
The IS NULL Operator
In SQL, NULL represents a missing or unknown value. You can check for NULL values using the expression IS NULL. For example, to count the number of missing birth dates in the people table: SELECT COUNT(*) FROM people WHERE birthdate IS NULL; As you can see, IS NULL is useful when combined with WHERE to figure out what data you're missing. Sometimes, you'll want to filter out missing values so you only get results which are not NULL. To do this, you can use the IS NOT NULL operator. For example, this query gives the names of all people whose birth dates are not missing in the people table. SELECT name FROM people WHERE birthdate IS NOT NULL;
HAVING a great time
In SQL, aggregate functions can't be used in WHERE clauses. For example, the following query is invalid: SELECT release_year FROM films GROUP BY release_year WHERE COUNT(title) > 10; This means that if you want to filter based on the result of an aggregate function, you need another way! That's where the HAVING clause comes in. For example, SELECT release_year FROM films GROUP BY release_year HAVING COUNT(title) > 10; shows only those years in which more than 10 films were released.
ORDER BY
In SQL, the ORDER BY keyword is used to sort results in ascending or descending order according to the values of one or more columns. By default ORDER BY will sort in ascending order. If you want to sort the results in descending order, you can use the DESC keyword. For example, SELECT title FROM films ORDER BY release_year DESC; gives you the titles of films sorted by release year, from newest to oldest.
The WHERE Clause
In SQL, the WHERE keyword allows you to filter based on both text and numeric values in a table. There are a few different comparison operators you can use: = equal <> not equal/not ! < less than > greater than <= less than or equal to >= greater than or equal to For example, you can filter text records such as title. The following code returns all films with the title 'Metropolis': SELECT title FROM films WHERE title = 'Metropolis'; Notice that the WHERE clause always comes after the FROM statement!
The SELECT Statement The SELECT Clause
In SQL, you can select data from a table using a SELECT statement. For example, the following query selects the name column from the people table: SELECT name FROM people; selects two columns, name and birthdate, from the people table: SELECT name, birthdate FROM people; Sometimes, you may want to select all columns from a table. Typing out every column name would be a pain, so there's a handy shortcut: SELECT * FROM people;
A note on arithmetic
In addition to using aggregate functions, you can perform basic arithmetic with symbols like +, -, *, and /. So, for example, this gives a result of 12: SELECT (4 * 3); However, the following gives a result of 1: SELECT (4 / 3); What's going on here? SQL assumes that if you divide an integer by an integer, you want to get an integer back. So be careful when dividing! If you want more precision when dividing, you can add decimal places to your numbers. For example, SELECT (4.0 / 3.0) AS result; gives you the result you would expect: 1.333.
Sorting multiple columns
ORDER BY can also be used to sort on multiple columns. It will sort by the first column specified, then sort by the next, then the next, and so on. For example, SELECT birthdate, name FROM people ORDER BY birthdate, name; sorts on birth dates first (oldest to newest) and then sorts on the names in alphabetical order. The order of columns is important!
Aggregate functions
Often, you will want to perform some calculation on the data in a database. SQL provides a few functions, called aggregate functions, to help you out with this. For example, SELECT AVG(budget) FROM films; gives you the average value from the budget column of the films table. Similarly, the MAX function returns the highest budget: SELECT MAX(budget) FROM films; The SUM function returns the result of adding up the numeric values in a column: SELECT SUM(budget) FROM films; MIN function
Get the number of decades the films table covers. Alias the result as number_of_decades. The top half of your fraction should be enclosed in parentheses.
SELECT (MAX(release_year) - MIN(release_year)) / 10.0 AS number_of_decades FROM films;
Get all details for all films except those released in 2015 and order them by duration.
SELECT * FROM films WHERE release_year <> 2015 ORDER BY duration;
Inner join
SELECT * FROM left_table INNER JOIN right_table ON left_table.id = right_table.id;
Count the number of unique birth dates in the people table.
SELECT COUNT (DISTINCT birthdate) FROM people;
Get the names of people whose names have 'r' as the second letter. The pattern you need is '_r%'
SELECT name FROM people WHERE name LIKE '_r%';
Get the name and birth date of the person born on November 11th, 1974. Remember to use ISO date format
SELECT name, birthdate FROM people WHERE birthdate = '1974-11-11';
When combining AND and OR, be sure to enclose the individual clauses in parentheses
SELECT title FROM films WHERE (release_year = 1994 OR release_year = 1995) AND (certification = 'PG' OR certification = 'R'); Otherwise, due to SQL's precedence rules, you may not get the results you're expecting!
Get the title and duration in hours for all films. The duration is in minutes, so you'll need to divide by 60.0 to get the duration in hours. Alias the duration in hours as duration_hours.
SELECT title, duration/60.0 AS duration_hours FROM films;
Get the title and net profit (the amount a film grossed, minus its budget) for all films. Alias the net profit as net_profit.
SELECT title, gross-budget AS net_profit FROM films;
Write a query to get the title and release year of films released in the 90s which were in French or Spanish and which took in more than $2M gross.
SELECT title, release_year FROM films WHERE (release_year >= 1990 AND release_year < 2000) AND (language = 'French' OR language = 'Spanish') AND (gross > 2000000)
Sorting single columns (DESC)
To order results in descending order, you can put the keyword DESC after your ORDER BY. For example, to get all the names in the people table, in reverse alphabetical order: SELECT name FROM people ORDER BY name DESC;
Common use
USE SELECT FROM WHERE -- AND, OR, NOT -- IN -- BETWEEN -- LIKE -- REGEXP ORDER BY DESC/ACES??? LIMIT
The LIKE/NOT LIKE Operator
WHERE clause can be used to filter text data. However, so far you've only been able to filter by specifying the exact text you're interested in. In the real world, often you'll want to search for a pattern rather than a specific text string. In SQL, the LIKE operator can be used in a WHERE clause to search for a pattern in a column. To accomplish this, you use something called a wildcard as a placeholder for some other values. There are two wildcards you can use with LIKE: The % wildcard will match zero, one, or many characters in text. For example, the following query matches companies like 'Data', 'DataC' 'DataCamp', 'DataMind', and so on: SELECT name FROM companies WHERE name LIKE 'Data%'; The _ wildcard will match a single character. For example, the following query matches companies like 'DataCamp', 'DataComp', and so on: SELECT name FROM companies WHERE name LIKE 'DataC_mp'; You can also use the NOT LIKE operator to find records that don't match the pattern you specify.
WHERE -- IN
WHERE is very useful for filtering results. However, if you want to filter based on many conditions, WHERE can get unwieldy. For example: SELECT name FROM kids WHERE age = 2 OR age = 4 OR age = 6 OR age = 8 OR age = 10; Enter the IN operator! The IN operator allows you to specify multiple values in a WHERE clause, making it easier and quicker to specify multiple OR conditions! So, the above example would become simply: SELECT name FROM kids WHERE age IN (2, 4, 6, 8, 10);
WHERE -- OR
What if you want to select rows based on multiple conditions where some but not all of the conditions need to be met? -- For this, SQL has the OR operator. For example, the following returns all films released in either 1994 or 2000: SELECT title FROM films WHERE release_year = 1994 OR release_year = 2000; Note that you need to specify the column for every OR condition, so the following is invalid: SELECT title FROM films WHERE release_year = 1994 OR 2000;
It's AS simple AS aliasing
You may have noticed in the first exercise of this chapter that the column name of your result was just the name of the function you used. For example, SELECT MAX(budget) FROM films; gives you a result with one column, named max. But what if you use two functions like this? SELECT MAX(budget), MAX(duration) FROM films; Well, then you'd have two columns named max, which isn't very useful! To avoid situations like this, SQL allows you to do something called aliasing. Aliasing simply means you assign a temporary name to something. To alias, you use the AS keyword For example, in the above example we could use aliases to make the result clearer: SELECT MAX(budget) AS max_budget, MAX(duration) AS max_duration FROM films; Aliases are helpful for making results more readable!
The REGEXP Operator
regular expression '^Field' (Caret sign) - Last name must start/begin with Field 'Field$' (Dollar sign) - Last name must end with Field 'Field|Mac' (Pipe sign) - Last name has the word Field or Mac '[gim]e' (square brackets sign) - Last name contains ge, ie, me '[a-h]e' - Last name contains a-h before e
WHERE -- BETWEEN
you can use the following query to get titles of all films released in and between 1994 and 2000: SELECT title FROM films WHERE release_year >= 1994 AND release_year <= 2000; Checking for ranges like this is very common, so in SQL the BETWEEN keyword provides a useful shorthand for filtering values within a specified range. This query is equivalent to the one above: SELECT title FROM films WHERE release_year BETWEEN 1994 AND 2000; It's important to remember that BETWEEN is inclusive, meaning the beginning and end values are included in the results!
WHERE -- AND
you'll want to select data based on multiple conditions. You can build up your WHERE queries by combining multiple conditions with the AND keyword. For example, SELECT * FROM films WHERE language = 'Spanish' AND release_year > 2000 AND release_year < 2010