Chapter 13. Basic SQL Management

Ace your homework & exams now with Quizwiz!

However, values and data are case sensitive, so these are all different:

insert into name values('SEAN'); -- The name will be stored all in capitals insert into name values('sean'); -- The name will be stored in lowercase insert into NAME values('Sean'); -- The table is called name, not NAME

Common data types are

int—An integer smallint, mediumint, bigint—Integer types with various limits character, varchar—Character strings, which require a length to be passed such as varchar(255) text—A longer string or document that is larger than a database dependent limitation blob—A binary object such as a picture float—A floating point number decimal—A number with a decimal place stored in fixed precision format Boolean—A true or false value Date and datetime—Either a date or a date with a timestamp

The statements used to create the tables for this book are (with formatting added for clarity):

sqlite> .schema CREATE TABLE author ( id integer primary key, first_name varchar(255), last_name varchar(255)); CREATE TABLE book ( id integer primary key, title varchar(255), year integer, author_id integer, written integer);

Deleting data is similar to querying data as you're deleting the results of a query:

sqlite> DELETE FROM book WHERE author_id IS NULL; This query deletes any book where the author_id is missing.

So to insert a new book:

sqlite> INSERT INTO book (title, year, author_id) VALUES ('Sketch of the Analytical Engine', 1842, 3); sqlite> SELECT * FROM book WHERE author_id = 3;

An alternate form of INSERT is to specify values for all the columns, which eliminates the need to tell the database which columns you will use.

sqlite> INSERT INTO book VALUES (NULL, "LPIC-1/CompTIA Linux+ Cert Guide", NULL, 3, 2015);

The SELECT statement is the command you use to extract information from your database. The simplest way to use it is to ask for a single string:

sqlite> SELECT 'Hello, world!'; Hello, world!

Typing in column names can be exhausting, so SQL lets you alias tables within the query with the AS keyword. The AS is used right after the first mention of the table name and is directly followed by the alias. The query used so far can be rewritten as follows:

sqlite> SELECT * FROM author AS a JOIN book AS b ON b.author_id = a.id;

The join clause used in the preceding example always mentions the name of the table when talking about the column. book.author_id = author.id means the author_id from book and the id from author. If we forget the names of the tables, the database has problems figuring out where the id column comes from, as there are two:

sqlite> SELECT * FROM author JOIN book ON author_id=id;

In the simple book database you can obtain a list of books and their authors:

sqlite> SELECT * FROM author JOIN book ON book.author_id = author.id;

Sorting your results helps you to make more sense of the report. Rows normally come back in an indeterminate manner as the query gets more complicated. The ORDER BY clause comes toward the end of the query. ORDER BY typically expects a column name, or set of column names, on which to sort:

sqlite> SELECT * FROM book ORDER BY written, year;

It would be far more interesting to get information out of a table. This requires the FROM keyword to indicate the name of the table. The FROM clause happens after the SELECT clause.

sqlite> SELECT * FROM book;

NULL is more than an empty value. NULL means "information is missing." Therefore you can't compare anything to NULL:

sqlite> SELECT 1=1; 1=1 ---------- 1 sqlite> SELECT NULL=NULL; NULL=NULL ----------

GROUP BY expects a comma-separated list of columns. Rows that have the same values in those columns are rolled up and counted by aggregate functions in the SELECT clause. This query provides a list of authors and the number of books in the database:

sqlite> SELECT first_name, last_name, COUNT(title) AS books FROM author LEFT JOIN book ON author_id=author.id GROUP BY first_name, last_name;

When you query for authors and books and get back an author who hasn't written a book, the database needs to fill in that blank spot somehow. Similarly, a field that has no value, such as the publication date of an unpublished book, needs some kind of placeholder. This placeholder is called a NULL.

sqlite> SELECT first_name, last_name, title FROM author LEFT JOIN book ON author_id=author.id WHERE author.id = 3;

As a simple example, one can retrieve the author IDs where the author's name is Sean with a simple query:

sqlite> SELECT id FROM author WHERE first_name = "Sean";

Those results can be injected into another query, such as to find all the books written by that author:

sqlite> SELECT title FROM book WHERE author_id IN (SELECT id FROM author WHERE first_name = "Sean");

In the previous example, the query first sorts by the year of writing, with the publishing year column used for a tie breaker. Sorts are performed in ascending order by default, which can be overridden with the ASC and DESC keywords:

sqlite> SELECT title from book ORDER by title DESC;

You're not limited to one question in your WHERE clause. Conditions may be put together with AND and OR. An AND requires that both conditions are true, an OR requires that only one is true. So you can find all the Exam Cram 2 books published in 2005 or later, or any book where the writing started in 2000:

sqlite> SELECT title, year FROM book WHERE (title LIKE '%Exam Cram 2' AND year >= 2005) OR written = 2000;

Searching ranges is possible through the BETWEEN operator, which takes two numbers and matches values between the two:

sqlite> SELECT title, year FROM book WHERE year BETWEEN 2004 AND 2005;

You can also search for specific numbers in a set, such as to find books published in 2001 or 2005 using the IN keyword. This keyword expects a list of values to check, enclosed in parentheses.

sqlite> SELECT title, year FROM book WHERE year IN (2001, 2005);

You can get more specific columns by asking for them individually:

sqlite> SELECT title, year FROM book;

The preceding result set was sorted by title in descending order, so the Wireless book is displayed before the Linux books. Another form of ORDER BY refers to the column number rather than the name:

sqlite> SELECT title, year from book ORDER by 2 DESC;

Tables may have thousands or millions of records and looking at them all would become impossible. A WHERE clause lets you filter results based on one or more conditions. This clause comes after the FROM clause.

sqlite> SELECT year, title FROM book WHERE year >= 2005;

It's also possible to set a value based on another column, such as if you want to set the year the book was written to the same as the publication year if it's unspecified:

sqlite> UPDATE book SET written=year WHERE written IS NULL;

To set a publication year for the book entered in the last section:

sqlite> UPDATE book SET year=2015 WHERE id=6;

The database doesn't know how to match up the rows if you don't give it conditions. It will match up every row on the left with every row on the right:

sqlite> select * from author join book;

Run sqlite3 with the path to a database file to get started. If the file doesn't exist, it is created.

$ sqlite3 newdatabase

The basic numeric comparison operators are

= - equals <> - not equals < - less than <= - less than or equal to > - greater than >= - greater than or equal to

normalization—

A database term where entities are split up into various relationships to reduce duplication. For example, a book may be related to an author so that there is less duplication as an author writes more books.

database schema—

A description of all the tables, columns, and relationships of a relational database.

A join is part of a query that tells the database how to match rows between two tables. Tables that are joined are also available to the other clauses such as WHERE, ORDER BY, and SELECT.

A join is performed with the JOIN keyword, which comes after the FROM clause. The query is built from left to right, so you need to join something onto the table you specified in the FROM clause.

many-to-many relationship—

A relationship between two entities that allows each side to have multiple relationships. A parent may have many children, and a child may have many parents.

many-to-one relationship—

A relationship between two entities where one side belongs to another. A car has many tires, but a given tire only belongs to one car.

null value—

A value in a database that represents an unknown value.

Consider this query: SELECT * FROM products LEFT JOIN sales on products.name = sales.product_name Which of the following are true? (Choose two.) a. All products will be included even if they have no sales. b. All sales will be included even if the product is not missing. c. The sales table contains the product's name in the product_name column. d. This query will calculate total sales per product.

A, C. A is correct because the left join of products onto sales ensures all products will be included. C is correct because the join matches the product name to the product_name column in sales. B is not correct because the sales need to match up to the product. D is not correct because there is no aggregate clause like SUM or a GROUP BY clause.

A NULL in a column means: a. The value is missing. b. It is the same as zero. c. It is the same as a blank string. d. The column will match anything.

A. A is correct because a NULL is a placeholder for a missing value. NULL is not comparable to anything; therefore, it is not zero, a blank string, or able to match anything.

Your employee table has a column for last name and office location. How would you produce a report of people sorted first by office and then by last name? a. SELECT * FROM employee ORDER BY office_location, last_name; b. SELECT * FROM employee ORDER BY last_name GROUP BY office_location; c. SELECT * FROM employee ORDER BY last_name, office_location; d. SELECT * FROM employee ORDER BY office_location GROUP BY last_name;

A. A is correct because it uses a single sort with two columns, where office location is before last name. B is not correct because grouping by office location will return one row per office. C is not correct because the sort columns are reversed. D is not correct because of similar reasons to B.

Which of the following are case sensitive when writing SQL queries? a. Names of tables b. Keywords such as SELECT and INSERT c. Logic operators such as AND and OR d. Names of database servers

A. A is correct because table names are case sensitive; that is, table is different from Table. B is not correct because SQL keywords can be written in any case. C is not correct because AND and OR are also SQL keywords and are therefore case insensitive. D is not correct because the server name is not part of SQL.

Statements in SQL are separated by: a. ; b. A blank line c. A new line d. .

A. A is correct because the semicolon is the statement delimiter. New lines, blank or not, are ignored. Periods are not special in SQL.

There are more aggregate functions, depending on your database:

AVG(column)—Returns the average of the column (NULLs are removed.) COUNT(column)—Counts the number of non-NULL instances of column, or total rows if * is used instead of a column name MIN(column)—Returns the minimum non-NULL value in the column MAX(column)—Returns the maximum non-NULL value in the column SUM(column)—Adds up all the non-NULL values in the column

Given a table for managers and employees, how would you produce a report of the managers with their employees, ignoring managers with no employees? a. SELECT * FROM managers, employees; b. SELECT * FROM managers AS m JOIN employees AS ee ON (ee.manager_id = m.id); c. SELECT * FROM managers AS m LEFT JOIN employees AS ee ON (ee.manager_id = m.id); d. SELECT * FROM managers JOIN employees(id=manager_id);

B. B is correct because it performs an inner join so that the managers without employees are excluded. A is not correct because that returns a cross join with all managers and employee combinations. C is not correct because a left join includes managers without employees. D is not correct because the join syntax is invalid.

Your employee table tracks years of service, but there are many employees where the years are not known and have a NULL in that column. How could you change everyone in this situation to have 20 years of experience? a. UPDATE employee SET years_of_service = 20 WHERE years_of_service = NULL; b. UPDATE employee SET years_of_service = 20 WHERE years_of_service IS NULL; c. UPDATE employee SET years_of_service = 20; d. UPDATE employee.years_of_service = 20 WHERE years_of_service IS NULL;

B. B is correct because it uses an UPDATE with an appropriate WHERE clause, remembering that you can't directly compare NULL and need to use IS NULL. A is not correct because it tries to compare directly to NULL. C is not correct because there is no WHERE clause, so everyone will be set to 20 years. D is not correct because the format is not correct: The UPDATE specifies the table name and the SET specifies the column.

Your employee table has a column called years_of_service that tracks how many years each employee has worked with the company. It is year end and you've been asked to increase this number by one for every employee. Which command does this? a. INSERT INTO employee SET years_of_service=years_of_service+1; b. UPDATE employee SET years_of_service=years_of_service+1; c. UPDATE years_of_service SET years_of_service+1; d. UPDATE employee(years_of_service) += 1;

B. B is correct because it uses the UPDATE command and contains the required elements—a table name and instructions for the SET clause. A is not correct because the problem requires data be updated, not inserted. C is not correct because the word that follows the UPDATE keyword should be a table name. D is not correct because the column to be updated needs to be specified with the SET keyword.

Which of the following is incorrect about relational databases? a. Tables are arranged in rows and columns. b. New data elements can be added to a row without changing the schema. c. SQL is used to query the data. d. Rows are given unique identifiers.

B. B is correct because variable elements are a property of schemaless databases, not relational databases. A is incorrect because a table is a series of columns corresponding to data elements, with each record forming its own row. C is incorrect because SQL is the typical language used to query a relational database. D is not correct because each row has a primary key that identifies it.

Which of the following queries will successfully return people from the employee table who are at least 30 years old? a. SELECT * FROM Employee WHERE age >= 30; b. SELECT * WHERE age >= 30 c. SELECT * FROM employee WHERE age >= 30; d. SELECT * WHERE age >=30 FROM employee;

C. C is correct because it has the proper ordering of clauses: SELECT, FROM, and WHERE. It also uses the correct case for the table. A is not correct because the table name is capitalized in the answer, but the table given in the question is all lowercase. B is not correct because it is missing the FROM clause and does not end with a semicolon. D is not correct because the FROM and WHERE clauses are reversed.

You have an employee table that includes a job description and a salary column. How do you calculate the average salary per position using SQL? a. SELECT *, SUM(salary) FROM employee GROUP BY description; b. SELECT description, SUM(salary) FROM employee; c. SELECT description, SUM(salary) FROM employee GROUP BY description; d. SELECT description, SUM(salary) FROM employee GROUP BY salary;

C. C is correct because it uses the GROUP BY keyword to group on description and sums the salary. A is not correct because columns are rolled up so SELECT * is meaningless. B is not correct because the query is missing a GROUP BY clause. D is not correct because the query needs to group by description, not salary.

A table with rows and columns can be found in a(n): a. Flat file database b. Key-value database c. Relational database d. Schemaless database

C. C is correct because relational databases store entities in tables, which are comprised of rows and columns. A is not correct because flat files have no structure. B is not correct because keyvalue databases have a key and a value rather than a table. D is not correct because schemaless databases store documents and don't have a set of formal rows and columns.

Given a table of cars and a table recalled models, which of the following uses a subselect correctly to find cars where the model has been recalled since January 1, 2010? cars: id, model, owner_name recalls: id, model, recall_date a. SELECT * FROM cars LEFT JOIN recalls ON (recalls.model=cars.model) WHERE (recall_date >= '2010-01-01'); b. SELECT * FROM cars WHERE model IN (SELECT * FROM recalls WHERE recall_date >= '2010-01-01'); c. SELECT * FROM cars WHERE model IN (SELECT model FROM recalls WHERE recall_date >= '2010-01-01'); d. SELECT * FROM cars WHERE model IN (SELECT model FROM recalls) WHERE recall_date >= '2010-01-01';

C. C is correct because the subselect includes a single column of models, and the query uses the IN clause. A is not correct because it does not use a subselect; it uses a join. B is not correct because the subselect includes all the columns and cannot be executed by the database. D is not correct because the WHERE clause is outside the subselect; cars does not have a recall_date.

Creating Tables Unless you're a developer, there's a good chance you'll never have to create a table. However, for completeness, tables are created with the CREATE TABLE command. The general form of this command is

CREATE TABLE tablename ( ColumnName1 type1 options1, ColumnName2 type2 options2 ); Where the ColumnNames are the names you give the column, the type describes what is stored in the column, and the options are some optional adjustments to what is stored in the table.

What do databases use to look up data quickly? a. Binary object b. Client-server c. Embedded database d. Index

D. D is correct because indexes provide fast lookups to large sections of data. A is not correct because a binary object is some data, such as a picture, that is not text in nature. B is not correct because client-server is a method used to communicate with a database but does not contribute to looking up data quickly. C is not correct because embedding a database reduces the footprint of the application but does not contribute to speed the same way an index would.

Which of the following queries properly use table aliases? a. SELECT * FROM employee WHERE employee AS ee.hire_date > '2010-01-01' AND ee.status IS NOT NULL. b. SELECT * FROM employee WHERE employee.hire_date > '2010-01-01' AND ee.status IS NOT NULL. c. SELECT * FROM employee AS ee WHERE ee.hire_date > '2010-01-01' AND employee.status IS NOT NULL. d. SELECT * FROM employee ee WHERE ee.hire_date > '2010-01-01' AND ee.status IS NOT NULL

D. D is correct because it aliases the table name in the FROM clause and then uses the alias throughout the query. A is not correct because the table is not aliased in the FROM clause. B is not correct because the table is not aliased at all, but the WHERE clause includes a use of the alias. C is not correct because the second condition doesn't use the alias.

You have a table containing employee data, and one of the fields is years_of_service. How would you find everyone who has worked for at least 5 years, but no more than 20? a. SELECT * FROM employee WHERE years_of_service > 5 AND years_of_service < 20; b. SELECT * FROM employee where years_of_service BETWEEN 5, 20; c. SELECT * FROM employee where years_of_service BETWEEN 5 AND 21; d. SELECT * FROM employee where years_of_service BETWEEN 5 AND 20;

D. D is correct because it selects people who have between 5 and 20 years of service, inclusive. A is not correct because the conditionals miss people who have been working for 5 or 20 years. B is not correct because the syntax of the BETWEEN keyword requires an AND, not a comma. C is incorrect because the ranges are inclusive, and this includes people with 21 years experience.

Given two tables representing cars and their owners: cars: id, model, owner_id owners: id, name How would you display a list of all cars and their owners, including cars that don't have owners? a. SELECT * FROM owners LEFT JOIN cars ON owners.id = cars.owner_id; b. SELECT * FROM cars, owners WHERE cars.owner_id = owners.id; c. SELECT * FROM cars JOIN owners ON owners.id = cars.owner_id; d. SELECT * FROM cars LEFT JOIN owners ON owners.id = cars.owner_id;

D. D is correct because it takes all the cars and left joins the owners. A is not correct because it returns all the owners even if they don't have cars, not the other way around. B is not correct because that is doing an inner join, showing only cars that have an owner. C is not correct because it is also doing an inner join.

Which of the following statements is correct about the following SQL statement: INSERT INTO employee (name, age) VALUES ('Alice', 30), ('Bob', 32); a. The values have the rows and columns mixed; it should be ('Alice', 'Bob'), (30, 22);. b. The statement needs to specify all the columns in employee. c. The statement needs to be split into two. d. The statement inserts two rows.

D. D is correct because the statement is a valid INSERT with two data items. A is not correct because each row is enclosed in parentheses. B is not correct because the statement only specifies two columns and that is allowed as long as there are no constraints on missing data. C is not correct because you can insert multiple rows at a time by separating each row with a comma.

The subselects are enclosed in parentheses and usually given as an argument to a WHERE column IN clause. The subselect must return only one column; otherwise, you might get an error like this:

Error: only a single result allowed for a SELECT that is part of an expression

If you have many rows to insert you can add them all in one command:

INSERT INTO book (title) VALUES ('title1'), ('title2');

The SQL commands themselves are case-insensitive, but table names and data strings are case-sensitive. Therefore the following statements are all the same:

INSERT INTO name VALUES('Sean'); insert into name values('Sean'); InSeRt INto name VALues('Sean');

The INSERT command is the primary way to get new rows into your tables. The format of the INSERT command is

INSERT INTO tablename (columns) VALUES (values);

Check to see whether your system has the sqlite3 command by typing which sqlite3.

If you get back a path to the binary, you have it. If not, install it with apt-get install sqlite3 for Debian and Ubuntu systems or yum install sqlite on Red Hat and Fedora. A SQLite database is a single file. Any SQLite client can open this file and manipulate it by either using the shared libraries or the command line client.

A key-value database

Key-value databases can be so simple that they don't even need to have a server running. They can be simple files on disk that a local client opens and reads. Tools like rpm and apt store their data in keyvalue databases because there is no overhead in running a server, and the application benefits from the speed of looking up keys and indexes that can make searches of the values themselves fast.

Database Basics

Most databases follow a client-server model where an application, called the client, connects over the network to a database server that stores the data. The client issues requests to the server and reads the response back over the network. Sometimes the client and the server are on the same machine but are separate entities and must still talk over some kind of connection.

The options give some more context to the column and allow for better data consistency:

NOT NULL—Do not allow NULL values to be stored. PRIMARY KEY—This column is a primary key that is indexed and must be unique for each row. UNIQUE—This column is not the primary key but is not allowed to have any duplicates.

SELECT * asks for all the available columns, and FROM book indicates that the database should search the book table. In the preceding example, the table has two rows with identifiers of 1 and 2, respectively.

Naming things is hard. Should the table be called book or books? Tables are usually named after the singular version of the entity they contain. In this case each row is a book; therefore, the table is called book. If you're working in a database that already uses the plural form, then stay with that.

Relational Databases

Relational databases are given their name because they deal with both data and the relationships between the data. A table stores information about a single entity such as a book with multiple columns storing different pieces of information such as the title, price, and date of first print. The database manages relationships between tables such as between a book and its authors. The database also provides a way to piece the tables together and perform queries, which is almost always the Structured Query Language (SQL).

Comments start with two dashes (--) and can appear anywhere on a line except within quotes. The following two examples show comments in both styles:

SELECT * -- I want all the columns FROM t1; -- t1 holds sales data

LIKE behaves the same as = if there are no metacharacters. The following two statements have the same output:

SELECT * FROM author WHERE first_name = "Sean"; SELECT * FROM author WHERE first_name LIKE "Sean";

SQL Basics

SQL is a programming language and it is therefore important to be precise when writing code to get the results you want. It is unlike other programming languages in that you rarely use loops and iterators. Instead you are writing expressions to filter and manipulate data and then looking at that data.

Using SQLite

SQL is an embedded database rather than a network-aware database. It's designed to give applications a way to build in a SQL database without the overhead of running a separate server. SQLite is used on everything from Android phones to most web browsers.

The key here is the JOIN book ON book.author_id = author.id. The query already has author through the FROM clause and this joins in book based on the condition that the book's author_id is the same as the author's id.

The condition specified after the ON keyword is important. Think of the two tables as sitting next to each other and you've asked the database to join them together. The database goes down the rows of one table and tries to make any matches it can to the other table.

The UPDATE command changes existing rows. The format of the command is

UPDATE table SET column1=value1, column2=value2 WHERE conditions.

The database used for this book can be found in the downloads section at the book's home page: http://www.pearsonitcertification.com/title/9780789754554

You can download this file and have some sample data to play with.

Finally you can perform a substring match on a column with the LIKE operator. While the equality operator, =, only matches exact strings, LIKE lets you look for strings within the column by using one of two metacharacters:

_ matches a single character. % matches zero or more characters.


Related study sets

Chapter 4. Adjustment, Financial Statment, and Financial Results

View Set

Exam Respiratory System 4 Physiology

View Set

Oxford Latin Course: Part 1 Chapter 10 Translations and Answers

View Set