5. Relational Databases and mySQL

Réussis tes devoirs et examens dès maintenant avec Quizwiz!

Q - What is the primary goal of relational database design?

Reducing redundant data

Q - What best describes the order in which data should be added to a database?

Data must be added to primary key fields before adding data to related foreign key fields.

Q - How are NULL values inserted into a column?

Explicitly stating NULL in an INSERT statement When adding a new column to an existing table using an ALTER TABLE statement Leaving a nullable column out of an INSERT statement

D - MySQL

MySQL is a relational database management system (RDBMS) that runs as a server providing multi-user access to multiple databases.

Q - A well-designed ERD includes:

- The tables required by the database, including the name of each table - A list of fields contained in each table - Metadata about each field, including the name of the field, its data type, and whether or not it is nullable - Identification of all primary and foreign key fields - How each table is related to each other - A more detailed ERD may contain information about indexes, how the keys are named, and details about the cardinality of each relationship, but for now, we'll focus on the basics. An ERD never includes data - just a description of the kind of data that the table will hold. That said, it is often useful to have access to sample data sets, to help you identify what fields the ERD should include and what the appropriate data type is for each field.

Q - What statement will remove a field from a table?

ALTER TABLE tableName -> DROP fieldName;

Given a table with the following structure, which statements will correctly add one record to the table? person person_id INT AUTO_INCREMENT PRIMARY KEY first_name VARCHAR(25) last_name VARCHAR(50) birthday DATE

INSERT INTO person (first_name, last_name, birthday) VALUES ('Elizabeth', 'Jackson', '2003-12-1'); INSERT INTO person VALUES ('Elizabeth', 'Jackson', '2003-12-1');

Q - Which data manipulation keyword is used to combine the records from two or more tables?

JOIN

Q - How many primary keys can be present in a table?

One at the most.

D - Entity Integrity

One of the keystones of relational database design is entity integrity, which guarantees that each record in a table is unique within that table. All RDBMSs enforce entity integrity automatically, but the database creator has to appropriately define a primary key in each table for this to work. As data is added to a table, the RDBMS will check two properties to ensure that the new record is unique: - That no other existing record in the table has the same primary key value as the new entry. - That there is a value entered for each field of the primary key. If a new record fails to meet both of these criteria, the RDBMS will reject the record and prevent it from being added to the table.

Q - The following query includes a syntax error. Identify the error. SELECT brand_name, COUNT (DISTINCT customer_name) FROM depositor, account WHERE depositor.account_number = account.account_number GROUP BY branch_id

The GROUP BY clause must use fields listed in the SELECT clause.

Q - What is true about creating a join between two tables?

The fields used in the join can have different names. The fields used in the join must have the same data type.

D - Different kind of keys

1. Natural Key - Using a piece of data that will be tracked anyway as the key (ex. using the entry date, a phone number, email) 2. Surrogate Key - A field/collection of fields created specifically to identify each record in a database. (Ex. Bank account number) 3. Candidate Key - A field that is unique to each record but may not act as the primary key (Ex. Social security number - it can be unique, but for security reasons it won't be used)

Q - What is the largest value you can store in field`DECIMAL(10,3)`?

9,999,999.999 10 digits total, three on the right of the decimal point.

D - Database

A database is a structured representation of data that we can read from and write to, often stored separately from any application that uses the data. It can be made available to multiple applications, such as allowing access to a contact database from a calendar app or posting photos from an image database using a social media application. While any application that uses a specific database must know how to access the data, the data itself is simply a pool that any authorized app can pull from.

Q - What is true about importing data from a csv file?

Each column in the original dataset must use a data type that is compatible with the target column in the relational database. The columns in the original dataset must be delineated in some way, using a comma or pipe, for example.

Q - What conditions are applied for first normal form (1NF)?

Each record must be unique Each field includes a single value

Q - Functional dependency is a relationship between or within _____.

Fields (Column categories)

Q - What will happen if a foreign key field is defined to use cascade delete?

If a primary key record is deleted, the related foreign key record will also be deleted.

Q - When writing a SELECT query, you can include an alias for any field or table. In which of the following cases is an alias required?

If a table's foreign key references the primary key of the same table, you must use aliases to create two different references to the same table.

Q - Which of the following statements is correct regarding primary keys? a. A table can include up to three primary keys. b. A primary key must be related to a field in another table. c. Only one candidate key can be the primary key. d. A primary key cannot include more than one column.

Only one candidate key can be the primary key.

Q - The database you are working with includes two tables: customer and customer_order. customer customer_id (primary key) customer_first_name customer_last_name customer_order order_id (primary key) customer_id (foreign key) order_date order_total You want to generate a list of the customers who have never placed an order with your company. Which of the following statements will retrieve the desired results?

SELECT c.customer_id, c.customer_first_name, c.customer_last_name FROM customer c [[LEFT OUTER JOIN]] customer_order co ON c.customer_id = co.customer_id WHERE co.order_id IS NULL;

Q - What statement will return a list of products that include the word iron in the product name?

SELECT product_id, product_name FROM product WHERE product_name LIKE '%iron%';

Q - What SQL statement will change "Thomas" to "Michel" in the LastName column in the Users table?

UPDATE Users SET LastName = 'Michel' WHERE LastName = 'Thomas';

Q - Given a table named person with the following data, which statements will correctly change Elizabeth Jackson's name to Elizabeth Hunter? person_id first_name last_name birthday 1ElizabethJackson2003-12-01 2RoryJackson1966-01-15 3HowardCornish1966-04-29 4CapriceConnelly1967-01-21 5DanielWalls1969-04-28

UPDATE person SET last_name = 'Hunter' WHERE person_id = 1; UPDATE person SET last_name = 'Hunter' WHERE first_name = 'Elizabeth' AND last_name = 'Jackson';

Q - What statement to work on/switch to a database?

USE databaseName;

D - Denormalization

When optimizing for extreme performance, there are times that you are less concerned with data protection and choose to move back towards 1NF. This is called denormalization. Because a well-normalized database tends to be slower, it is common for business intelligence and data warehousing systems to have both a normalized version of the database (especially for active data) and a denormalized version. These systems are used to process massive amounts of data and having to do dozens of table joins and calculations over huge result sets can be very slow. The accepted solution to this problem is to periodically poll your database for changes and move the changed data into a denormalized structure, and do any known calculations (sales by day, month, quarter, year, etc.) in advance. Then when a user requests the reports, they can use this pre-optimized data to provide a quick response.

Q - MySQL uses what default DATE format?

YYYY-MM-DD

Q - Which of the following statements is correct regarding candidate keys? a. A candidate key must be related to a field in another table. b. Any field that can identify a record in a table is a candidate key. c. No table can have more than one candidate key. d. A candidate key cannot include more than one field.

b. Any field that can identify a record in a table is a candidate key. (ex. SSN)

Q - Left outer join preserves tuples only in the relation named before the ____.

left outer join operation In a left outer join, all the rows from the left table (specified before the "LEFT OUTER JOIN" keywords) are included in the result set, along with matching rows from the right table (specified after the "LEFT OUTER JOIN" keywords). If there are no matching rows in the right table, the result set will contain NULL values for the columns in the right table.

Q - What statement will create a database?

CREATE DATABASE databaseName;

D - Data Redundancy + anomalies it could cause

Data redundancy is the act of storing the same piece of data multiple times in the database. (Ex. A person with multiple bank accounts) Update Anomaly - Repeated data not being consistent during updates Delete Anomaly - Data that is deleted but also present in other data causing an error

Q - The database you are working with includes two tables: customer and customer_order. customer customer_id (primary key) customer_first_name customer_last_name customer_order order_id (primary key) customer_id (foreign key) order_date order_total You want to get a list of customers whose average order total is at least $1,000, with the highest average order on the first row. Which statement will retrieve these results?

SELECT c.customer_id, c.customer_first_name, c.customer_last_name, AVG(co.order_total) average_order_total FROM customer c JOIN customer_order co ON c.customer_id = co.customer_id GROUP BY c.customer_id HAVING AVG(co.order_total) >= 1000 ORDER BY AVG(co.order_total) DESC;

Q - You want to retrieve a list of customers who live in Kentucky (KY) and Ohio (OH). What statements will work?

SELECT last_name, first_name, state_abbr FROM customer WHERE state_abbr = 'KY' OR state_abbr = 'OH'; SELECT last_name, first_name, state_abbr FROM customer WHERE state_abbr IN ('KY', 'OH');

Q - You want to retrieve all records from a table named product_price, where the price is less than $50. Which of the following SELECT statements will work?

SELECT product_name, price FROM product_price WHERE price < 50;

Q - What statement will display the number of customers from each state?

SELECT state_abbr, COUNT(*) num_customers FROM customers GROUP BY state_abbr;

Q - What is entity integrity?

A condition that requires every record to have a primary key value. (aka, a unique identity. no duplicates!!)

Q - What statement will delete a table from the current database?

DROP TABLE tableName;

Q - What is correct regarding composite keys?

A composite key must be a candidate key.

D - Functional Dependency

A functional dependency, as the name implies, is a dependency relationship. That is, "column A depends on column B", or "columns A, B, and E depend on columns C and D". In a well-designed table, all columns will depend on at least one column in the table. If there are columns that are independent of the others, they are candidates to be moved to a separate table. (Ex. In a table listing employee information, the name is functionally dependent on the social security number. That means if we know the SSN, we can find the name. However, the reverse isn't true, since names aren't inherently unique. Therefore, the SSN is not functionally dependent on the name.)

Q - The database you are working with includes the following employee table: employee employee_id (primary key) emp_first_name emp_last_name manager_id (foreign key relates to employee.employee_id) What results would you expect from the following query? SELECT concat(m.emp_last_name,', ',m.emp_first_name) manager_name, concat(e.emp_last_name,', ',e.emp_first_name) employee_name FROM employee m JOIN employee e ON m.employee_id = e.manager_id ORDER BY m.emp_last_name, m.emp_first_name;

A list of manager names in the first column and the employees under each manager in the second column.

D - Normalization

A properly-normalized design improves performance and reduces the complexity of relationships by minimizing data duplication (redundancy). Codd showed that is possible for all data domains to be reduced to simple table relationships. A database where all relations are reduced in this manner, following the process of normalization, is said to be normalized. Designing databases is a creative endeavor that is a mixture of art and science. Even with set guidelines for normalization, it's possible to have different designs for the same database, in part because of how the data is used and in other part because of how the designer thought through the systems during the design process. Moving a design from 1NF to 3NF has a great impact on the data integrity of your system, but at the cost of complexity and potentially performance. The way you design a database for a dozen users is typically much less formal than the way you would design a database for a million users. Regardless of the situation you find yourself in, an awareness of good normalization techniques and an understanding of the application's data and needs is vital to building correct database designs. In the long run, it is most important to have a well-designed database from the outset, and taking time to normalize to 3NF is good to that end. After identifying a 3NF structure for the database, it may be appropriate to denormalize it a little, but you should do so only with justification.

D - Database management system (DBMS)

A software system that manages databases. The DBMS executes commands, provides security, enables network access, and provides admin tools for Database Administrators (DBAs) to work with database files. A subset of DBMSs includes relational database management systems (RDBMS) that are designed specifically to work with relational databases. There are many options for RDBMSs, including MySQL, PostgreSQL, Microsoft SQL Server, Oracle Database, and DB2. The choice of DBMS determines some factors of how the data itself is defined organized, but in large part, all RDBMSs do the same thing.

D - Second Normal Form (2NF)

A table is in 2NF if and only if: - It is in 1NF - Every non-primary-key column is functionally dependent on the entire primary key but not functionally dependent on any proper subset of the primary key. (Ex. A series of orders where a customer could hypothetically order more than one item. A way to satisfy 1NF, we can make two separate tables - one for all products, with unique ProductIDs, and one with all customers, with unique CustomerIDs. Then, we can combine the OrderID and ProductID into a single key (composite key) so that the OrderID represents the full order, while each ProductID represents each item ordered. The TotalPrice is then dependent on both the order and the product, meaning it meets the criteria for 2NF.)

Q - What is the difference between CHAR, VARCHAR, TINYTEXT, and MEDIUMTEXT?

CHAR items are a fixed length. They're the fastest the store/retrieve, but have more storage space. They have a max length of 255. VARCHAR can have variable length. They're slower to store/retrieve but take up less space. They have a max length of 65,535. TEXT is for written data that may include multiple paragraphs. TINYTEXT has a max length of 255 characters. MEDIUMTEXT has a max length of 16,777,215 characters. LONGTEXT has a max length of 4,294,967,295 characters.

Q - If a SQL statement is set up to use a left outer join, what records will the statement return?

All records in the table named first in the JOIN statement and only the matching records in the second table.

D - Indexing a key

By default, primary keys are indexed, which means that all fields associated with any primary key in the database when a database is opened are loaded in to memory and sorted in a specific order. The fact that they are in memory means that the database can access them very quickly, and the fact that they are sorted means that the database does not have to search through more records than necessary to find the required key values. The primary key is connected to the rest of the data in the associated record, so the database simply has to pull data from that record and can safely ignore all other records in the table. Ex. An index in a book. A book's index normally appears at the end of the book, where it is easy to locate. In addition, the terms in the index are sorted in alphabetic order, so the user can easily find the term they are looking for. The index also tells the reader where to find the term in the book, so that the reader can go straight to the correct location in the book.

Q - What is the purpose of an index set on a table?

An index improves the speed of data retrieval operations on a table. (Ex. Book index)

Q - What is referential integrity?

Any value used as a foreign key must exist as a primary key in the related table.

Q - The database you are working with includes two tables: customer and customer_order. customer customer_id (primary key) customer_first_name customer_last_name customer_order order_id (primary key) customer_id (foreign key) order_date order_total You want to generate a list of the ten customers who have ordered the most from your company based on total purchase amounts. Which of the following statements will retrieve the desired results?

B - SELECT c.customer_first_name, c.customer_last_name, SUM(co.order_total) AS total_to_date FROM customer c JOIN customer_order co ON c.customer_id = co.customer_id GROUP BY c.customer_id ORDER BY SUM(co.order_total) DESC LIMIT 10; - SUM is for combining multiple things - c.customer_id = co.customer_id - Group by customer_id

Q - Given a primary key named bookID in a table named book, what would create a foreign key that references that field in a table named bookFormat? Assume that the primary key and the foreign key fields have the same name.

CONSTRAINT FK_bookFormat FOREIGN KEY (bookID) REFERENCES book(bookID)

Q - What statement will output a list of fields and field properties for a table?

DESCRIBE tableName;

Q - What statement will delete a database?

DROP DATABASE databaseName;

D - Third Normal Form (3NF)

For a table to be in third normal form, it must meet the following criteria: - It is in 2NF (and by extension, in 1NF). - No non-key field depends on another non-key field. In essence, the goal of 3NF is to make sure that all data in a given table is relevant to the object (or entity) described by the table. Boiled down, this means that only data immediately relevant to a customer should be in a Customer table and only data immediately relevant to a product should be in a Product table. Ex. Telephone ID, which is dependent on TelephoneNumber, PhoneTypeID, and ContactID

Q - What happens if a foreign key field is defined to use cascade update/delete?

If the value of a primary key field is changed/deleted, the foreign key value will be changed to use the new value/deleted automatically.

Q - What is the correct order of clauses in a SQL statement?

SELECT, WHERE, GROUP BY, HAVING

Q - Which statement will show you a list of available databases on the current SQL server?

SHOW DATABASES;

Q - What statement will output a list of tables in the current database?

SHOW TABLES;

Q - Structured vs Unstructured Databases

Structured - The data is organized in a specific pattern. - Easy to control what is available, what to find - Developer can limit the kinds of data stored to improve data integrity (reliability and accuracy) - Trade-off: Creating + finding is relatively slower - Good for predictable data, such as bank accounts, personnel records, inventories Un-structured - Sans strict controls (can have some kind of structure) - Faster, but prone to duplicate data - Good for unpredictable/irregular data, such as social media posts, online reviews, and other user-generated content

Q - In which clause are aggregate functions defined in a SQL statement?

The SELECT clause Aggregate functions are defined in the SELECT clause of a SQL statement. The SELECT clause is used to specify the columns that should be returned by the query, and it can also include expressions that use aggregate functions to perform calculations on groups of rows in the result set. Some commonly used aggregate functions in SQL include COUNT, SUM, AVG, MAX, and MIN.

Q - What is denormalization?

The process of removing strict normalization rules to make the database function more efficiently.

Q - You are building a query that pulls data from four different tables in a database. Which of the following statements describes how many JOIN clauses the query will need?

The query will need three JOIN clauses, one for each pair of tables required by the query.

The database you are working with includes two tables: customer and customer_order. customer customer_id (primary key) customer_first_name customer_last_name customer_order order_id (primary key) customer_id (foreign key) order_date order_total What results would you expect from the following query? SELECT customer_id, COUNT(*) order_count, AVG(order_total) average_order_total FROM customer c JOIN customer_order co ON c.customer_id = co.customer_id GROUP BY customer_id;

The query will not run because the customer_id field is not qualified.

D - First Normal Form (1NF) and exceptions to rules

To achieve 1NF, the table must satisfy the following conditions: - There is no top-to-bottom ordering to the rows. - There is no left-to-right ordering to the columns. - Every row can be uniquely identified. - Every row/column intersection (field) contains only one value. As for the violation of condition #4 (Every field contains only one value), whenever you end up with a situation like this, the best option is to create a new table for that data.

Q - You just ran a DELETE statement and realized after the fact that it deleted the wrong set of records. What is your best option for restoring the data?

Use the most recent backup script to restore the database to the most recent backed-up version. There's no 'undo' son!!

D - Entity Relationship Diagram (ERD)

When discussing an abstract form of a database, we typically use the term entity to reference each object (such as people, things, places, and events) that we want to manage in a database, and we use entity relationship diagrams (ERDs) to represent the entities and the relationships between the entities. An ERD, a visual representation of the database structure, gives us the following advantages: - It helps identify places where the proposed structure may not work. While normalization should help you identify exactly how the tables are related to each other, you may discover while building the ERD either that one or more tables aren't related to any of the other tables or that some of the relationships don't make sense when you try to map them to the normalized tables. Identifying these problems in the design phase will help avoid problems that might otherwise appear in the SQL scripts used to define the database. - It allows the team to see the database's structure, which helps each team member quickly identify what fields are in which tables and how the tables are related. - It gives you a single, condensed representation of the structure, which helps you write SQL statements more quickly, especially when the SQL statement references two or more tables.


Ensembles d'études connexes

ECO-251: Chapter 1 - Limits, Alternatives, and Choices

View Set

TBR Psych 3 - Learning and Memory

View Set

Experimental Psychology '19 Review Questions - 1st Exam

View Set