SQL - Week 7

अब Quizwiz के साथ अपने होमवर्क और परीक्षाओं को एस करें!

Example: Designing the invoicing database by diagramming tool

*1. Creating the Database* - Create a small database called Invoice. - Right-clicking the Databases node of my server and selecting New Database. *2. Adding the Diagram and Initial Tables* - Expand the node for your database - Right click the Diagrams node and select New Database Diagram *Adding table Orders* *Data Types:* - Because OrderID is the primary key for the table, be sure that you allow enough room for your values to be unique as you insert more and more data. - OrderDate uses the Date data type. You may want to utilize one of the DateTime data types instead if you want the time of day on your order in addition to just that date. - CustomerNo: If leading zeros do not matter, and there is no character, then int. (Faster on lookups and Smaller in size) *Adding table x... x3* *Adding the Relationships* - Think about what kind of relationships you need. - All of the relationships that you'll draw with the relationship lines in the SQL Server diagram tool are going to be one-to-zero, one, or many relationships. 1. A one-to-many relationship with Customers as the parent (the one) and Orders as the child (the many) table. 2. Establish a one-to-many relationship from Orders to OrderDetails (there will be one order header for one or more order details) based on OrderID. 3. You need a similar relationship going from Products to OrderDetails (there will be one Products record for many OrderDetails records) based on ProductID. - *Adding Some Constraints* - The part number is formatted as 9A9999 where 9 indicates a numeric digit 0-9 and A indicates a alpha (non-numeric) character. - Right clicking the Products table and selecting Check Constraints to add check constraints. - Expression: -- (PartNo LIKE '[0-9][A-Z][0-9][0-9][0-9][0-9]') - For diagrams for this process, along with slides that aren't jotted down here as well, see normalization2 slides.

O: Other Normal Forms: BCNF

- *Boyce-Codd Normal Form* (a variation on third normal form): address situations in which you have multiple overlapping candidate keys. This can happen only if: -- *a.* All the candidate keys are composite keys (that is, it takes more than one column to make up the key). -- *b.* There is more than one candidate key. -- *c.* The candidate keys each have at least one column that is in common with another candidate key. - This is typically a situation whereby any number of solutions work. It is almost never logically thought of outside the academic community.

Other Design Issues beyond Normalization

- *Choosing Data Types* -- If you're trying to store months (as the number, 1-12) -- those can be done in a single byte by using a tinyint instead of int (4 bytes) - Don't use an nchar or nvarchar if you're never going to do anything that requires Unicode -- these data types take up two bytes for every one as compared to their non-Unicode cousins.

Index Types

- *Clustered* indexes - *Non-clustered* indexes, which comprise: -- Non-clustered indexes on a heap -- Non-clustered indexes on a clustered index - The way the physical data is stored varies between clustered and non-clustered indexes. - The way SQL Server traverses the B-Tree to get to the end of the data varies between all three index types.

Page Types

- *Data:* -- Data pages are the actual data in your table - *Index:* -- Index pages hold both non-leaf and the leaf level of a non-clustered index, as well as the non-leaf level pages of a clustered index.

O: Other Normal Forms: Fifth Normal Form

- *Fifth Normal Form:* This deals with non-loss and loss decompositions. - Essentially, there are certain situations where you can decompose a relationship such that you cannot logically recompose it into its original form. - Again, these are rare, largely academic.

O: Other Normal Forms: Fourth Normal Form

- *Fourth Normal Form:* deals with issues surrounding multi-valued dependence. - This is the situation whereby, for an individual row, no column depends on a column other than the primary key and depends on the whole key (meeting third normal form). - However, there can be rather odd situations where one column in the primary key can depend separately on other columns in the primary key. - These are rare, and don't usually cause any real problems. As such, they are largely ignored in the database world.

SQL Server Storage

- *The Database* -- The highest level of storage - *The Extent* -- The basic unit of storage, used to allocate space for tables and indexes. -- Once an extent is full, the next record takes up not just the size of the record, but the size of a whole new extent. - *The Page* -- The unit of allocation within a specific extent. There are eight pages to every extent. -- The number of rows per page is not fixed -- that depends on the size of the row - *Rows*

Database Normal Forms

- 1NF (First Normal Form) - 2NF (Second Normal Form) - 3NF (Third Normal Form) - BCNF (Boyce-Codd Normal Form) - 4NF (Fourth Normal Form) - 5NF (Fifth Normal Form) - 6NF (Sixth Normal Form) *In most practical applications, normalization achieves its best in 3rd Normal Form.*

Clustered Indexes

- A *clustered index* is unique for any given table -- you can only have one per table. - The leaf level of a clustered index is the actual data -- that is, the data is re-sorted to be sorted in the same physical order of the index sort criteria state. - Once you get to the leaf level of the index, you're at the data. - For a diagram of a B-tree for a clustered index, see the Index slides.

One-to-One

- A one-to-one relationship is one where the fact that you have a record in one table means that you have *exactly* one matching record in another table. *Example:* - In a school database, each student has only one student ID, and each student ID is assigned to only one person.

Clustered Indexes: Searching

- A search begins at the root. -- SQL Server can locate the root node based on an entry in the system metadata view called sys.indexes. - Determine the next page it needs to examine. - Continues the process - Getting to the leaf level of the index means that SQL Server has also reached the desired row(s) and desired data. - Since this is an ordered list, if the next page has records that should be included, the rest of this page must be included.

Brief Summarization of Noramlization

- A super-brief summarization of what normalization is about out to the third normal form. - *The key, the whole key, and nothing but the key.*

Index Navigation

- All SQL Server indexes have leaf level and non-leaf level pages. -- The leaf level holds the "key" to identifying the record -- The non-leaf level pages are guides to the leaf level - The indexes are built over either a clustered table (if the table has a clustered index) or what is called a heap (what's used for a table without a clustered index). - *Clustered tables:* Any table that has a clustered index on it. -- The data is physically stored in a designated order. Individual rows are uniquely identified through the use of the *cluster key* -- the columns that define the clustered index. O: - *Heaps:* Any table that does not have a clustered index on it. -- In this case, a unique identifier, or *row ID (RID)*, is created based on a combination of the extent, pages, and row offset (places from the top of the page) for that row. A RID is necessary only if there is no cluster key available (no clustered index).

One-to-One or Many

- An example table, OrderDetails (the "one or many" side of the relationship) doesn't make much sense without an Orders header to belong to. Likewise, it doesn't make much sense to have an order if there wasn't anything actually ordered. - This relationship, however, gives you the same basic problems that you had with the one-to-one relationship. It's still that chicken or egg thing -- which came first? - In SQL Server, the only way to implement this fully is by restricting all data to be inserted or deleted via stored procedures. - For a diagram, see normalization2 slides.

O: Normal Form Violations: BCNF

- Assume we have a table with the following columns: *studnet_id, subject*, professor Why is this table not in BCNF? Both student_id and the subject columns form the primary key, which means the subject columns is a *prime attribute*. But, there is another dependency here. professor -> subject. While the subject column is a prime attribute, professor is a *non-prime attribute*, which is not allowed by BCNF.

Updating B-Trees

- B-Trees are referred to as *self-balancing* because the way new data is added to the tree generally prevents them from becoming lopsided. - When you add data to the tree, a node eventually becomes full and you need to split it. - Because in SQL Server, a node equates to a page, this is called a *page split.*

Many-to-Many

- Both sides of the relationship may have several records. -- A given order may have many products. Likewise, any given product may be ordered many times.

Choosing Clustered Index

- By default, your primary key is created with a clustered index. - If you decide not to have primary key as clustered index: CREATE TABLE MyTableKeyExample (Column1 int IDENTITY PRIMARY KEY NONCLUSTERED, Column2 int) - *The Pros* - Clustered indexes are the best for queries when the column(s) in question is frequently the subject of a ranged query. - *The Cons* - There's a better place to use it - doing a lot of inserts in a non-sequential order

Summary of Indexes

- Clustered indexes are usually faster than non-clustered indexes. - Only place non-clustered indexes on columns where you are going to get a high level of selectivity (that is, 95 percent or more of the rows are unique). - All Data Manipulation Language (DML: INSERT, UPDATE, DELETE, SELECT) statements can benefit from indexes, but inserts, deletes, and updates (remember, they use a delete and insert approach) are slowed by indexes. - Indexes take up space. - Indexes are used only when the first column in the index is relevant to your query.

Creating Relationships in Diagrams

- Drag and drop. - Start and end your drag in the columns you meant to.

Check Constraints

- Either right click the table heading and choose Check Constraints or choose the Check Constraints icon from the Table Designer toolbar.

Indexes/Keys

- Either right click the table heading and choose Indexes/Keys or choose the Manage Indexes and Keys icon from the Table Designer toolbar.

Normal Forms: Characteristics

- First Normal Form: -- Involves the process of eliminating groups of data and guaranteeing atomicity (the data is self-contained and independent). - Second Normal Form: -- To achieve 2NF, the database must meet the rules for 1NF. -- Furthermore, each column must depend on the *whole* key. - Third Normal Form -- To achieve 3NF, the database has to meet the requirements for 2NF. -- Moreover, no column can have any dependency on any other non-key column. -- Also, derived data is not allowed.

O: Satisfying BCNF

- For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following two conditions: -- It should be in the *Third Normal Form*. -- And, for any dependency A -> B, A should be a *super key*. - For a dependency A -> B, A cannot be a *non-prime attribute*, if B is a *prime attribute*. [What does this even mean?]

Overhead of Non-Clustered Indexes on a Heap

- From a physical read standpoint, your system may have to retrieve records from all over the file. - Fetching data from the same page several separate times with non-clustered index -- SQL Server has no way of knowing it has to come back to that physical location because there is no link between the data. - With the clustered index, it knows that's the physical sort, and can therefore grab it all in just one visit to the page[, unlike the non-clustered index]. - For a diagram of this, see the Index slides.

One-to-One Example

- Imagine that you are a subsidiary of a much larger company. Your parent company wants to be able to track all of its customers, and to be able to tell the collective total of each customer's purchases -- regardless of which subsidiary(s) the customer made purchases with. - One way to track all customer information would be to create a master customer database owned by the parent company. - The subsidiaries would then maintain their own customer tables, but do so with a one-to-one relationship to the parent company's customer table. - Any customer record created in the parent company would imply that you needed to have one in the subsidiaries also. Any creation of a customer record in a subsidiary would require that one also be created in the parent company's copy. Hence, "one-to-one."

Normal Form Violations: Practice

- Imagine there's a table containing courses that are taken in a certain semester, with the columns like so: *CourseID, Semester*, #Places, CourseName Assume the column names in bold are the primary key. In this case, this is a *2NF violation.* Notice that CourseName does not rely on the semester. Therefore, the fourth column does not rely on the *entire* key-- but only a part of it. The course name is dependent on the Course's ID, but has nothing to do with which semester it's taken in. Assume we have a similar table with the column names as follows: *Course, Semester*, #Places, TeacherID, TeacherName Note that TeacherName has a dependency on TeacherID, and while TeacherName does likely rely on both the Course and Semester, this violates 3NF because it relies on something other than the key. *Not in 3NF.* To fix this, take TeacherName out of this table, and put it in its own, which has TeacherID as the key.

Implied Indexes Created with Constraints

- Implied indexes are created when one of two constraints is added to a table: -- A PRIMARY KEY -- A UNIQUE constraint (aka, an alternate key)

Non-leaf level nodes

- In most situations there is too much data to reference from the root node, so the root node points at intermediate nodes -- called *non-leaf level nodes.* - Non-leaf level nodes are nodes that are somewhere in between the root and the node that tells you where the data is physically stored. - Non-leaf level nodes can then point to other non-leaf level nodes, or to *leaf level nodes.*

Choosing Wisely: Deciding which Index

- Indexes are a critical part of database planning and system maintenance. - They provide SQL Server with additional ways to look up data and take shortcuts to that data's physical location. - Adding the right index can cut huge percentages of time off your query executsion.

Introduction to Indexes

- Indexes are used to retrieve data from the database more quickly than otherwise. - The users cannot see the indexes, they are just used to speed up searches/queries.

Non-Clustered Index on a Clustered Table

- Just as with non-clustered indexes on a heap, the non-leaf level of the index looks pretty much as it did for a clustered index. - At the leaf level, with non-clustered indexes on a clustered table, it finds the *cluster key* -- That is, it finds enough information to make use of the clustered index. - For a diagram of a non-clustered index on a clustered table, see the Index slides.

Leaf Level nodes

- Leaf level nodes are the nodes where you obtain the real reference to the actual physical data. - Much like the leaf is the end of the line for navigating the tree, the node you get to at the leaf level is the end of the line for your index. From there, you can go straight to the actual data node that has your data on it.

What is Normalization?

- Normalization is a database design technique that reduces data redundancy and eliminates undesirable characteristics like Insertion, Update, and Deletion anomalies. (Modification Anomalies) - Normalization rules divide larger tables into smaller tables and links them using relationships. The purpose of Normalization in SQL is to eliminate redundant (repetitive) data and ensure data is stored logically.

How Data is Accessed in SQL Server

- Only two ways: -- Using *table scan* -- Using an *index* - Which method SQL Server uses to run your particular query will depend on -- what indexes are available -- what columns you are asking about -- the kind of joins you are doing -- the size of your tables

Overhead of Splitting

- Page splitting adds a substantial amount of overhead, as it involves -- Creating a new page -- Migrating rows from the existing page to the new page -- Adding your new row to one of the pages -- Adding another entry into the parent node --- This entry in the parent node also has the potential to cause a page split at that level, and the process starts all over again. - Page splits have a very negative impact on system performance.

Balanced trees (B-Trees)

- Provides a consistent and relatively low-cost method of finding your way to a particular piece of information. - Every time the tree branches, approximately half the data is on one side and half is on the other side. - For a diagram of a B-Tree, see the Index slides.

Enforcing Many-to-Many

- SQL Server has no way of physically establishing a direct many-to-many relationship. - You have to have an intermediate table to organize the relationship. - Some tables create your many-to-many relationships almost by accident as a normal part of the normalization process -- others are created entirely for the sole purpose of establishing this kind of relationship. - For a diagram, see normalization2 slides. - By using the join syntax, you can relate one product to the many orders that it's been part of, or you can go the other way and relate an order to all the products on that order.

Searching through a Non-Clustered Index on a Clustered Table

- SQL Server starts off with a ranged search. It does one single lookup in the index and can look through the non-clustered index to find a continuous range of data that meets the criterion (LIKE 'T%'). This kind of lookup is called a *seek*. - The second kind of lookup then starts -- the lookup using the clustered index. This second lookup is very fast; the problem lies in the fact that it must happen multiple times. - SQL Server retrieved a list from the first index lookup (a list of all the names that start with "T"), but that list doesn't logically match up with the cluster key in any continuous fashion. Each record needs to be looked up individually.

Rows

- Size limit: 8KB. - column-count limit: a maximum of 1,024 standard (non-sparse) columns.

Searching B-Trees

- Start at the root and look through the records until you find the last page that starts with a value less than what you're looking for. - You then obtain a pointer to that node and look through it until you find the row that you want. - For a diagram of a simple B-Tree to search, see the Index slides.

Zero or One-to-One Example

- Take the parent company versus subsidiary example: -- You might prefer to create a relationship where the parent company needs to have a matching record for each subsidiary's records, but the subsidiary doesn't need the information from the parent. - You could have subsidiaries that have very different customers (such as a railroad and construction company). The parent company wants to know about *all* of the customers regardless of which business they came from, but your construction company probably doesn't care about your railroad customers. In such a case, you would have *zero or one* construction customers to *one* parent company customer record.

Non-Clustered Indexes on a Heap

- The leaf level is not the data -- instead, it is the level at which you can obtain a pointer to that data. - This pointer comes in the form of a row identifier or RID -- Which is made up of the extent, page, and row offset for the particular row the index is pointing to. - Even though the leaf level is not the actual data (instead, it has the RID), you can find the data with one more step[,] because the RID has the full information on the location of the row.

Use of Indexes

- The optimizer takes a look at all the available indexes and chooses the best one - Once that index is chosen, SQL Server navigates the tree structure to the point of data that matches your criteria and again extracts only the records it needs. *Difference* from table scans - because the data is sorted, the query engine knows when it has reached the end of the current range it is looking for. It can then end the query.

SQL Server and Handling Zero or one-to-one Relationships

- The reason SQL Server can handle a zero or one-to-one, but not a one-to-one relationship is due to the "which goes first" problem. - In a true one-to-one relationship, you can't inert into either table because the record in the other table isn't there yet -- it's a paradox. - However, with a zero or one-to-one, you can insert into the required table first (the "one"), and the optional table (the "zero or one"), if desired, second.

One-to-Zero, One, or Many

- The referencing field (the one in the table that has the foreign key constraint) is allowed to be null; that is, the fact that you have a record in the "one" table doesn't necessarily mean that you have any instances of matching records in the referencing table.

Understanding Relationsips

- There are three kinds of major relationships: -- One-to-one -- One-to-many -- Many-to-many - Each of these has some variation, depending on whether one side of the relationship is nullable. -- For example, instead of of a one-to-one relationship, you might have a zero or one-to-one relationship.

Zero or One-to-One

- This is essentially the same as a one-to-one, with the difference that one side of the relationship has the option of either having a record or not having one.

Diagramming Database

- To diagram a database, we visualize both the entities and their relationships. - *Tables* - Change views

Enforcing One-to-Zero, One, or Many

- To enforce this kind of relationship -- *FOREIGN KEY constraint:* You simply declare a FOREIGN KEY constraint on the table that serves as the "many" side of the relationship, and reference the table and column that is to be the "one" side of the relationship (you'll be guaranteed only one in the references table because you must have a PRIMARY KEY or UNIQUE constraint on the column(s) referenced by a foreign key). -- *Triggers:* You actually need to add two triggers -- one for each side of the relationship. Add a trigger to the table that is the "many" side of the relationship and check that any row inserted or changed in that table has a match in the table it depends on (the "one" side of the relationship). Then, you add a delete trigger and an update trigger to the other table -- this trigger checks records that are being deleted (or changed) from the referenced table to make sure that it isn't going to *orphan* (make it so it doesn't have a reference).

Understanding Indexes

- Webster's dictionary defines an *index* as: -- A list (as of bibliographical information or citations to a body of literature) arranged usually in alphabetical order of some specified datum (as author, subject, or keyword). - For an example index, see Index slides. - The index concept is similar in databases.

Page Splits

- When a page become full, it splits. -- A new page being allocated -- Approximately half the data from the existing page is moved to the new page.

Use of Table Scans

- When a table scan is performed, SQL Server starts at the physical beginning of the table, looking through every row in the table. - As it finds rows that match the criteria of your query, it includes them in the result set.

Index Navigation at the leaf level

- When it gets to the leaf level of the index: -- With a clustered index, it could have stopped right here, but with a non-clustered index, it still has more work to do. -- If the non-clustered index is on a heap, there is just one more level to go. It takes the row ID from the leaf level page and navigates to it. It is not until this point that it is at your actual data.

Adding and Deleting Tables

- You can add a new table to the diagram in one of two ways. - *For an existing table:* right click and select add table. - *For a new table:* right click and select new table.

Diagramming Relationships

- You get to this box by right-clicking the table heading and choosing Relationships or by selecting the Relationships icon on the Table Designer toolbar. - Much like it sounds, this dialog box allows you to edit the nature of the relationships between tables.

Enforcing Zero or One-to-One Relationships

- Zero or one-to-one relationships can be enforced in SQL Server through: -- *A combination of a unique or primary key with a foreign key constraint.* A foreign key constraint can enforce that *at least* one record must exist in the "one" (or parent company in our example) table, but it can't ensure that *only* one exists (there could be more than one). Using a primary key or unique constraint would ensure that one was indeed the limit. -- *Triggers.* Note that triggers would be required in both tables.

O: Prime Attribute

A key attribute; that is, an attribute that is part of a key or is the whole key. Basically, a column that makes up a key or is the key.

The CREATE INDEX Statement

CREATE [UNIQUE] [CLUSTERED | NONCLUSTERED] INDEX <index name> ON <table or view name> (<column name> [ASC | DESC] [, . . .n]) INCLUDE (<column name> [, . . .n] [WHERE <condition>]) [WITH..... ... ... etc] See Index slides for full syntax. - Follows the same CREATE <object type> <object name> syntax. - Index isn't really a standalone kind of object. It has to go together with a table or view, and you need to state the table that the column(s) are ON.

Options in CREATE INDEX Statement

Everything after the ON <table or view name> (<column name>) clause is optional: *ASC/DESC* -- allows you to choose between an ascending and a descending sort order for your index. The default is ASC. *INCLUDE* -- INCLUDE columns, SQL Server only adds them at the leaf level of the index. *WHERE* -- place condition on which rows are included in your index. The syntax is the same as the WHERE clause you're familiar with from the SELECT statement *WITH* -- tells SQL Server that you'll indeed supply one or more of the options that follow ...

Creating, Altering, and Dropping Indexes

Indexes can be created in two ways: - Through an explicit *CREATE INDEX* command - As an implied object in a constraint is created

Page Split Details

When a page split occurs, data is automatically moved around to keep things balanced. The first half of the data is left on the old page, and the rest of the data is added to a new page -- thus you have about a 50-50 split, and your tree remains balanced. - For a diagram, see Index slides.


संबंधित स्टडी सेट्स

Unit 9 Environmental Science MCQ

View Set

Psych/Mental Health HESI Study Plan A

View Set

Astronomy II - test I (practice questions + quizzes + test questions)

View Set

ISSA Domain One Basic and applied science

View Set

P.M. Ch.8: Project Quality Management

View Set

GEO111 - Grundzüge und Sphären

View Set

Philosophy short answer questions

View Set