6 Data storage Indexes
Consider the following scenario: • A table has 800,000,000 rows. • Each row is 300 bytes. • Magnetic disk transfer rate is 2 gigabytes per second. • Assume 1 gigabyte is approximately 1,000,000,000 bytes. Assuming no free space, a table scan requires approximately how many seconds?
(800,000,000 * 300)=240,000,000,000 / 1,000,000,000= 240 / 2 = 120seconds
sorted table
A Sorted table can only have one sort column and therefore only one Primay or cluster index. Tables can have many secondary indexes. All indexes of a heap or hash table are secondary, since heap and hash tables have no sort column.
Indexes may also be dense or sparse:
A dense index contains an entry for every table row. A sparse index contains an entry for every table block. many rows are in a single block. Index on either a unique or non-unique table sort column. A sparse index is always on the sort column and contains one entry for each table block. Sparse indexes are much faster than dense indexes since sparse indexes have fewer entries and occupy fewer blocks.
Multi-level indexes
A multi-level index stores column values and row pointers in a hierarchy. The bottom level of the hierarchy is a sorted single-level index. The bottom level is sparse for primary and clustering indexes, or dense for secondary indexes.
To execute a SELECT query, the database can perform a table scan or an index scan
A table scan is a database operation that reads table blocks directly, without accessing an index. An index scan is a database operation that reads index blocks sequentially, in order to locate the needed table blocks.
Update
An update to a column that is not indexed does not affect the index. An update to an indexed column is like a delete followed by an insert. The index entry for the initial value is deleted and an index entry for the updated value is inserted.
how does a database choose what kind of scan it will do?
Hit ratio, also called filter factor or selectivity, is the percentage of table rows selected by a query. When a SELECT query is executed, the database examines the WHERE clause and estimates hit ratio. If hit ratio is high, the database performs a table scan. if hit ratio is low: -Looks for an indexed column in the WHERE clause. -Scans the index. -Finds values that match the WHERE clause. -Reads the corresponding table blocks. If the WHERE clause does not contain an indexed column, the database must perform a table scan.
binary search
If a single-level index is sorted, each value can be located with a binary search. In a binary search, the database repeatedly splits the index in two until it finds the entry containing the search value:
Inserts, updates, and deletes with single level dense indexes
Insert. When a row is inserted into a table, a new index entry is created. Since single-level indexes are sorted, the new entry must be placed in the correct location. To make space for the new entry, subsequent entries must be moved, which is too slow for large tables. Instead, the database splits an index block and reallocates entries to the new block, creating space for the new entry. It splits the block into two blocks. the 2nd block makes a space for the new entry in the right order.
can a sorted table have both a clustering and primary index?
No, it cant. -A sorted table has one sort column. -That sorted column can be unique, or not unique. -If the index is on the sorted unique column it is a primary -if the index is on a non-unique sorted column it is a Clustering
indexes on a sorted table may be
Primary Index- is an index on a unique sort column Clustering Index- is an index on a non-unique sort column secondary index- an index that is not on the sort column
Delete.
When a row is deleted, the row's index entry must be deleted. The deleted entry can be either physically removed or marked as 'deleted'. Since single-level indexes are sorted, physically removing an entry requires moving all subsequent entries, which is slow. For this reason, index entries are marked as 'deleted'. Periodically, the database may reorganize the index to remove deleted entries and compress the index.
With a sparse index Inserts, updates, and deletes
With a sparse index, each entry corresponds to a table block rather than a table row. Index entries are inserted or deleted when blocks split or merge. Since blocks contain many rows, block splits and mergers occur less often than row inserts and deletes. Aside from frequency, however, the behavior of sparse and dense indexes is similar.
multi-column index
an index can be defined on multiple columns. In a multi-column index, each index entry is a composite of values from all indexed columns. In all other respects, multi-column indexes behave exactly like indexes on a single column.
Consider the following scenario: • A table has 600,000,000 rows. • Each index block is 4 kilobytes. • Each index entry is 10 bytes. • Assume 1 kilobyte is approximately 1,000 bytes. Assuming the index is dense and sorted, a binary search for one row reads approximately how many blocks?
how many blocks are in the table? (600mill entries / entriesperblock) how many entries or rows are in a block? 4k / 10 = 400 entries or rows per block
Consider the following scenario: • A table is sorted on indexed column and has 50,000,000 rows. • Each row is 300 bytes. • Table and index blocks are 2 kilobytes. • Assume 1 kilobyte is approximately 1,000 bytes. Assuming the index is sparse, how many entries are in the index?
total size of table= 15billion bytes (50m * 300) number of entries in the index is 7.5 million bytes. found by: -15billonbytes(total table size) / 2000bytes (bytes per block in the index) number entries in the index = total size of all the blocks in the index.