Notes 2-Indexing and Hashing

Ace your homework & exams now with Quizwiz!

Describe the Insertion and Deletion Process for Indexing?

*Deletion* -Have to find the record -If last record with a particular search key value, delete that search key value from index. -For dense indexes this is like deleting a record in a file. -For sparse indexes, delete a key value by replacing key value's entry in index by next search key value. If that value already has an index entry, delete the entry. *Insertion* -Find place to insert. -Dense index: insert search key value if not present. -Sparse index: no change unless new block created.

Difference between Dense Index vs Sparse Index

*Dense Index* -An index record appears for every search key value in file (figure F.9a, p.F-11). -This index record contains search key value and a pointer to the actual record. *Sparse Index* -Index records are created for only some of the search key values (figure F.9b). -To locate a record, we find the index record with the "largest" search key value less than or equal to the search key value we are looking for. -We start at that record pointed to by the index pointer, and proceed inside the file (that is, sequentially) until we find the desired record. -This only works if we can proceed through the file in order once we arrive at a particular record.

Decribe Index Performance

*For primary and Clustering* -Indexes on attribute(s) where the file is ordered on those attribute(s). -Means we only have to point to the first one, and the rest (if any) will follow in the file. -Only need entries for first occurrence of each search key value. --E.g. only need entries for the first record in each department, if file sorted by department. -Index file has entries for each search key, and the block they first occur in. --As there can be several entries with the desired search key value (why?), entries you want can span blocks of disk storage. -Relation must be stored in sorted order on search key field. *For secondary* -Index on a field that the file is not sorted on. -If the field is a key, there will be only one occurrence, but they are not ordered in the file. -If field is not a key, we can also have many occurrences for each search key value, dispersed at random through the relation.

What is the Difference between Indexing and Hashing?

*Indexing* -An index for a file works like an index in a book. -Entries in alphabetical order tell us where to find detailed information. *Hashing* -Hashing is the transformation of a string of characters into a usually shorter fixed-length value or key that represents the original string. Hashing is used to index and retrieve items in a database because it is faster to find the item using the shorter hashed key than to find it using the original value

What is the difference between Primary vs Clustering Vs Secondary Indexes?

*Primary Index* -If a file is ordered on key attributes specified in the index, the index is called a primary index. *Clustering Index* -If a file is ordered on non-key attribute(s) specified in the index, the index is called a clustering index. *Secondary Index* If the file is not ordered on the attributes specified in the index, the index is called a secondary (and thus non-clustering) index.

What are the evaluation methods for Indexing & Hashing?

-Access type - what kinds of access can be done efficiently, e.g. find a specific value, or a range of values. -Access time - time to find a particular data item or set of items. -Insertion time - time taken to insert a new data item (includes time to find the right place to insert). -Deletion time - time to delete an item (includes time taken to find item, as well as to update the index structure). -Space overhead - additional space occupied by an index structure.

What is a B+ Tree

-B+-tree file structure maintains its efficiency despite insertions and deletions, but it also imposes some overhead. -Every insertion or deletion balances the tree, so that levels are inserted (or removed) automatically when required by growth or shrinking. -A B+-tree index is a balanced tree in which every path from the root to a leaf is of the same length. The rules for a B+-tree are as follows (from Connolly & Begg, Appendix F, p. F-13): -If the root is not a leaf node, it must have at least two children. -For a tree of order n, each node (except the root and leaf nodes) must have between n/2 and n pointers and children. -If n/2 is not an integer, the result is rounded up (e.g. ). -For a tree of order n, the minimum number of key values in a leaf node must be between (n - 1)/2 and (n - 1) pointers and children (records or buckets). -If (n - 1)/2 is not an integer, the result is rounded up (e.g. ). -The number of key values contained in a non-leaf node is 1 less than the number of pointers (last pointer is used to link leaf nodes together). -The tree must always be balanced: that is, every path from the root node to a leaf must have the same length. -Leaf nodes are linked in order of key values. -A typical node (above figure F.11, pp.F13-14) contains up to n-1 search key values K1, K2, ..., Kn-1, and n pointers P1, P2, ..., Pn -For leaf nodes (figure F.11), pointer Pi points to either --a file record with search key value Ki, or --a bucket of pointers to records with that search key value. -Bucket structure is used if search key is not a key, and file is not sorted in search key order (thus there could be many records to point to). -Pointer Pn (nth pointer in the leaf node) is used to chain leaf nodes together in linear order (search key order). This allows efficient sequential processing of the file. .

What is a B- Tree and Discuss its Advantage and Disadvantage?

-B-tree indexes are similar to B+-tree indexes. -Difference is that B-tree eliminates the redundant storage of search key values. -In B+-tree of figure F.12, some search key values appear twice. -A corresponding B-tree allows search key values to appear only once. -Thus we can store the index in less space. Advantages: -Lack of redundant storage. -Some searches are faster (key may be in non-leaf node). Disadvantages: -Leaf and non-leaf nodes are of different size, which complicates storage. -Deletion may occur in a non-leaf node (more complicated) Generally, the structural simplicity of B+-tree is preferred. -Oracle uses an unbalanced B+-tree type index structure. -Path from root to leaf nodes can vary. -Unbalanced means less overhead on deletion and insertion. -DBA must occasionally rebuild index.

What is Binary Search

-Binary search algorithm -Read the record in the middle of the file. -If this is your desired record, quit, else look to see your record is before or after this middle record. -Discard the half your record is not in (plus the middle one), go back to step 1. -How does it perform? -At each step we discard half the remaining records from our search. -We might get lucky and find our record quickly. -If we don't, the worst that can happen is that we do steps, where n is the number of records

What is Extendable Hashing?

-Extendable hashing splits and coalesces buckets as database size changes. -This imposes some performance overhead, but space efficiency is maintained. -As reorganization is on one bucket at a time, overhead is acceptably low. *How does it work?* -We choose a hash function that is uniform and random that generates values over a relatively large range. -Range is b-bit binary integers (typically b=32). --232 is over 4 billion, so we don't generate that many buckets! --Instead we create buckets on demand, and do not use all b bits of the hash initially. --The i bits are used as an offset into a table of bucket addresses. --Value of i grows and shrinks with the database.

What is the Index Structure?

-In order to allow fast random access, an index structure may be used. -The idea is to have a separate file of ordered search keys with pointers to the corresponding records in the data file. -Each index entry is just the search key and a pointer. -We still have to find our desired entry in the index file. -This index file is smaller and faster to search (and it is ordered!). -Actual relation may be composed of very long records.

What is Static Hashing?

-Index schemes force us to traverse an index structure. -Hashing avoids this. -Hashing involves computing the address of a data item by computing a function on the search key value. -A hash function h is a function from the set of all search key values K to the set of all bucket addresses B. -We choose a number of buckets to correspond to the number of search key values we will have stored in the database. -To perform a lookup on a search key value Ki, we compute h(Ki), and search the bucket with that address. -If two search keys i and j map to the same address, because h(Ki) = h(Kj), then the bucket at the address obtained will contain records with both search key values. --We call this a collision. -In this case we will have to check the search key value of every record in the bucket to get the ones we want. -A good hash function gives an average-case lookup that is a small constant, independent of the number of search keys.

What are the Advantages and Disadvantage of Extended Hashing

Advantages: -Extendable hashing provides performance that does not degrade as the file grows. -Minimal space overhead - no buckets need be reserved for future use. Bucket address table only contains one pointer for each hash value of current prefix length. Disadvantages: -Extra level of indirection in the bucket address table -Added complexity


Related study sets

Practice Exam incorrect answers/ review

View Set

Nervų sistema, endokrininės liaukos

View Set

Film and Culture (5th Ed) Quiz 3

View Set

RSHW Englisch - South Africa (Nelson Mandela)

View Set

Chapter 4 - Cultural Anthropology

View Set