Data Structures - Sets, Maps, and Hash Tables
set
A collection that contains no duplicate elements, so it holds at most one NULL element (relevant, for example, to a set of pointers).
Operations on sets:
-Testing for membership
-Adding elements
-Removing elements
-Union A ∪ B
-Intersection A ∩ B
-Difference A - B
-Subset A ⊂ B
Required methods:
-Testing set membership (find)
-Testing for an empty set (empty)
-Determining set size (size)
-Creating an iterator over the set (begin, end)
-Adding an element (insert)
-Removing an element (erase)
Set union, intersection, and difference are not member functions; they are provided as the generic algorithms set_union, set_intersection, and set_difference in the <algorithm> header.
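The set operations above can be sketched with std::set and the <algorithm> functions set_union, set_intersection, set_difference, and includes. This is a minimal sketch assuming C++17 and int elements; the helper names (set_union_of, set_intersection_of, set_difference_of, is_subset) are hypothetical. Note that std::includes tests A ⊆ B, so a proper-subset test would also need a.size() < b.size().

```cpp
#include <algorithm>
#include <iterator>
#include <set>

// Union A ∪ B. The algorithms require sorted input ranges, which std::set guarantees.
std::set<int> set_union_of(const std::set<int>& a, const std::set<int>& b) {
    std::set<int> result;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::inserter(result, result.begin()));
    return result;
}

// Intersection A ∩ B.
std::set<int> set_intersection_of(const std::set<int>& a, const std::set<int>& b) {
    std::set<int> result;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(result, result.begin()));
    return result;
}

// Difference A - B: elements of a that are not in b.
std::set<int> set_difference_of(const std::set<int>& a, const std::set<int>& b) {
    std::set<int> result;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(result, result.begin()));
    return result;
}

// Subset test A ⊆ B: true when every element of a is also in b.
bool is_subset(const std::set<int>& a, const std::set<int>& b) {
    return std::includes(b.begin(), b.end(), a.begin(), a.end());
}
```

For A = {1, 2, 3} and B = {2, 3, 4}, these give {1, 2, 3, 4}, {2, 3}, and {1}, and A is not a subset of B.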
Chaining
Alternative to open addressing: each table element references a linked list that contains all of the items that hash to the same table index.
Advantages relative to open addressing:
-Only items whose keys hash to the same table index are examined when looking for an object
-You can store more elements in the table than the number of table slots (indices)
-Once you determine an item is not present, you can insert it at the beginning or end of its list
-To remove an item, you simply delete it from its list; you do not need to replace it with a dummy item or mark it as deleted
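A minimal sketch of chaining, assuming string keys and std::hash for the hash code; the class name ChainedHashSet and its fixed bucket count are illustrative choices, not a library API. It reflects the advantages listed above: a lookup scans only one bucket's list, the table can hold more items than buckets, new items go at the front of their list, and erase simply unlinks the node.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <vector>

// Hash set using chaining (hypothetical class). Each bucket is a linked list
// of every key whose hash maps to that index. Assumes buckets > 0.
class ChainedHashSet {
public:
    explicit ChainedHashSet(std::size_t buckets = 7) : table_(buckets) {}

    bool contains(const std::string& key) const {
        // Only the items in this one bucket are examined.
        for (const auto& item : table_[index_of(key)])
            if (item == key) return true;
        return false;
    }

    // Returns false if the key was already present.
    bool insert(const std::string& key) {
        if (contains(key)) return false;
        table_[index_of(key)].push_front(key);  // insert at the beginning of the list
        ++size_;
        return true;
    }

    // Removal just unlinks the node; no dummy or "deleted" marker is needed.
    bool erase(const std::string& key) {
        auto& chain = table_[index_of(key)];
        for (auto it = chain.begin(); it != chain.end(); ++it) {
            if (*it == key) {
                chain.erase(it);
                --size_;
                return true;
            }
        }
        return false;
    }

    std::size_t size() const { return size_; }
    std::size_t bucket_count() const { return table_.size(); }

private:
    std::size_t index_of(const std::string& key) const {
        return std::hash<std::string>{}(key) % table_.size();
    }

    std::vector<std::list<std::string>> table_;
    std::size_t size_ = 0;
};
```

With three buckets, this table can still hold five keys; the extra keys simply lengthen some of the lists.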
STL pair
Defined in header <utility>. A simple grouping of two values, which may be of different types, with members named first and second. Pairs are used as the return type of functions that need to return two values, such as set::insert (which returns a pair<iterator, bool>), and as the element type of maps. Declared as a struct (a class template whose members are public).
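A short illustration of std::pair: returning two values from one function, and reading the pair<iterator, bool> that std::set::insert returns. The helper names min_and_max and add_word are hypothetical.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Returns the smallest and largest values together as a pair.
// Assumes v is non-empty.
std::pair<int, int> min_and_max(const std::vector<int>& v) {
    std::pair<int, int> result(v.front(), v.front());
    for (int x : v) {
        if (x < result.first) result.first = x;
        if (x > result.second) result.second = x;
    }
    return result;
}

// std::set::insert returns pair<iterator, bool>; .second is false
// when the value was already present and nothing was inserted.
bool add_word(std::set<std::string>& words, const std::string& w) {
    std::pair<std::set<std::string>::iterator, bool> r = words.insert(w);
    return r.second;
}
```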
map
Facilitates efficient search and retrieval of entries that consist of pairs of objects:
-The first object in the pair is the key (must be unique)
-The second object is the information associated with that key
Onto mapping: every element in the value set has at least one corresponding key in the key set (several keys may map to the same value).
A map is a template class that takes the following template parameters:
-Key_Type: The type of keys contained in the key set
-Value_Type: The type of the values in the value set
-Compare: A function class that determines the ordering of the keys; by default this is std::less<Key_Type>, which uses the less-than operator
-Allocator: The memory allocator for the stored key-value pairs; we will use the library-supplied default
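A minimal sketch using std::map<std::string, int> to count words: each word is a unique key and its count is the associated value. count_words and first_key_descending are hypothetical helpers; the second one passes std::greater as the Compare parameter to show how it changes the key ordering.

```cpp
#include <functional>
#include <map>
#include <sstream>
#include <string>

// Counts word occurrences. operator[] inserts a value-initialized
// entry (0 for int) the first time a key is seen.
std::map<std::string, int> count_words(const std::string& text) {
    std::map<std::string, int> counts;
    std::istringstream in(text);
    std::string word;
    while (in >> word)
        ++counts[word];
    return counts;
}

// Copies the map into one ordered by std::greater (descending keys)
// and returns its first key, or "" if the map is empty.
std::string first_key_descending(const std::map<std::string, int>& m) {
    std::map<std::string, int, std::greater<std::string>> desc(m.begin(), m.end());
    return desc.empty() ? std::string() : desc.begin()->first;
}
```

With the default std::less ordering, iteration visits keys alphabetically ("cat", "dog", "saw", "the" for the sample sentence in the tests); with std::greater it starts at "the".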
multimaps
Like the multiset, the multimap removes the restriction that the keys are unique. The subscript operator is not defined for the multimap. Instead, lower_bound and upper_bound (or equal_range, which returns both) are used to obtain a range of iterators that reference the values mapped to a given key.
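Because a multimap has no subscript operator, the values for one key are read from the iterator range [lower_bound(key), upper_bound(key)). This is a minimal sketch; values_for is a hypothetical helper. Since C++11, elements with equal keys keep their insertion order, so the values come back in the order they were added.

```cpp
#include <map>
#include <string>
#include <vector>

// Collects every value mapped to key by walking [lower_bound, upper_bound).
std::vector<std::string> values_for(const std::multimap<std::string, std::string>& mm,
                                    const std::string& key) {
    std::vector<std::string> out;
    auto first = mm.lower_bound(key);  // first element with this key
    auto last = mm.upper_bound(key);   // one past the last element with this key
    for (auto it = first; it != last; ++it)
        out.push_back(it->second);
    return out;
}
```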
Open Addressing
Linear probing can be used to access an item in a hash table:
-If the index calculated for an item's key is occupied by an item with that key, we have found the item
-If that element contains an item with a different key, increment the index by one, wrapping around to index 0 after the last element
-Keep incrementing until you find the key or a NULL entry (assuming the table is not full)
Deleting an item using open addressing:
-When an item is deleted, you can't set its table entry to NULL
-If you did, a search for an item that may have collided with the deleted item could stop at that NULL entry and incorrectly conclude that the item is not in the table
-Instead, store a dummy value or mark the location as available, but previously occupied
-Deleted items reduce search efficiency, which is partially mitigated if they are marked as available so that new items can reuse their slots
-You can't replace a deleted item with a new item until you verify that the new item is not already in the table
Quadratic probing can reduce the effect of clustering (long runs of occupied slots that build up under linear probing):
-The probe offsets form a quadratic series: 1^2, 2^2, 3^2, ..., so the probes go to index + 1, index + 4, index + 9, ...
The disadvantage of quadratic probing is that the next-index calculation is time-consuming, involving multiplication, addition, and modulo division. A more efficient way to calculate the next index, with k initialized to -1, is:
- k += 2;
- index = (index + k) % table.size();
This works because consecutive squares differ by consecutive odd numbers (1, 3, 5, ...).
A more serious problem is that not all table elements are examined when looking for an insertion index; this may mean that:
-an item can't be inserted even when the table is not full
-the program can get stuck in an infinite loop searching for an empty slot
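A minimal sketch of open addressing with linear probing, assuming C++17, string keys hashed with std::hash, and a fixed capacity (no rehashing). Each slot is Empty, Occupied, or Deleted; Deleted plays the role of the "available, but previously occupied" marker, so searches probe past it while inserts may reuse it. The class name LinearProbingSet and the helper quadratic_probes are hypothetical; quadratic_probes only generates the probe sequence using the k += 2 update from the notes.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Open-addressing hash set with linear probing (hypothetical class).
// Fixed capacity; a real table would rehash before it fills up.
class LinearProbingSet {
public:
    explicit LinearProbingSet(std::size_t capacity = 11) : slots_(capacity) {}

    bool contains(const std::string& key) const { return find_index(key).has_value(); }

    // Returns false if the key is already present or the table is full.
    bool insert(const std::string& key) {
        if (contains(key)) return false;  // verify the key is not already in the table
        std::size_t i = home(key);
        for (std::size_t n = 0; n < slots_.size(); ++n) {
            if (slots_[i].state != State::Occupied) {  // Empty or Deleted slot is reusable
                slots_[i] = {State::Occupied, key};
                return true;
            }
            i = (i + 1) % slots_.size();
        }
        return false;
    }

    bool erase(const std::string& key) {
        std::optional<std::size_t> idx = find_index(key);
        if (!idx) return false;
        slots_[*idx].state = State::Deleted;  // mark it; never reset to Empty
        return true;
    }

private:
    enum class State { Empty, Occupied, Deleted };
    struct Slot {
        State state = State::Empty;
        std::string key;
    };

    std::size_t home(const std::string& key) const {
        return std::hash<std::string>{}(key) % slots_.size();
    }

    // Probes from the home index until the key or an Empty slot is found,
    // skipping Deleted slots and wrapping around at the end of the table.
    std::optional<std::size_t> find_index(const std::string& key) const {
        std::size_t i = home(key);
        for (std::size_t n = 0; n < slots_.size(); ++n) {
            if (slots_[i].state == State::Empty) return std::nullopt;
            if (slots_[i].state == State::Occupied && slots_[i].key == key) return i;
            i = (i + 1) % slots_.size();
        }
        return std::nullopt;  // every slot examined
    }

    std::vector<Slot> slots_;
};

// Quadratic probe sequence via the incremental update: start with k = -1,
// then on each step k += 2 and index = (index + k) % table_size,
// giving offsets 1, 4, 9, ... from the home index.
std::vector<std::size_t> quadratic_probes(std::size_t home, std::size_t table_size, int count) {
    std::vector<std::size_t> probes;
    std::size_t index = home;
    long long k = -1;
    for (int step = 0; step < count; ++step) {
        k += 2;
        index = (index + static_cast<std::size_t>(k)) % table_size;
        probes.push_back(index);
    }
    return probes;
}
```

Starting from home index 0 in a table of size 100, quadratic_probes visits 1, 4, 9, 16 without computing any squares.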
Associative Containers
Not indexed. Do not reveal the order in which items were inserted. Enable efficient search and retrieval of information. Allow removal of elements without moving other elements around. Include the set and map containers (and their multiset and multimap variants).
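A small illustration with std::set, an ordered associative container: iteration follows key order rather than insertion order, and erasing one element leaves iterators to the others valid. The function names are hypothetical. (std::unordered_set also hides insertion order, but its iteration order is unspecified.)

```cpp
#include <set>
#include <string>
#include <vector>

// Items inserted in any order come back in key order.
std::vector<std::string> iterate_in_key_order(const std::vector<std::string>& inserted) {
    std::set<std::string> s(inserted.begin(), inserted.end());
    return std::vector<std::string>(s.begin(), s.end());
}

// Erasing one element does not shift the others, so an iterator
// to a remaining element is still valid afterwards.
bool erase_keeps_other_iterators() {
    std::set<int> s{10, 20, 30};
    auto it30 = s.find(30);
    s.erase(20);
    return *it30 == 30 && s.size() == 2;
}
```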
multisets
Same as the set except that it does not require the items to be unique. The insert function always inserts a new item, and duplicate items are retained. However, erase(item) removes all occurrences of the specified item, because there may be duplicates (erasing through an iterator removes just one). lower_bound and upper_bound return iterators that can be used to select the group of entries that match a desired value.
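A minimal sketch with std::multiset<int> contrasting the two ways to erase, and using lower_bound and upper_bound to count the copies of a value. copies_of and erase_one are hypothetical helpers.

```cpp
#include <cstddef>
#include <iterator>
#include <set>

// Number of copies of value: the length of [lower_bound, upper_bound).
std::size_t copies_of(const std::multiset<int>& ms, int value) {
    return static_cast<std::size_t>(
        std::distance(ms.lower_bound(value), ms.upper_bound(value)));
}

// Erasing through an iterator removes a single copy,
// whereas ms.erase(value) would remove all of them.
void erase_one(std::multiset<int>& ms, int value) {
    auto it = ms.find(value);
    if (it != ms.end())
        ms.erase(it);
}
```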
Hash Coding
The basis of hashing is to transform the item's key value into an integer value (its hash code), which is then transformed into a table index. The number of possible key values is much larger than the table size, so different keys can map to the same index (a collision). Generating good hash codes is typically an experimental process. The goal is a random (uniform) distribution of values across the table. A good hash function should be relatively simple and efficient to compute.
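A sketch of the two steps in the card: key to hash code, then hash code to table index. The multiplier 31 matches Java's String.hashCode and is used here only as a familiar example; std::hash<std::string> in C++ does not promise any particular formula. string_hash_code and table_index are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Step 1: key -> integer hash code, using h = 31*h + c over the characters.
// Unsigned arithmetic wraps modulo 2^32 instead of overflowing.
std::uint32_t string_hash_code(const std::string& key) {
    std::uint32_t h = 0;
    for (unsigned char c : key)
        h = 31u * h + c;
    return h;
}

// Step 2: hash code -> table index in [0, table_size).
std::size_t table_index(const std::string& key, std::size_t table_size) {
    return string_hash_code(key) % table_size;
}
```

For example, "ab" has hash code 31 * 97 + 98 = 3105, which maps to index 75 in a table of prime size 101.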
Hash Tables
The goal of a hash table is to be able to access an entry based on its key value, not its location.
Reduce collisions by using a prime number for the size of the table.
Reduce collisions by expanding the table size (rehashing):
-Allocate a new hash table with twice the capacity of the original
-Reinsert each old entry that has not been deleted into the new hash table
-Reference the new table instead of the original
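A minimal sketch of the expand-and-reinsert steps for an open-addressing table of int keys with linear probing. The SlotState/IntSlot layout, probe_insert, and rehash are hypothetical. It doubles the capacity as the card says; a table meant to stay prime-sized would pick a prime near twice the old size instead (not shown).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical slot layout for an open-addressing table of int keys.
enum class SlotState { Empty, Occupied, Deleted };
struct IntSlot {
    SlotState state = SlotState::Empty;
    int key = 0;
};

// Linear-probing insert. Assumes at least one slot is not Occupied.
void probe_insert(std::vector<IntSlot>& table, int key) {
    std::size_t i = static_cast<unsigned>(key) % table.size();
    while (table[i].state == SlotState::Occupied)
        i = (i + 1) % table.size();
    table[i] = {SlotState::Occupied, key};
}

// Rehash: allocate a table with twice the capacity, reinsert each entry
// that has not been deleted, then use the new table in place of the old one.
void rehash(std::vector<IntSlot>& table) {
    std::vector<IntSlot> bigger(table.size() * 2);
    for (const IntSlot& s : table)
        if (s.state == SlotState::Occupied)  // Deleted markers are dropped
            probe_insert(bigger, s.key);
    table.swap(bigger);                      // reference the new table
}
```

Because every key is reinserted with the new table size, keys that collided in the old table may land in different slots afterwards, and deleted markers disappear.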
Load Factor
The number of filled cells divided by the table size. It has the greatest effect on hash table performance. The lower the load factor, the better the performance, because there is a smaller chance of collision when a table is sparsely populated. If there are no collisions, performance for search and retrieval is O(1) regardless of table size.
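The load factor is a simple ratio, and a table implementation typically compares it with a threshold to decide when to rehash. The 0.75 threshold and the names load_factor and needs_rehash are assumptions for illustration. The standard std::unordered_set exposes the same idea through its load_factor() and max_load_factor() member functions.

```cpp
#include <cstddef>

// Load factor = filled cells / table size (for chaining: items / buckets).
double load_factor(std::size_t filled, std::size_t table_size) {
    return static_cast<double>(filled) / static_cast<double>(table_size);
}

// Hypothetical policy: rehash before an insert that would push the
// load factor above the threshold (0.75 is an assumed value).
bool needs_rehash(std::size_t filled, std::size_t table_size, double threshold = 0.75) {
    return load_factor(filled + 1, table_size) > threshold;
}
```

With 7 of 10 cells filled, one more insert would reach 0.8, so this policy would rehash first.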
Performance of Open Addressing vs. Chaining
Using chaining, if an item is in the table, on average we must examine the table element corresponding to the item's hash code and then half of the items in that element's list:
c = 1 + L/2
where c is the expected number of items examined and L is the average number of items in a list (the number of items divided by the table size, i.e. the load factor).
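The chaining formula, evaluated in code next to the standard approximation for a successful search with linear probing (open addressing), c = (1 + 1/(1 - L)) / 2. The card does not state the linear-probing formula; it is added here only for comparison. The function names are hypothetical.

```cpp
// Chaining, successful search: c = 1 + L/2. L may exceed 1.
double chaining_comparisons(double L) {
    return 1.0 + L / 2.0;
}

// Linear probing, successful search (standard approximation):
// c = (1 + 1/(1 - L)) / 2. Only meaningful for 0 <= L < 1.
double linear_probing_comparisons(double L) {
    return 0.5 * (1.0 + 1.0 / (1.0 - L));
}
```

At L = 0.5 the two give 1.25 and 1.5 comparisons; at L = 0.9 they give 1.45 and about 5.5. Chaining degrades gently as L grows, while linear probing grows without bound as L approaches 1.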