Ch. 5 Hashing
hashing
- retrieves a value using an index obtained from the key, without performing a search - technique for performing insertions, deletions and searches in constant average time - term frequently used to describe the implementation of hash tables
Traits of an ideal hash function
- should distribute keys evenly among the indexes - ideally maps each search key to a different index in the hash table (a function that does this is known as a perfect hash function)
What is a good hash function where the keys are String objects?
/**
 * A hash routine for String objects.
 * @param key the String to hash.
 * @param tableSize the size of the hash table.
 * @return the hash value.
 */
public static int hash( String key, int tableSize )
{
    int hashVal = 0;

    for( int i = 0; i < key.length( ); i++ )
        hashVal = 37 * hashVal + key.charAt( i );

    hashVal %= tableSize;
    if( hashVal < 0 )
        hashVal += tableSize;

    return hashVal;
}
What are the methods of dealing with a collision?
1) separate chaining 2) open addressing
What is a collision?
During insertion of an item, the item's key hashes to the same value as that of an already-inserted item
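As a small illustration of a collision, the sketch below reuses the String hash routine shown earlier in these notes. The class name CollisionDemo and the table size of 10 are assumptions chosen only to make the arithmetic easy to follow.

```java
// Illustrative sketch: CollisionDemo and the table size of 10 are assumptions,
// not part of the original notes.
class CollisionDemo {
    // Same polynomial String hash as in the notes (multiplier 37).
    static int hash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++)
            hashVal = 37 * hashVal + key.charAt(i);
        hashVal %= tableSize;
        if (hashVal < 0)          // int overflow can leave a negative remainder
            hashVal += tableSize;
        return hashVal;
    }

    public static void main(String[] args) {
        // 'a' has code 97 and 'k' has code 107; both leave remainder 7 mod 10,
        // so the two different keys land in the same cell.
        System.out.println(hash("a", 10)); // 7
        System.out.println(hash("k", 10)); // 7
    }
}
```

Two distinct keys mapping to index 7 is exactly the situation that separate chaining or open addressing has to resolve.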
What are optimal load factors for separate chaining and probing, respectively?
For separate chaining, you want the load factor to be close to 1 (although performance does not degrade significantly unless it becomes very large). For probing, the load factor should not exceed 0.5. For linear probing, performance degenerates rapidly as the load factor approaches 1.
What is separate chaining?
Inside each bucket of hash table, you keep a LinkedList of all elements that hash to the same value
How do you decide the table size of the hash table (using chaining for collision management)?
Make the table size about as large as the number of elements expected AND a prime number
What is the general rule concerning tableSize when using "separate chaining hashing"?
Make tableSize just about as large as number of items expected
How can you take care of degenerating performance due to an increasing load factor?
Rehashing can be implemented to allow the table to grow (and shrink), thus maintaining a reasonable load factor.
What happens when you call the rehashing function once the load factor exceeds a certain threshold?
You allocate a larger hash table and rehash every existing entry into it, since each entry's index depends on the table size
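The rehashing step can be sketched roughly as follows for a separate chaining table of int keys. This is a minimal sketch under stated assumptions: the class RehashSketch, the starting size of 5, the threshold of 1.0, and the policy of growing to the next prime at or above twice the old size are all illustrative choices, not something the notes prescribe.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch: names, initial size, threshold and growth policy are assumptions.
class RehashSketch {
    private List<LinkedList<Integer>> lists = newTable(5);
    private int size = 0; // number of keys currently stored

    private static List<LinkedList<Integer>> newTable(int n) {
        List<LinkedList<Integer>> table = new ArrayList<>(n);
        for (int i = 0; i < n; i++) table.add(new LinkedList<>());
        return table;
    }

    private static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int i = 2; i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    private static int nextPrime(int n) {
        while (!isPrime(n)) n++;
        return n;
    }

    int tableSize() { return lists.size(); }

    boolean contains(int key) {
        return lists.get(Math.floorMod(key, lists.size())).contains(key);
    }

    void insert(int key) {
        LinkedList<Integer> list = lists.get(Math.floorMod(key, lists.size()));
        if (list.contains(key)) return;      // already present
        list.addFirst(key);
        size++;
        if (size > lists.size()) rehash();   // load factor has exceeded 1.0
    }

    // Build a larger table, then re-insert every key using the new table size.
    private void rehash() {
        List<LinkedList<Integer>> old = lists;
        lists = newTable(nextPrime(2 * old.size()));
        for (LinkedList<Integer> list : old)
            for (int key : list)
                lists.get(Math.floorMod(key, lists.size())).addFirst(key);
    }
}
```

Every key has to be re-inserted rather than copied cell by cell, because its index is computed modulo the table size and so generally changes when the table grows.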
hash table
an array of some fixed size, where items are mapped into the "buckets," or cells, of the array via a hashing function
How do you perform an "insert" in a SeparateChainingHashTable?
First check the appropriate list to see whether the item is already present. If the item isn't already contained in the list, insert it at the front of the list.
What does hashing use to map a key to an index?
hash function
Why is it a good idea to keep the tableSize prime?
it helps to ensure a good distribution among the items hashed
Why is hashing so advantageous?
it is very efficient, taking O(1) time on average to search for, insert or delete an element
hash function
maps a key to an index
What is a reasonable strategy for a hash function when the input keys are integers?
return Key mod TableSize
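In Java, a direct translation of "Key mod TableSize" needs a little care, because the % operator returns a negative result for a negative key. One possible sketch uses Math.floorMod instead; the class name IntHashSketch is hypothetical.

```java
// Illustrative sketch: IntHashSketch is a hypothetical name, not from the notes.
class IntHashSketch {
    // Math.floorMod returns a result in 0 .. tableSize-1 for any int key
    // when tableSize is positive, unlike key % tableSize.
    static int hash(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    public static void main(String[] args) {
        System.out.println(hash(23, 10)); // 3
        System.out.println(hash(-7, 10)); // 3, whereas -7 % 10 is -7 in Java
    }
}
```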
How do you perform a search in a SeparateChainingHashTable?
Use the hash function to determine which list to traverse, then search that list.
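The insert and search procedures for separate chaining described in these notes can be sketched together as below. This is only one possible shape: the class name ChainingSketch, the use of hashCode() to pick a list, and the boolean return value of insert are assumptions made for the example, and the table never resizes.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch of a separate chaining table (fixed size, no rehashing).
class ChainingSketch<T> {
    private final List<LinkedList<T>> lists = new ArrayList<>();

    ChainingSketch(int tableSize) {
        for (int i = 0; i < tableSize; i++) lists.add(new LinkedList<>());
    }

    // Use the hash function to choose which list an item belongs to.
    private LinkedList<T> listFor(T x) {
        return lists.get(Math.floorMod(x.hashCode(), lists.size()));
    }

    // Search: traverse only the one list selected by the hash function.
    boolean contains(T x) {
        return listFor(x).contains(x);
    }

    // Insert: check the list first, then add at the front if the item is absent.
    // Returns true if the item was added, false if it was already present.
    boolean insert(T x) {
        LinkedList<T> list = listFor(x);
        if (list.contains(x)) return false;
        list.addFirst(x);
        return true;
    }
}
```

For example, after creating `new ChainingSketch<String>(11)`, a first `insert("apple")` returns true, a second returns false, and `contains("apple")` returns true.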
What is the average length of each list stored in the cells of the hash table's array?
λ
What is the load factor, or λ, of a hash table?
λ = ratio of number of items in hash table to the table size