C949 Hash tables (Data Structure)
Double hashing: Formula for insertion
(h1(key) + i * h2(key)) mod (tableSize) - Starts with i = 0 - Increments i and probes the next bucket in the sequence until an empty bucket is found
Techniques used to handle collisions during insertions
- Chaining - Open addressing (linear probing, quadratic probing, double hashing)
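The formula above can be sketched as a generator of bucket indices. This is a minimal illustration, not a full hash table: the two hash functions `h1` and `h2` and the table size of 11 are hypothetical choices made only for this example.

```python
def double_hash_probe_sequence(key, table_size, h1, h2):
    """Yield (h1(key) + i * h2(key)) mod table_size for i = 0, 1, 2, ..."""
    for i in range(table_size):
        yield (h1(key) + i * h2(key)) % table_size


# Hypothetical hash functions, assumed only for illustration.
def h1(key):
    return key % 11


def h2(key):
    # Always in 1..5, so the step size is never 0.
    return 5 - key % 5


# Probe sequence for key 16 in an 11-bucket table.
sequence = list(double_hash_probe_sequence(16, 11, h1, h2))
print(sequence[:4])
```

Because 11 is prime and the step size is never 0, this particular sequence visits every bucket exactly once before repeating.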
Common Hash functions
- Good hash function - Perfect hash function - Modulo hash function
Chaining: Searching
1. First determines the bucket 2. Searches the bucket's list
Linear probing: Removal Steps
1. The remove algorithm uses the sought item's key to determine the initial bucket. 2. The algorithm probes each bucket until either the matching item or an empty-since-start bucket is found. 3. If the matching item is found, the item is removed, and the bucket is marked empty-after-removal.
Linear probing: Searching Steps
1. The search algorithm uses the sought item's key to determine the initial bucket, and then linearly probes each bucket until a matching item is found. 2. If the search reaches the last bucket without finding a matching item or an empty-since-start bucket, the search continues at bucket 0. 3. If an empty-after-removal bucket is encountered, the algorithm continues to probe the next bucket. 4. If an empty-since-start bucket is encountered, the search algorithm returns null.
Chaining: Insert operation
1. Uses the item's key to determine the bucket 2. Inserts the item in the bucket's list
Linear Probing: Insert Steps
1. Uses the item's key to determine the initial bucket. 2. Linearly probes (or checks) each bucket until an empty bucket is found. 3. If probing reaches the last bucket without finding an empty bucket, the probing continues at bucket 0. 4. The item is inserted into the first empty bucket found.
HashInsert(hashTable, item)
1. Uses the item's key to determine the mapped bucket 2. Then inserts the item in that bucket's list.
HashRemove(hashTable, item)
1. Uses the item's key to determine the mapped bucket 2. Then removes the item in that bucket's list.
HashSearch(hashTable, key)
1. Uses the item's key to determine the mapped bucket 2. Then searches for the item in that bucket's list. - Returns null if not found
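HashInsert, HashRemove, and HashSearch with chaining can be sketched as follows. This is a minimal sketch under stated assumptions: keys are non-negative integers, the hash function is key % number of buckets, and the class and method names are hypothetical.

```python
class ChainingHashTable:
    """Hash table with chaining: each bucket holds a list of [key, value] pairs."""

    def __init__(self, num_buckets=10):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Assumed modulo hash function: key % number of buckets.
        return self.buckets[key % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for pair in bucket:
            if pair[0] == key:
                pair[1] = value  # key already present: update its value
                return
        bucket.append([key, value])

    def search(self, key):
        for stored_key, value in self._bucket(key):
            if stored_key == key:
                return value
        return None  # not found

    def remove(self, key):
        bucket = self._bucket(key)
        for index, pair in enumerate(bucket):
            if pair[0] == key:
                del bucket[index]
                return True
        return False


table = ChainingHashTable(10)
table.insert(55, "first")
table.insert(75, "second")  # collides with 55: both map to bucket 5
print(table.buckets[5])
```

With 10 buckets, 55 and 75 both map to bucket 5, so that bucket's list holds both items.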
Quadratic probing: Insert
Can insert into empty-since-start and empty-after-removal buckets. - Whichever bucket is encountered first in the probing sequence will be used for the insertion.
Double hashing: Searching
Checks each bucket using the probing sequence defined by the two hash functions until - a matching item is found - an empty-since-start bucket is found (null) - all buckets have been probed without a match (null)
Chaining
Collision resolution technique that uses a list for each bucket → each list may store multiple items - Supports insert, search, and remove operations Ex: With a hash function of key % 10, inserting 55 and then 75 makes bucket 5's list 55, 75
Open addressing
Collision resolution technique where collisions are resolved by looking for an empty bucket elsewhere in the table (so 75 might be stored in bucket 6)
hash function
Computes a bucket index from the item's key. - good hash function will distribute items into different buckets.
Hash Table
Data structure that stores unordered items by mapping (or hashing) each item to a location in an array (or vector). Ex: Given an array with indices 0..9 to store integers from 0..500, the modulo (remainder) operator can be used to map 25 to index 5 (25 % 10 = 5), and 149 to index 9 (149 % 10 = 9). - key - bucket - Hash function
bucket
Each hash table array element Ex: 100 element hash table has 100 buckets
Double hashing: Removal
First searches for the item's key → If found, removed
Linear probing
Handles a collision by 1. Starting at the key's mapped bucket (hashed location) 2. Then linearly searches subsequent buckets until an empty bucket is found.
Quadratic probing: To determine the item's index in the hash table:
If an item's mapped bucket is H, the index is (H + c1 * i + c2 * i^2) mod (tableSize) - i starts at 0 - Each time an empty bucket is not found, i is incremented by 1
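The quadratic probing formula can be sketched as a small function. The default constants c1 = 1 and c2 = 1 and the function name are assumptions made for illustration only.

```python
def quadratic_probe_index(mapped_bucket, i, table_size, c1=1, c2=1):
    """Return (H + c1 * i + c2 * i^2) mod table_size for mapped bucket H."""
    return (mapped_bucket + c1 * i + c2 * i * i) % table_size


# First four probes for an item whose mapped bucket is 5 in a 10-bucket table.
indices = [quadratic_probe_index(5, i, 10) for i in range(4)]
print(indices)
```

Note that with these constants the sequence revisits bucket 7 on the fourth probe, so a quadratic probing sequence does not necessarily reach every bucket.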
probing sequence
Iterating through sequential i values to obtain the desired table index
Mid-square hash function
N = number of buckets - Sequence: squares the key → extracts R digits from the result's middle, where R >= ceiling(log10 N) → returns the remainder of middleDigits divided by N (middleDigits % N)
Well-designed hash table: Searching requires runtime complexity of
O(1) on average
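A base-10 mid-square hash can be sketched as below. The function name and the way the "middle" digits are located (centering an R-digit window in the squared key's decimal string) are assumptions; other implementations may pick the middle digits slightly differently.

```python
import math


def mid_square_hash_base10(key, num_buckets):
    """Square the key, take R middle decimal digits, return them mod num_buckets."""
    # R must be at least ceiling(log10 N) so that every bucket can be indexed.
    r = max(1, math.ceil(math.log10(num_buckets)))
    squared = str(key * key).zfill(r)  # pad so at least R digits exist
    start = (len(squared) - r) // 2    # assumed centering of the window
    middle_digits = int(squared[start:start + r])
    return middle_digits % num_buckets


# 453 * 453 = 205209; the middle two digits are 52.
print(mid_square_hash_base10(453, 100))
```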
Collision
Occurs when an item being inserted into a hash table maps to the same bucket as an existing item in the hash table Ex: For a hash function of key % 10, 55 would be inserted in bucket 55 % 10 = 5; later inserting 75 would yield a collision because 75 % 10 is also 5
Double hashing
Open-addressing collision resolution technique that uses 2 different hash functions to compute bucket indices
Linear probing: Search algorithm (used for removal)
Probes each bucket until: - Matching item is found - Empty-since-start bucket is found - All buckets have been probed
Double hashing: Insert
Probes each bucket using the probing sequence → inserts the item in the next empty bucket (the empty kind doesn't matter).
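Double hashing insert, search, and removal can be sketched together. This is an illustrative sketch only: the hash functions `h1` and `h2`, the 11-bucket table, the sentinel objects, and the function names are all assumptions, and keys are taken to be non-negative integers.

```python
# Sentinels for the two kinds of empty bucket (assumed representation).
EMPTY_SINCE_START = object()
EMPTY_AFTER_REMOVAL = object()


def h1(key):
    return key % 11      # hypothetical first hash function


def h2(key):
    return 5 - key % 5   # hypothetical second hash function, never 0


def probe(key, size):
    """Yield the double hashing probe sequence for key."""
    for i in range(size):
        yield (h1(key) + i * h2(key)) % size


def double_hash_insert(table, key):
    for index in probe(key, len(table)):
        bucket = table[index]
        # Either kind of empty bucket can receive the item.
        if bucket is EMPTY_SINCE_START or bucket is EMPTY_AFTER_REMOVAL:
            table[index] = key
            return True
    return False  # no empty bucket found along the probe sequence


def double_hash_search(table, key):
    """Return the index holding key, or None if it is not in the table."""
    for index in probe(key, len(table)):
        bucket = table[index]
        if bucket is EMPTY_SINCE_START:
            return None  # search stops only at empty-since-start
        if bucket is not EMPTY_AFTER_REMOVAL and bucket == key:
            return index
    return None


def double_hash_remove(table, key):
    index = double_hash_search(table, key)
    if index is None:
        return False
    table[index] = EMPTY_AFTER_REMOVAL
    return True


table = [EMPTY_SINCE_START] * 11
for k in (16, 27, 38):   # all three keys have h1(key) == 5
    double_hash_insert(table, k)
```

Keys 16, 27, and 38 all start at bucket 5, but their different h2 values send the later keys to different buckets (8 and 7).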
Multiplicative string hash function
Starts with an initial hash value; for each character in the string, multiplies the current hash value by a multiplier and adds the character's ASCII (or Unicode) value. - Function returns the remainder of the final hash value divided by N
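A multiplicative string hash can be sketched as follows. The initial value 5381 and multiplier 33 are the constants from Daniel J. Bernstein's well-known string hash, used here as an assumed example; the notes themselves do not specify constants.

```python
def multiplicative_string_hash(text, num_buckets, initial_value=5381, multiplier=33):
    """Multiply-and-add over each character, then reduce modulo num_buckets."""
    hash_value = initial_value
    for character in text:
        # ord() gives the character's Unicode code point (ASCII for ASCII text).
        hash_value = hash_value * multiplier + ord(character)
    return hash_value % num_buckets


# 5381 * 33 + ord("a") = 177670, and 177670 % 1000 = 670.
print(multiplicative_string_hash("a", 1000))
```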
Quadratic probing: Removal
Searches for the key to remove - If found, marks the bucket as empty-after-removal
Quadratic probing
Starts at key's mapped bucket, and then quadratically searches subsequent buckets → until an empty bucket is found - probing sequence
Empty bucket types with Linear Probing
The distinction will be important during searches → searching only stops for empty-since-start, not for empty-after-removal - empty-since-start - empty-after-removal
Mid-square hash function base 2 implementation
The mid-square hash function is typically implemented using binary (base 2) → faster - Sequence: squares the key → extracts the middle R bits → returns the remainder of middleBits divided by N (middleBits % N) - R >= ceiling(log2 N)
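A base-2 mid-square hash can be sketched with shifts and masks. This sketch assumes the squared key is treated as a 32-bit value (Python integers are unbounded, so the square is masked explicitly); the function name and the `word_bits` parameter are hypothetical.

```python
import math


def mid_square_hash_base2(key, num_buckets, word_bits=32):
    """Square the key, take the middle R bits of the word, return them mod num_buckets."""
    # R must be at least ceiling(log2 N) so that every bucket can be indexed.
    r = max(1, math.ceil(math.log2(num_buckets)))
    squared = (key * key) & ((1 << word_bits) - 1)  # keep only word_bits bits
    low_bits_to_drop = (word_bits - r) // 2         # bits below the middle window
    middle_bits = (squared >> low_bits_to_drop) & ((1 << r) - 1)
    return middle_bits % num_buckets


# 453 * 453 = 205209; shifting right by 12 bits leaves 50.
print(mid_square_hash_base2(453, 256))
```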
Quadratic probing: Searching
Uses the probing sequence until: - key is found - empty-since-start bucket is found
Linear Probing: Insert algorithm
Uses the key to determine the initial bucket. - Linearly probes (or checks) each bucket. - Inserts the item in the next empty bucket (the empty kind doesn't matter). - If the probing reaches the last bucket, the probing continues at bucket 0. - Returns true if the item was inserted. - Returns false if all buckets are occupied.
Modulo hash function
Uses the remainder from division of the key by hash table size N.
Linear probing: Removal
Uses the sought item's key to determine the initial bucket. - Algorithm probes each bucket until either a matching item is found, an empty-since-start bucket is found, or all buckets have been probed. - If the item is found, the item is removed, and the bucket is marked empty-after-removal.
Linear probing: Searching
Uses the sought item's key to determine the initial bucket. - Algorithm probes each bucket until either the matching item is found (returning the item), an empty-since-start bucket is found (returning null), or all buckets are probed without a match (returning null). - If an empty-after-removal bucket is found, the search algorithm continues to probe the next bucket.
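The linear probing insert, search, and removal algorithms can be sketched in one small class. This is a minimal sketch under stated assumptions: items are non-negative integer keys, the hash function is key % table size, the two empty states are represented by hypothetical sentinel objects, and duplicate keys are not checked on insert.

```python
# Sentinels for the two kinds of empty bucket (assumed representation).
EMPTY_SINCE_START = object()
EMPTY_AFTER_REMOVAL = object()


class LinearProbingHashTable:
    def __init__(self, size=10):
        # Every bucket starts out empty-since-start.
        self.buckets = [EMPTY_SINCE_START] * size

    def insert(self, key):
        size = len(self.buckets)
        start = key % size
        for probed in range(size):
            index = (start + probed) % size  # wraps around to bucket 0
            bucket = self.buckets[index]
            if bucket is EMPTY_SINCE_START or bucket is EMPTY_AFTER_REMOVAL:
                self.buckets[index] = key
                return True
        return False  # all buckets are occupied

    def _find_index(self, key):
        size = len(self.buckets)
        start = key % size
        for probed in range(size):
            index = (start + probed) % size
            bucket = self.buckets[index]
            if bucket is EMPTY_SINCE_START:
                return None  # search stops here
            if bucket is not EMPTY_AFTER_REMOVAL and bucket == key:
                return index
            # empty-after-removal or a different key: keep probing
        return None  # all buckets probed without a match

    def search(self, key):
        index = self._find_index(key)
        return None if index is None else self.buckets[index]

    def remove(self, key):
        index = self._find_index(key)
        if index is None:
            return False
        self.buckets[index] = EMPTY_AFTER_REMOVAL
        return True


table = LinearProbingHashTable(10)
for k in (55, 75, 65):  # all map to bucket 5, so they fill buckets 5, 6, 7
    table.insert(k)
table.remove(75)        # bucket 6 becomes empty-after-removal
print(table.search(65)) # still found, because the search probes past bucket 6
```

If removal marked bucket 6 as empty-since-start instead, the search for 65 would stop at bucket 6 and wrongly report that the item is missing.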
A hash table's operations of insert, remove, and search each use the hash function to determine
an item's bucket. Ex: Inserting 113 first determines the bucket to be 113 % 10 = 3.
empty-after-removal
bucket had an item removed that caused the bucket to now be empty
empty-since-start
bucket has been empty since the hash table was created
A good hash function will distribute items into different
buckets.
modulo operator %
computes the integer remainder when dividing two numbers. Ex: For a 20 element hash table, a hash function of key % 20 will map keys to bucket indices 0 to 19.
Approach for a hash table algorithm determining whether a cell is empty
depends on the implementation. - For example, if items are simply non-negative integers, empty can be represented as -1. - More commonly, items are each an object with multiple fields (name, age, etc.), in which case each hash table array element may be a pointer. - Using pointers, empty can be represented as null.
Hash tables support
fast search, insert, and remove.
Hash tables provide
fast search, using as few as one comparison.
A hash function's performance depends on the
hash table size and knowledge of the expected keys
When a hash table is initialized, all entries must be
initialized to empty-since-start.
Perfect hash function
maps items to buckets with no collisions - because no bucket ever holds more than one item, the runtime for insert, search, and remove is O(1), even in the worst case
Good hash function
minimizes collisions → faster hash table - Uniformly distributes items into buckets. - On average achieves O(1) insert, search, and remove, but the worst case may require O(N).
Common hash function uses the
modulo operator %
A modulo hash function will map
about num_keys / num_buckets keys to each bucket, assuming the keys are evenly distributed
Quadratic probing: c1 and c2
programmer-defined constants for quadratic probing
Hash table's main advantage
searching, inserting, or removing an item may require only O(1) time - Contrast with O(N) for searching a list or O(log N) for binary search.
key
the value used to map an item to an index - a hash function maps an item's key to the bucket index - ideally unique
For all items that might possibly be stored in the hash table, every key is ideally
unique, so that the hash table's algorithms can search for a specific item by that key.