C949 Hash tables (Data Structure)

Double hashing: Formula for insertion

(h1(key) + i * h2(key)) mod tableSize
- Start with i = 0.
- Increment i and probe the resulting bucket repeatedly until an empty bucket is found.
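
The insertion formula above can be sketched in Python roughly as follows. The two hash functions `h1` and `h2`, the 11-bucket table, and the single `EMPTY` marker are hypothetical choices made only for this illustration; the card itself does not specify them.

```python
EMPTY = None  # assumed marker standing in for any kind of empty bucket


def h1(key):
    # Assumed primary hash function for an 11-bucket table.
    return key % 11


def h2(key):
    # Assumed secondary hash function; it never returns 0, so each probe moves.
    return 5 - key % 5


def double_hash_insert(table, key):
    """Probe (h1(key) + i * h2(key)) mod tableSize for i = 0, 1, 2, ...

    Returns the bucket index used, or None if no empty bucket was found.
    """
    table_size = len(table)
    for i in range(table_size):
        bucket = (h1(key) + i * h2(key)) % table_size
        if table[bucket] is EMPTY:
            table[bucket] = key
            return bucket
    return None


table = [EMPTY] * 11
placed = [double_hash_insert(table, key) for key in (16, 27, 38)]
print(placed)
```

All three keys map to bucket 5 under `h1`, so the second and third insertions fall back on the step size given by `h2`.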

Techniques used to handle collisions during insertions

- Chaining
- Open addressing
  - Linear probing
  - Quadratic probing
  - Double hashing

Common Hash functions

- Good hash function
- Perfect hash function
- Modulo hash function

Chaining: Searching

1. First determines the bucket.
2. Searches the bucket's list.

Linear probing: Removal Steps

1. The remove algorithm uses the sought item's key to determine the initial bucket.
2. The algorithm probes each bucket until either the matching item or an empty-since-start bucket is found.
3. If the matching item is found, the item is removed, and the bucket is marked empty-after-removal.

Linear probing: Searching Steps

1. The search algorithm uses the sought item's key to determine the initial bucket, and then linearly probes each bucket until a matching item is found.
2. If the search reaches the last bucket without finding a matching item or an empty-since-start bucket, the search continues at bucket 0.
3. If an empty-after-removal bucket is encountered, the algorithm continues to probe the next bucket. If an empty-since-start bucket is encountered, the search algorithm returns null.

Chaining: Insert operation

1. Uses the item's key to determine the bucket.
2. Inserts the item in the bucket's list.

Linear Probing: Insert Steps

1. Uses the item's key to determine the initial bucket.
2. Linearly probes (or checks) each bucket until an empty bucket is found.
3. If probing reaches the last bucket without finding an empty bucket, the probing continues at bucket 0.
4. The item is inserted into the first empty bucket found.

HashInsert(hashTable, item)

1. Uses the item's key to determine the mapped bucket.
2. Then inserts the item in that bucket's list.

HashRemove(hashTable, item)

1. Uses the item's key to determine the mapped bucket.
2. Then removes the item from that bucket's list.

HashSearch(hashTable, key)

1. Uses the key to determine the mapped bucket.
2. Then searches for the item in that bucket's list.
- Returns null if the item is not found.
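
The chaining operations HashInsert, HashRemove, and HashSearch can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the class name `ChainingHashTable`, the `key % num_buckets` hash function, and the choice to store bare integer keys as the items are assumptions made for the example.

```python
class ChainingHashTable:
    """Hash table with chaining: each bucket holds a list of items."""

    def __init__(self, num_buckets=10):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket_list(self, key):
        # Assumed hash function: key modulo the number of buckets.
        return self.buckets[key % len(self.buckets)]

    def insert(self, key):
        # Determine the mapped bucket, then add the item to its list.
        self._bucket_list(key).append(key)

    def search(self, key):
        # Determine the mapped bucket, then scan its list.
        for item in self._bucket_list(key):
            if item == key:
                return item
        return None  # not found

    def remove(self, key):
        # Determine the mapped bucket, then remove the item if present.
        bucket_list = self._bucket_list(key)
        if key in bucket_list:
            bucket_list.remove(key)


table = ChainingHashTable(10)
table.insert(55)
table.insert(75)
bucket_5_before = list(table.buckets[5])
found = table.search(75)
missing = table.search(35)
table.remove(55)
bucket_5_after = list(table.buckets[5])
```

Both 55 and 75 map to bucket 5, so they end up in the same list instead of colliding.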

Quadratic probing: Insert

Can insert into either empty-since-start or empty-after-removal buckets.
- Whichever empty bucket is encountered first in the probing sequence is used for the insertion.

Double hashing: Searching

Checks each bucket using the probing sequence defined by the two hash functions until:
- a matching item is found (returns the item)
- an empty-since-start bucket is found (returns null)
- all buckets are probed without a match (returns null)

Chaining

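Collision resolution technique that uses a list for each bucket, so each bucket's list may store multiple items.
- The insert, search, and remove operations each work on the mapped bucket's list.
- Ex: With a hash function of key % 10, inserting 55 and then 75 makes bucket 5's list 55, 75.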

Open addressing

Collision resolution technique where collisions are resolved by looking for an empty bucket elsewhere in the table. Ex: If 55 already occupies bucket 5, 75 might be stored in bucket 6.

hash function

Computes a bucket index from the item's key.
- A good hash function will distribute items into different buckets.

Hash Table

Data structure that stores unordered items by mapping (or hashing) each item to a location in an array (or vector). Ex: Given an array with indices 0..9 to store integers from 0..500, the modulo (remainder) operator can be used to map 25 to index 5 (25 % 10 = 5), and 149 to index 9 (149 % 10 = 9).
Related terms:
- key
- bucket
- hash function

bucket

Each hash table array element. Ex: A 100-element hash table has 100 buckets.

Double hashing: Removal

First searches for the item's key. If found, the item is removed and the bucket is marked empty-after-removal.

Linear probing

Handles a collision by starting at the key's mapped bucket (hashed location), and then linearly searching subsequent buckets until an empty bucket is found.

Quadratic probing: To determine the item's index in the hash table:

If an item's mapped bucket is H, the probed index is (H + c1 * i + c2 * i^2) mod tableSize.
- Each time an empty bucket is not found, i is incremented by 1.
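
As a rough illustration of the formula, the sketch below lists the first few indices that quadratic probing would visit. The function name, the `key % table_size` mapping used to obtain H, and the constants c1 = c2 = 1 are assumptions chosen for the example.

```python
def quadratic_probe_sequence(key, table_size, num_probes, c1=1, c2=1):
    """Return the first num_probes indices (H + c1*i + c2*i^2) mod table_size."""
    h = key % table_size  # assumed mapping from key to the mapped bucket H
    return [(h + c1 * i + c2 * i * i) % table_size for i in range(num_probes)]


sequence = quadratic_probe_sequence(21, 16, 4)
print(sequence)
```

With key 21 in a 16-bucket table, H is 5, and the offsets added for i = 0..3 are 0, 2, 6, and 12.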

probing sequence

Iterating through sequential i values to obtain the desired table index

Mid-square hash function

N = number of buckets
- Sequence: squares the key → extracts R digits from the middle of the result → returns the remainder of middleDigits / N
- R >= ceil(log10 N)
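
A decimal mid-square hash can be sketched as below. The function name and the string-slicing approach to picking the middle digits are assumptions for illustration; when the squared key has an odd number of surplus digits, this sketch leans toward the left.

```python
def mid_square_hash(key, num_buckets, r):
    """Square the key, take R middle decimal digits, return them mod N."""
    squared = str(key * key).zfill(r)  # pad so at least r digits exist
    start = (len(squared) - r) // 2    # index of the first middle digit
    middle_digits = int(squared[start:start + r])
    return middle_digits % num_buckets


index = mid_square_hash(453, 100, 2)
print(index)
```

Here 453 squared is 205209, the middle two digits are 52, and 52 % 100 leaves bucket 52.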

Well-designed hash table: Searching requires runtime complexity of

O(1)

Collision

Occurs when an item being inserted into a hash table maps to the same bucket as an existing item in the hash table. Ex: For a hash function of key % 10, 55 would be inserted in bucket 55 % 10 = 5; later inserting 75 would yield a collision because 75 % 10 is also 5.
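
The example above can be checked with a few lines of Python. The helper name `modulo_hash` and the extra key 113 are assumptions added for the illustration.

```python
def modulo_hash(key, num_buckets=10):
    # Hash function from the example: key % 10.
    return key % num_buckets


# Group keys by the bucket they map to.
buckets = {}
for key in (55, 75, 113):
    buckets.setdefault(modulo_hash(key), []).append(key)

# Any bucket that received more than one key had a collision.
collision_buckets = {b: keys for b, keys in buckets.items() if len(keys) > 1}
print(collision_buckets)
```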

Double hashing

Open-addressing collision resolution technique that uses 2 different hash functions to compute bucket indices

Linear probing: Search algorithm (used for removal)

Probes each bucket until:
- a matching item is found
- an empty-since-start bucket is found
- all buckets have been probed

Double hashing: Insert

Probes each bucket using the probing sequence → inserts the item in the next empty bucket (the empty kind doesn't matter).

Multiplicative string hash function

Repeatedly multiplies the hash value and adds the ASCII (or Unicode) value of each character in the string.
- The function returns the remainder of the accumulated hash value divided by N (the number of buckets).
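
A multiplicative string hash might look like the sketch below. The initial value 5381 and multiplier 33 are assumptions, taken from the constants commonly associated with Daniel J. Bernstein's string hash; the card does not name specific values.

```python
def multiplicative_string_hash(text, num_buckets, initial_value=5381, multiplier=33):
    """Multiply the running hash, add each character's code, then take mod N."""
    hash_value = initial_value
    for ch in text:
        hash_value = hash_value * multiplier + ord(ch)
    return hash_value % num_buckets


index = multiplicative_string_hash("a", 1000)
print(index)
```

For the one-character string "a", the running value is 5381 * 33 + 97 = 177670, so with 1000 buckets the result is 670.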

Quadratic probing: Removal

Searches for the key to remove.
- If found, marks the bucket as empty-after-removal.

Quadratic probing

Starts at the key's mapped bucket, and then quadratically searches subsequent buckets until an empty bucket is found.
- Related term: probing sequence

Empty bucket types with Linear Probing

Two types of empty bucket are distinguished:
- empty-since-start
- empty-after-removal
The distinction is important during searches: searching stops only for empty-since-start, not for empty-after-removal.

Mid-square hash function base 2 implementation

The mid-square hash function is typically implemented using binary (base 2), which is faster.
- Sequence: squares the key → extracts the middle R bits → returns the remainder of middleBits / N
- R >= ceil(log2 N)
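
The base 2 variant can be sketched with shifts and masks. The function name and the `key_bits` parameter (the assumed bit width of a key, so that the squared value is treated as `2 * key_bits` wide) are hypothetical; a real implementation would fix the width to match its integer type.

```python
def mid_square_hash_base2(key, num_buckets, r, key_bits=32):
    """Square the key, extract the middle R bits, return them mod N."""
    squared = key * key
    total_bits = 2 * key_bits                 # assumed width of the squared value
    low_bits_to_drop = (total_bits - r) // 2  # bits below the middle window
    middle_bits = (squared >> low_bits_to_drop) & ((1 << r) - 1)
    return middle_bits % num_buckets


# 4-bit key 12: 12 * 12 = 144 = 0b10010000, middle 4 bits are 0b0100.
index = mid_square_hash_base2(12, 16, 4, key_bits=4)
print(index)
```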

Quadratic probing: Searching

Uses the probing sequence until:
- the key is found
- an empty-since-start bucket is found

Linear Probing: Insert algorithm

Uses the key to determine the initial bucket.
- Linearly probes (or checks) each bucket.
- Inserts the item in the next empty bucket (the empty kind doesn't matter).
- If the probing reaches the last bucket, the probing continues at bucket 0.
- Returns true if the item was inserted.
- Returns false if all buckets are occupied.
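
The insert algorithm can be sketched as follows. The two sentinel objects, the function name, and the `key % len(table)` hash function are assumptions for the example; items are plain integer keys.

```python
EMPTY_SINCE_START = object()    # assumed sentinel: bucket never held an item
EMPTY_AFTER_REMOVAL = object()  # assumed sentinel: bucket's item was removed


def linear_probe_insert(table, key):
    """Insert key into table (a list of buckets); return True on success."""
    bucket = key % len(table)
    for _ in range(len(table)):
        # Either kind of empty bucket can take the new item.
        if table[bucket] is EMPTY_SINCE_START or table[bucket] is EMPTY_AFTER_REMOVAL:
            table[bucket] = key
            return True
        bucket = (bucket + 1) % len(table)  # wrap around to bucket 0
    return False  # every bucket is occupied


table = [EMPTY_SINCE_START] * 5
results = [linear_probe_insert(table, key) for key in (3, 8, 4, 13, 9, 20)]
print(results)
```

The sixth insertion returns False because the five-bucket table is already full.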

Modulo hash function

Uses the remainder from division of the key by hash table size N.

Linear probing: Removal

Uses the sought item's key to determine the initial bucket.
- The algorithm probes each bucket until either a matching item is found, an empty-since-start bucket is found, or all buckets have been probed.
- If the item is found, the item is removed, and the bucket is marked empty-after-removal.

Linear probing: Searching

Uses the sought item's key to determine the initial bucket.
- The algorithm probes each bucket until either the matching item is found (returning the item), an empty-since-start bucket is found (returning null), or all buckets are probed without a match (returning null).
- If an empty-after-removal bucket is found, the search algorithm continues to probe the next bucket.
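
The sketch below shows one way the search could be written, along with a removal built on the same probing loop, to illustrate why the two kinds of empty bucket are treated differently. The sentinel objects, function names, and hand-filled table are assumptions for the example.

```python
EMPTY_SINCE_START = object()    # assumed sentinel: bucket never held an item
EMPTY_AFTER_REMOVAL = object()  # assumed sentinel: bucket's item was removed


def linear_probe_find(table, key):
    """Return the index of the bucket holding key, or None if key is absent."""
    bucket = key % len(table)
    for _ in range(len(table)):
        if table[bucket] is EMPTY_SINCE_START:
            return None  # the key was never placed past this point
        if table[bucket] is not EMPTY_AFTER_REMOVAL and table[bucket] == key:
            return bucket
        bucket = (bucket + 1) % len(table)  # keep probing, wrapping to 0
    return None  # all buckets probed without a match


def linear_probe_search(table, key):
    bucket = linear_probe_find(table, key)
    return None if bucket is None else table[bucket]


def linear_probe_remove(table, key):
    bucket = linear_probe_find(table, key)
    if bucket is None:
        return False
    table[bucket] = EMPTY_AFTER_REMOVAL
    return True


# 55, 75, and 95 all map to bucket 5, so they occupy buckets 5, 6, and 7.
table = [EMPTY_SINCE_START] * 10
table[5], table[6], table[7] = 55, 75, 95
removed = linear_probe_remove(table, 75)
found = linear_probe_search(table, 95)
missing = linear_probe_search(table, 75)
```

If removal reset bucket 6 to empty-since-start instead, the search for 95 would stop at bucket 6 and wrongly report that 95 is absent.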

A hash table's operations of insert, remove, and search each use the hash function to determine

an item's bucket. Ex: Inserting 113 first determines the bucket to be 113 % 10 = 3.

empty-after-removal

bucket had an item removed that caused the bucket to now be empty

empty-since-start

bucket has been empty since the hash table was created

A good hash function will distribute items into different

buckets.

modulo operator %

computes the integer remainder when dividing two numbers. Ex: For a 20 element hash table, a hash function of key % 20 will map keys to bucket indices 0 to 19.

Approach for a hash table algorithm determining whether a cell is empty

Depends on the implementation.
- For example, if items are simply non-negative integers, empty can be represented as -1.
- More commonly, items are each an object with multiple fields (name, age, etc.), in which case each hash table array element may be a pointer.
- Using pointers, empty can be represented as null.

Hash tables support

fast search, insert, and remove.

Hash tables provide

fast search, using as few as one comparison.

A hash function's performance depends on the

hash table size and knowledge of the expected keys

When a hash table is initialized, all entries must be

initialized to empty-since-start.

Perfect hash function

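Maps items to buckets with no collisions.
- With a perfect hash function, the runtime for insert, search, and remove is O(1).
- By contrast, a hash function that is merely good achieves O(1) on average, but in the worst case may require O(N).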

Good hash function

Minimizes collisions → faster hash table.
- Uniformly distributes items into buckets.

Common hash function uses the

modulo operator %

A modulo hash function will map

num_keys / num_buckets keys to each bucket (on average, for evenly distributed keys)
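
A quick way to see the num_keys / num_buckets ratio is to count how many keys land in each bucket. The sketch assumes the keys are the consecutive integers 0..99, which spread perfectly evenly; real key sets are usually less uniform.

```python
from collections import Counter

num_keys, num_buckets = 100, 10

# Count how many keys map to each bucket under key % num_buckets.
counts = Counter(key % num_buckets for key in range(num_keys))
print(dict(counts))
```

Each of the 10 buckets receives 100 / 10 = 10 keys.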

Quadratic probing: c1 and c2

programmer-defined constants for quadratic probing

Hash table's main advantage

Searching, inserting, or removing an item may require only O(1) time.
- Contrast with O(N) for searching a list, or O(log N) for binary search.

key

The value used to map an item to an index.
- The hash function maps an item's key to the bucket index.
- Ideally unique.

For all items that might possibly be stored in the hash table, every key is ideally

unique, so that the hash table's algorithms can search for a specific item by that key.

